1
Kang CK, Kim AR. Deep molecular learning of transcriptional control of a synthetic CRE enhancer and its variants. iScience 2024; 27:108747. [PMID: 38222110 PMCID: PMC10784702 DOI: 10.1016/j.isci.2023.108747] [Received: 04/20/2023] [Revised: 08/29/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024]
Abstract
A massively parallel reporter assay (MPRA) measures the transcriptional activities of many cis-regulatory modules (CRMs) in a single experiment. We developed a thermodynamic computational modeling framework that calculates quantitative levels of gene expression directly from regulatory DNA sequences. Using this framework, we investigated the molecular mechanisms by which cis-regulatory mutations of a synthetic enhancer cause abnormal gene expression. We found that, in a human cell line, modeling competitive binding between transcription factors (TFs) of the same family with slightly different binding preferences significantly increases the accuracy with which the transcriptional effects of thousands of single and multiple mutations are recapitulated. We also discovered that even when various harmful mutations occurred in an activator binding site, the CRM could stably maintain, or even increase, gene expression through a certain form of competitive binding between family TFs. These findings improve our understanding of the effects of SNPs and indels on CRMs and could help in building robust custom-designed CRMs for biologics production and gene therapy.
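The competitive-binding idea in this abstract can be illustrated with a minimal single-site sketch. It is not the authors' framework: it assumes one binding site that at most one TF can occupy at a time, statistical weights equal to concentration × affinity, and made-up concentrations, affinities and activation efficiencies for an activator A and a related family member B.

```python
def state_probabilities(concentrations, affinities):
    """Occupancy probabilities for one site that at most one TF can bind.
    Index 0 is the empty site; index i + 1 is TF i bound."""
    weights = [c * k for c, k in zip(concentrations, affinities)]
    partition = 1.0 + sum(weights)  # the empty state has statistical weight 1
    return [1.0 / partition] + [w / partition for w in weights]


def site_activation(concentrations, affinities, efficiencies):
    """Expected activation: each TF's occupancy times its (hypothetical)
    activation efficiency, summed over all bound states."""
    probs = state_probabilities(concentrations, affinities)
    return sum(p * e for p, e in zip(probs[1:], efficiencies))


if __name__ == "__main__":
    conc = [1.0, 1.0]        # hypothetical relative concentrations of A and B
    eff = [1.0, 0.8]         # hypothetical: B activates slightly less than A
    wild_type = [10.0, 1.0]  # hypothetical affinities of A and B for the wild-type site
    mutant = [2.0, 8.0]      # hypothetical: the mutation weakens A and favours B
    for label, aff in (("wild type", wild_type), ("mutant", mutant)):
        with_b = site_activation(conc, aff, eff)
        a_only = site_activation(conc[:1], aff[:1], eff[:1])
        print(f"{label}: A+B {with_b:.3f}, A alone {a_only:.3f}")
```

With these made-up numbers, the mutation lowers expected activation from about 0.91 to 0.67 when A acts alone, but only from 0.90 to about 0.76 when B competes for the site, a toy version of the buffering effect described above.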
Affiliation(s)
- Chan-Koo Kang
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Ah-Ram Kim
- School of Life Science, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Department of Advanced Convergence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- School of Applied Artificial Intelligence, Handong Global University, Pohang, Gyeong-Buk 37554, South Korea
2
Ren N, Dai S, Ma S, Yang F. Strategies for activity analysis of single nucleotide polymorphisms associated with human diseases. Clin Genet 2023; 103:392-400. [PMID: 36527336 DOI: 10.1111/cge.14282] [Received: 10/26/2022] [Revised: 12/10/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWAS) have identified a large number of single nucleotide polymorphism (SNP) sites associated with human diseases. In the annotation of human diseases, especially cancers, SNPs have gained increasing attention as an important component of genetic factors. Because most SNPs are located in non-coding regions, their functional verification is a great challenge. The key to the functional annotation of risk SNPs is to screen, from thousands of disease-associated SNPs, those with regulatory activity. In this review, we systematically recapitulate the characteristics and functional roles of SNP sites, discuss in detail three parallel reporter screening strategies classified by their barcode tags, and recommend common in silico strategies that help supplement the annotation of SNP sites with epigenetic activity analysis and with prediction of target genes and trans-acting factors. We hope that this review will contribute to this rapidly growing research field by providing robust activity analysis strategies that can facilitate the translation of GWAS results into personalized diagnosis and prevention measures for human diseases.
Affiliation(s)
- Naixia Ren
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shangkun Dai
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
- Shumin Ma
- School of Medicine and Pharmacy, Ocean University of China, Qingdao, China
- Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo, China
3
Dong X, Zheng W. Cheminformatics Modeling of Gene Silencing for Both Natural and Chemically Modified siRNAs. Molecules 2022; 27:6412. [PMID: 36234948 PMCID: PMC9570765 DOI: 10.3390/molecules27196412] [Received: 08/22/2022] [Revised: 09/23/2022] [Accepted: 09/25/2022] [Indexed: 11/17/2022]
Abstract
In designing effective siRNAs for a specific mRNA target, it is critically important to have predictive models of siRNA potency. None of the previously published methods characterized the chemical structures of the individual nucleotides that make up an siRNA molecule; therefore, they cannot predict the gene-silencing potency of chemically modified siRNAs (cm-siRNAs). We propose a new approach that can predict the gene-silencing potency of cm-siRNAs by characterizing each nucleotide (NT) with 12 BCUT cheminformatics descriptors describing its charge distribution and its hydrophobic and polar properties. A 21-NT siRNA molecule is thus described by 252 descriptors (21 × 12), obtained by concatenating the BCUT values of its constituent nucleotides. Partial least squares (PLS) regression is employed to develop the statistical models. The Huesken dataset (2431 natural siRNA molecules) was used for model building and evaluation for natural siRNAs, and our results were comparable with or superior to those of Huesken's algorithm. The Bramsen dataset (48 cm-siRNAs) was used to build and test the models for cm-siRNAs. The predictive r² of the resulting models reached 0.65 (Pearson r = 0.82). This new method can therefore successfully model the gene-silencing potency of both natural and chemically modified siRNA molecules.
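The 252-feature count follows from concatenating one fixed-length descriptor vector per nucleotide. The sketch below assumes a lookup table keyed by nucleotide token; the descriptor values are hypothetical placeholders rather than real BCUT descriptors, and the token "mU" for a modified uridine is invented for illustration.

```python
N_DESCRIPTORS = 12  # BCUT descriptors per nucleotide, as stated in the abstract


def make_placeholder_table(tokens):
    """Hypothetical descriptor table: deterministic dummy values stand in for
    real BCUT descriptors, which would be computed from each (natural or
    chemically modified) nucleotide's structure."""
    return {
        tok: [round(0.01 * (i + 1) * (j + 1), 4) for j in range(N_DESCRIPTORS)]
        for i, tok in enumerate(tokens)
    }


def encode_sirna(nucleotides, table):
    """Concatenate per-nucleotide descriptor vectors in sequence order."""
    features = []
    for nt in nucleotides:
        vector = table[nt]
        if len(vector) != N_DESCRIPTORS:
            raise ValueError(f"expected {N_DESCRIPTORS} descriptors for {nt!r}")
        features.extend(vector)
    return features


# "mU" is a hypothetical token for a chemically modified uridine.
table = make_placeholder_table(["A", "C", "G", "U", "mU"])
strand = ["mU"] + list("UCGAAGUACUCAGCGUAAGU")  # 21 nucleotides in total
x = encode_sirna(strand, table)
print(len(x))  # 252
```

Because each modified nucleotide simply gets its own table entry, the same encoder covers natural and chemically modified strands; vectors like `x` would then form the input matrix of the PLS model.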
Affiliation(s)
- Weifan Zheng
- BRITE Institute and Department of Pharmaceutical Sciences, College of Health and Sciences (CHAS), North Carolina Central University, Durham, NC 27707, USA
4
Van Brempt M, Peeters AI, Duchi D, De Wannemaeker L, Maertens J, De Paepe B, De Mey M. Biosensor-driven, model-based optimization of the orthogonally expressed naringenin biosynthesis pathway. Microb Cell Fact 2022; 21:49. [PMID: 35346204 PMCID: PMC8962593 DOI: 10.1186/s12934-022-01775-8] [Received: 11/01/2021] [Accepted: 03/15/2022] [Indexed: 12/30/2022]
Abstract
Background The rapidly expanding synthetic biology toolbox allows engineers to develop smarter strategies for optimizing complex biosynthetic pathways. In one such strategy, multi-gene pathways are subdivided into several modules, each of which is dynamically controlled to fine-tune its expression in response to a changing cellular environment. To fine-tune separate modules without interference between modules or from the host regulatory machinery, a sigma factor (σ) toolbox for tunable orthogonal gene expression was developed in previous work. Here, this toolbox is implemented in E. coli to orthogonally express and fine-tune a pathway for the heterologous biosynthesis of naringenin, an industrially relevant plant metabolite. Optimizing production from this pathway still requires a practical workflow to balance all of its steps. This is addressed here by biosensor-driven screening and subsequent genotyping of combinatorially engineered libraries, followed by the training of three different computer models to predict the optimal pathway configuration.
Results The efficiency of this workflow, and the knowledge gained through it, are demonstrated by a 32% improvement in naringenin production titer relative to a random pathway library screen. Our best strain, cultured in a batch bioreactor experiment, produced 286 mg/L naringenin from glycerol in approximately 26 h. This is the highest reported naringenin production titer in E. coli without supplementation of pathway precursors to the medium or any precursor pathway engineering. In addition, the statistical learning process identified valuable pathway configuration preferences, such as specific enzyme variant preferences and significant correlations between titer and promoter strength at specific steps in the pathway.
Conclusions An efficient strategy, powered by orthogonal expression, was applied to successfully optimize a biosynthetic pathway for microbial production of flavonoids in E. coli to high, competitive levels. Within this strategy, statistical learning techniques were combined with combinatorial pathway optimization techniques and an in vivo high-throughput screening method to efficiently determine the optimal operon configuration of the pathway. This “pathway architecture designer” workflow can be applied to develop new microbial cell factories for different molecules of interest quickly and efficiently, while also providing additional insights into the underlying pathway characteristics.
Supplementary Information The online version contains supplementary material available at 10.1186/s12934-022-01775-8.
Affiliation(s)
- Maarten Van Brempt
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
- Andries Ivo Peeters
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
- Dries Duchi
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
- Lien De Wannemaeker
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
- Jo Maertens
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
- Brecht De Paepe
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
- Marjan De Mey
- Centre For Synthetic Biology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium
5
Zhou P, Liu Q, Wu T, Miao Q, Shang S, Wang H, Chen Z, Wang S, Wang H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J Chem Inf Model 2021; 61:1718-1731. [DOI: 10.1021/acs.jcim.0c01370] [Indexed: 02/07/2023]
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qian Liu
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Ting Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qingqing Miao
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shuyong Shang
- College of Chemistry and Life Science, Chengdu Normal University, Chengdu 611130, China
- Heyi Wang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Zheng Chen
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shaozhou Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Heyan Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
6
Gilman J, Zulkower V, Menolascina F. Using a Design of Experiments Approach to Inform the Design of Hybrid Synthetic Yeast Promoters. Methods Mol Biol 2021; 2189:1-17. [PMID: 33180289 DOI: 10.1007/978-1-0716-0822-7_1] [Indexed: 06/11/2023]
Abstract
Hybrid promoter engineering takes advantage of the modular nature of eukaryotic promoters by combining discrete promoter motifs to confer novel regulatory functions. By combinatorially screening sequence libraries for trans-acting transcriptional operators, activators, repressors and core promoter sequences, it is possible to derive constitutive or inducible promoter collections covering a broad range of expression strengths. However, combinatorial promoter design can produce highly complex, multidimensional design spaces that are experimentally costly to explore thoroughly in vivo. Here, we describe an in silico pipeline for designing hybrid promoter libraries that uses a Design of Experiments (DoE) approach to reduce the experimental burden and efficiently explore the promoter fitness landscape. We also describe a software pipeline that ensures the designed promoter sequences are compatible with the YTK assembly standard.
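The abstract does not say which experimental design the pipeline uses, so the following only illustrates how a DoE can shrink a combinatorial promoter space. It builds a standard two-level 2^(4-1) fractional factorial design (generator D = ABC) over four hypothetical promoter slots, each with two hypothetical alternative parts; it needs 8 constructs instead of the 16 of a full factorial while keeping every main effect balanced and mutually orthogonal.

```python
from itertools import product


def fractional_factorial_2_4_1():
    """2^(4-1) design: full factorial in A, B, C; D is set by the generator D = ABC."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


# Hypothetical promoter slots, each with two alternative parts coded -1 / +1.
SLOTS = {
    "upstream_activating_seq": ("UAS_1", "UAS_2"),
    "operator": ("op_1", "op_2"),
    "core_promoter": ("core_1", "core_2"),
    "spacer": ("spacer_1", "spacer_2"),
}


def design_to_constructs(design, slots):
    """Translate coded levels into one part name per slot for each run."""
    names = list(slots)
    return [
        tuple(slots[name][0 if level == -1 else 1] for name, level in zip(names, run))
        for run in design
    ]


design = fractional_factorial_2_4_1()
for construct in design_to_constructs(design, SLOTS):
    print(construct)
```

The price of halving the run count is aliasing: with I = ABCD, each two-factor interaction is confounded with another one, which is acceptable when main effects of the parts are the primary interest.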
Affiliation(s)
- James Gilman
- Institute for Bioengineering, School of Engineering, University of Edinburgh, Edinburgh, UK
- Valentin Zulkower
- Edinburgh Genome Foundry, The University of Edinburgh, Edinburgh, UK
- Filippo Menolascina
- Institute for Bioengineering, School of Engineering, University of Edinburgh, Edinburgh, UK
7
Van Brempt M, Clauwaert J, Mey F, Stock M, Maertens J, Waegeman W, De Mey M. Predictive design of sigma factor-specific promoters. Nat Commun 2020; 11:5822. [PMID: 33199691 PMCID: PMC7670410 DOI: 10.1038/s41467-020-19446-w] [Received: 01/07/2020] [Accepted: 10/13/2020] [Indexed: 02/07/2023]
Abstract
To engineer synthetic gene circuits, molecular building blocks are developed that can modulate gene expression without interfering with each other or with the host cell machinery. As the complexity of gene circuits increases, automated design tools and tailored building blocks are required to ensure perfect tuning of all components in the network. Despite efforts to develop prediction tools that allow forward engineering of promoter transcription initiation frequency (TIF), such a tool has been lacking. Here, we use libraries of E. coli sigma factor 70 (σ70)-dependent and B. subtilis σB-, σF- and σW-dependent promoters to construct prediction models capable of predicting both promoter TIF and the orthogonality of the σ-specific promoters. This is achieved by training a convolutional neural network on high-throughput DNA sequencing data from fluorescence-activated cell sorted promoter libraries. This model forms the basis of the online promoter design tool ProD, which provides tailored promoters for tailored genetic systems.
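The network architecture is not described in this abstract. As a hedged illustration of what the first layer of a sequence CNN does, the sketch below one-hot encodes a DNA sequence and scans it with a single filter, followed by ReLU and global max pooling. The filter weights are hand-set to reward matches to TATAAT, the E. coli σ70 -10 consensus, whereas a trained network would learn its filters from the sorted-library sequencing data.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode each base as a length-4 indicator vector (A, C, G, T order)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(encoded, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in most CNN libraries) of a
    one-hot sequence with one filter of shape (width, 4), followed by ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(x * w for x, w in zip(encoded[start + offset], kernel[offset]))
        scores.append(max(0.0, s))
    return scores


def global_max_pool(scores):
    return max(scores) if scores else 0.0


# Hand-set filter: +1 for the TATAAT base at each position, -1 for any other base.
kernel = [[1.0 if b == base else -1.0 for b in BASES] for base in "TATAAT"]
scores = conv1d_relu(one_hot("GGGTATAATGGG"), kernel)
print(scores.index(max(scores)), global_max_pool(scores))  # 3 6.0
```

In a full model, many such filters feed further layers whose output is regressed onto the measured promoter strength; here the single filter simply peaks where the motif sits.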
Affiliation(s)
- Maarten Van Brempt
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
- Jim Clauwaert
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000, Ghent, Belgium
- Friederike Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
- Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000, Ghent, Belgium
- Jo Maertens
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
- Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000, Ghent, Belgium
- Marjan De Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
8
Ferreira A, Lapa R, Vale N. Combination of Gemcitabine with Cell-Penetrating Peptides: A Pharmacokinetic Approach Using In Silico Tools. Biomolecules 2019; 9:biom9110693. [PMID: 31690028 PMCID: PMC6921036 DOI: 10.3390/biom9110693] [Received: 09/23/2019] [Revised: 10/07/2019] [Accepted: 11/01/2019] [Indexed: 02/06/2023]
Abstract
Gemcitabine is an anticancer drug used to treat a wide range of solid tumors and is a first-line treatment for pancreatic cancer. Our group previously developed novel conjugates of gemcitabine with cell-penetrating peptides (CPPs). Here we report preliminary pharmacokinetic data, obtained with GastroPlus™, for gemcitabine, two gemcitabine-CPP conjugates and the respective CPPs, and analyze these results in light of our previous evaluation of gemcitabine release and conjugate bioactivity. Additionally, to shed light on the relationship between the penetration ability of CPPs and their physicochemical properties, chemical descriptors were calculated for the 20 natural amino acids, a new principal property scale (z-scale) was created, and CPP prediction models were developed, establishing quantitative structure-activity relationships (QSAR). The z-scores of the peptides conjugated with gemcitabine are presented and analyzed alongside these data.
Affiliation(s)
- Abigail Ferreira
- Laboratory of Pharmacology, Department of Drug Sciences, Faculty of Pharmacy, University of Porto, Rua de Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal
- LAQV/REQUIMTE, Laboratory of Applied Chemistry, Department of Chemical Sciences, Faculty of Pharmacy, University of Porto, Rua de Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal
- Rui Lapa
- LAQV/REQUIMTE, Laboratory of Applied Chemistry, Department of Chemical Sciences, Faculty of Pharmacy, University of Porto, Rua de Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal
- Nuno Vale
- Laboratory of Pharmacology, Department of Drug Sciences, Faculty of Pharmacy, University of Porto, Rua de Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Rua Júlio Amaral de Carvalho, 45, 4200-135 Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
- Department of Molecular Pathology and Immunology, Abel Salazar Biomedical Sciences Institute (ICBAS), University of Porto, Rua de Jorge Viterbo Ferreira, 228, 4050-313 Porto, Portugal
9
Quantitative sequence-activity modeling of ACE peptide originated from milk using ACC-QTMS amino acid indices. Amino Acids 2019; 51:1209-1220. [PMID: 31321559 DOI: 10.1007/s00726-019-02761-y] [Received: 02/12/2019] [Accepted: 07/05/2019] [Indexed: 01/06/2023]
Abstract
To date, numerous peptides and hydrolysates derived from casein and whey protein have shown angiotensin-I-converting enzyme (ACE) inhibitory activity. In this research, quantum topological molecular similarity (QTMS) indices of amino acids were used in quantitative sequence-activity modeling (QSAM) to predict the activity of a set of milk-derived ACE-inhibitory peptides. Because the peptides do not all have the same number of residues, we addressed this issue with the auto cross covariance (ACC) methodology. QSAMs were then built to predict the pIC50 values of ACE-inhibitory peptides derived from bovine casein and whey. The model established an acceptable relationship between the selected variables and the pIC50 of the peptides. To estimate the performance of the developed models, casein and whey proteins from human, goat, bovine and sheep were virtually digested with trypsin and chymotrypsin, the ACE-inhibitory activity of the resulting virtual peptides was predicted, and some new ACE-inhibitory peptides were proposed.
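The ACC step is what turns peptides of different lengths into vectors of equal length. A minimal sketch, assuming the common lag-based formulation ACC_jk(lag) = (1 / (n - lag)) Σ_i z_i,j · z_(i+lag),k, where z_i,j is descriptor j of residue i; the per-residue values below are hypothetical placeholders, not real QTMS indices, and in practice descriptors are usually scaled before this transform.

```python
def acc_transform(descriptor_rows, max_lag):
    """Auto and cross covariance (ACC) terms for lags 1..max_lag.

    descriptor_rows holds one list of m descriptor values per residue (n x m).
    The result has m * m * max_lag values, whatever the peptide length n.
    """
    n = len(descriptor_rows)
    m = len(descriptor_rows[0])
    if n <= max_lag:
        raise ValueError("peptide must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(m):
            for k in range(m):
                # j == k gives auto-covariance terms, j != k cross-covariance terms
                total = sum(
                    descriptor_rows[i][j] * descriptor_rows[i + lag][k]
                    for i in range(n - lag)
                )
                features.append(total / (n - lag))
    return features


# Hypothetical two-descriptor values per residue (NOT real QTMS indices).
HYPOTHETICAL_INDICES = {
    "V": [0.52, -0.21],
    "P": [-0.33, 0.40],
    "I": [0.61, -0.12],
    "L": [0.58, -0.09],
    "Y": [0.14, 0.73],
    "K": [-0.70, 0.25],
}


def encode(peptide, max_lag=2):
    return acc_transform([HYPOTHETICAL_INDICES[aa] for aa in peptide], max_lag)


short_vec = encode("VPP")    # 3 residues
long_vec = encode("YLKPIV")  # 6 residues
print(len(short_vec), len(long_vec))  # 8 8
```

The fixed-length vectors can then be regressed against pIC50 values; `max_lag` must stay below the length of the shortest peptide in the set.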
10
Gilman J, Singleton C, Tennant RK, James P, Howard TP, Lux T, Parker DA, Love J. Rapid, Heuristic Discovery and Design of Promoter Collections in Non-Model Microbes for Industrial Applications. ACS Synth Biol 2019; 8:1175-1186. [PMID: 30995831 DOI: 10.1021/acssynbio.9b00061] [Indexed: 12/27/2022]
Abstract
Well-characterized promoter collections for synthetic biology applications are not always available in industrially relevant hosts. We developed a broadly applicable method for promoter identification in atypical microbial hosts that requires no a priori understanding of cis-regulatory element structure. This novel approach combines bioinformatic filtering with rapid empirical characterization to expand the promoter toolkit, and uses machine learning to improve understanding of the relationship between DNA sequence and function. Here, we apply the method in Geobacillus thermoglucosidasius, a thermophilic organism with high potential as a synthetic biology chassis for industrial applications. Bioinformatic screening of G. kaustophilus, G. stearothermophilus, G. thermodenitrificans, and G. thermoglucosidasius identified 636 putative 100-bp promoters, encompassing the genome-wide design space and lacking known transcription factor binding sites. Eighty of these sequences were characterized in vivo, and their activities covered a 2-log range of predictable expression levels. Seven sequences were shown to function consistently regardless of the downstream coding sequence. Partition modeling identified sequence positions upstream of the canonical -35 and -10 consensus motifs that were predicted to strongly influence regulatory activity in Geobacillus, and artificial neural network and partial least squares regression models were derived to assess whether a simple, forward, quantitative method existed for in silico prediction of promoter function. However, the models were insufficiently general to predict promoter activity in vivo pre hoc, most probably because the training dataset was small relative to the size of the modeled design space.
Affiliation(s)
- James Gilman
- The BioEconomy Centre, Biosciences, College of Life and Environmental Sciences, Stocker Road, University of Exeter, Exeter EX4 4QD, U.K
- Chloe Singleton
- The BioEconomy Centre, Biosciences, College of Life and Environmental Sciences, Stocker Road, University of Exeter, Exeter EX4 4QD, U.K
- Richard K. Tennant
- The BioEconomy Centre, Biosciences, College of Life and Environmental Sciences, Stocker Road, University of Exeter, Exeter EX4 4QD, U.K
- Paul James
- The BioEconomy Centre, Biosciences, College of Life and Environmental Sciences, Stocker Road, University of Exeter, Exeter EX4 4QD, U.K
- Thomas P. Howard
- School of Natural and Environmental Sciences, Newcastle University, Devonshire Building, Newcastle-upon-Tyne NE1 7RU, U.K
- Thomas Lux
- Plant Genome and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Munich 85764, Germany
- David A. Parker
- Biodomain, Shell Technology Center Houston, 3333 Highway 6 South, Houston, Texas 77082-3101, United States
- John Love
- The BioEconomy Centre, Biosciences, College of Life and Environmental Sciences, Stocker Road, University of Exeter, Exeter EX4 4QD, U.K
11
Abstract
Synthetic biology has advanced dramatically over more than a decade, during which it has expanded our understanding of living systems and opened new avenues for microbial engineering. Many biotechnological and computational methods have been developed for the construction of synthetic systems. Achievements in synthetic biology have been widely adopted in metabolic engineering, a field aimed at engineering micro-organisms to produce substances of interest. However, engineering metabolic systems requires dynamic redistribution of cellular resources, the creation of novel metabolic pathways, and optimal regulation of those pathways to achieve higher production titers. Thus, the design principles and tools developed in synthetic biology have been employed to create novel and flexible metabolic pathways and to optimize metabolic fluxes, increasing the capability of cells to act as production factories. In this review, we introduce synthetic biology tools and their applications to the construction of microbial cell factories.
12
Peters G, Maertens J, Lammertyn J, De Mey M. Exploring of the feature space of de novo developed post-transcriptional riboregulators. PLoS Comput Biol 2018; 14:e1006170. [PMID: 30118473 PMCID: PMC6114898 DOI: 10.1371/journal.pcbi.1006170] [Received: 11/08/2017] [Revised: 08/29/2018] [Accepted: 04/30/2018] [Indexed: 11/23/2022]
Abstract
Metabolic engineering increasingly depends on RNA technology to rewire metabolism in a customized way that maximizes production. To this end, pure riboregulators allow dynamic gene repression without the need for a potentially burdensome co-expressed protein, as required by typical Hfq-binding small RNAs and clustered regularly interspaced short palindromic repeats (CRISPR) technology. Despite this clear advantage, no general design principles are available for the de novo development of repressing riboregulators, which limits the availability and reliable development of this type of riboregulator. Here, to overcome this lack of knowledge on the functionality of repressing riboregulators, translation-inhibiting RNAs are developed from scratch. These de novo developed riboregulators explore structural and thermodynamic features previously associated with the modulation of translation initiation. In total, 12 structural and thermodynamic features were defined, of which six were retained after removing correlated features from an in silico generated riboregulator library. From this translation-inhibiting RNA library, 18 riboregulators were selected using an experimental design, then constructed and co-expressed with two target untranslated regions to link the translation-inhibiting RNA features to functionality. The pure riboregulators in the design of experiments repressed protein expression down to 6% of the original level, which could only be partially explained by an ordinary least squares regression model. To allow reliable forward engineering, a partial least squares regression model was constructed and validated to link the properties of translation-inhibiting RNA riboregulators to gene repression. In this model, both structural and thermodynamic features were important for efficient gene repression by pure riboregulators. This approach enables more reliable de novo forward engineering of effective pure riboregulators, further expanding the RNA toolbox for gene expression modulation.
To allow reliable forward engineering of microbial cell factories, various metabolic engineering efforts rely on RNA-based technology. Programmable riboregulators, for example, allow dynamic control over gene expression. However, no clear design principles exist for de novo developed repressing riboregulators, which limits their applicability. Here, various engineering principles are identified and explored computationally. These design criteria are then used in an experimental design that is explored in an in vivo study. This resulted in a regression model that enables more reliable computational design of repressing small RNAs.
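Partial least squares regression, which several entries in this list rely on, can be sketched for a single response with the classic NIPALS PLS1 algorithm. This is a generic textbook version, not the authors' code; the two feature columns and their values are hypothetical stand-ins for riboregulator features such as a folding energy or a structural count. With as many components as (linearly independent) features, PLS1 reproduces the ordinary least squares fit; fewer components give the shrunken, collinearity-tolerant models PLS is used for.

```python
def fit_pls1(X, y, n_components):
    """NIPALS PLS1: returns column means, y mean and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]                                   # weights
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                 # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                         # deflate y
        components.append((w, p, q))
    return x_means, y_mean, components


def predict_pls1(model, x):
    """Predict by deflating the new sample with the stored weights and loadings."""
    x_means, y_mean, components = model
    e = [xi - mu for xi, mu in zip(x, x_means)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


# Hypothetical features (two columns) and a response that is exactly linear in them.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [row[0] + 2.0 * row[1] + 3.0 for row in X]
model = fit_pls1(X, y, n_components=2)
print(round(predict_pls1(model, [3.0, 2.0]), 6))
```

In practice the number of components would be chosen by cross-validation, and the features would usually be scaled to unit variance before fitting.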
Affiliation(s)
- Gert Peters
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
- Jo Maertens
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
- Marjan De Mey
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
13
Portela RMC, von Stosch M, Oliveira R. Hybrid semiparametric systems for quantitative sequence-activity modeling of synthetic biological parts. Synth Biol (Oxf) 2018; 3:ysy010. [PMID: 32995518 PMCID: PMC7513808 DOI: 10.1093/synbio/ysy010] [Received: 03/07/2018] [Revised: 05/21/2018] [Accepted: 06/11/2018] [Indexed: 12/20/2022]
Abstract
Predicting the activity of modified biological parts is difficult because nucleotide sequences are typically long, resulting in combinatorial designs that suffer from the "curse of dimensionality". Mechanistic design methods are often limited by the available knowledge, whereas empirical methods typically require large datasets that are difficult and/or costly to obtain. In this study, we explore for the first time the combination of both approaches within a formal hybrid semiparametric framework, in an attempt to overcome the limitations of current approaches. Protein translation as a function of the 5' untranslated region sequence in Escherichia coli is taken as a case study. Thermodynamic modeling, partial least squares (PLS) and hybrid parallel combinations thereof are compared for different datasets and data partitioning scenarios. The results suggest a significant and systematic reduction of both calibration and prediction errors by the hybrid approach compared with standalone thermodynamic or PLS modeling. Improvements, albeit of different magnitudes, are observed irrespective of sample size and partitioning method. Overall, the results suggest that the hybrid method increases predictive power, potentially leading to more efficient design of biological parts.
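A parallel hybrid model adds a data-driven correction to a mechanistic prediction. The sketch below is a simplified illustration under explicit assumptions, not the published framework: the mechanistic term is a hypothetical log-linear function of a folding free energy with an assumed constant `beta`, and a one-feature least-squares line stands in for the PLS component. The training data are made up and were generated to be exactly consistent with the hybrid form, so the demo only shows the mechanics.

```python
def mechanistic_log_rate(delta_g, beta=0.45):
    """Hypothetical thermodynamic term: log translation rate = -beta * delta_g.
    beta (mol/kcal) is an assumed constant, not a value from the paper."""
    return -beta * delta_g


def fit_residual_model(features, residuals):
    """Least-squares line a + b * x fitted to what the mechanistic term misses.
    This simple regression stands in for the PLS component of the hybrid."""
    n = len(features)
    mean_x = sum(features) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(features, residuals))
    slope = sxr / sxx if sxx else 0.0
    return mean_r - slope * mean_x, slope


def hybrid_predict(delta_g, feature, residual_model):
    """Parallel hybrid: mechanistic prediction plus data-driven correction."""
    intercept, slope = residual_model
    return mechanistic_log_rate(delta_g) + intercept + slope * feature


# Made-up data: (delta_g in kcal/mol, hypothetical sequence feature, observed log rate)
data = [(-12.0, 0.30, 6.55), (-8.0, 0.45, 4.825), (-15.0, 0.20, 7.85), (-5.0, 0.60, 3.55)]
residuals = [obs - mechanistic_log_rate(dg) for dg, _, obs in data]
model = fit_residual_model([f for _, f, _ in data], residuals)
for dg, f, obs in data:
    print(f"observed {obs:.3f}  mechanistic {mechanistic_log_rate(dg):.3f}  "
          f"hybrid {hybrid_predict(dg, f, model):.3f}")
```

The design choice is that the data-driven part only has to learn the mechanistic model's systematic error, which is one reason a hybrid can need less data than a purely empirical model.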
Affiliation(s)
- Rui M C Portela
- REQUIMTE/LAQV, Departamento de Química, Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa, Caparica, Portugal
- Moritz von Stosch
- CEAM Faculty of Science, Agriculture and Engineering, Newcastle University, Newcastle upon Tyne, UK
- Rui Oliveira
- REQUIMTE/LAQV, Departamento de Química, Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa, Caparica, Portugal
| |
Collapse
|
14
|
Barley MH, Turner NJ, Goodacre R. Improved Descriptors for the Quantitative Structure-Activity Relationship Modeling of Peptides and Proteins. J Chem Inf Model 2018; 58:234-243. [PMID: 29338232 DOI: 10.1021/acs.jcim.7b00488] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
The ability to model the activity of a protein using quantitative structure-activity relationships (QSAR) requires descriptors for the 20 naturally coded amino acids. In this work we show that by modifying some established descriptors we were able to model the activity data of 140 mutants of the enzyme epoxide hydrolase with improved accuracy. These new descriptors (referred to as physical descriptors) also gave very good results when tested against a series of four dipeptide data sets. The physical descriptors encode the amino acids using only two orthogonal scales: the first is strongly linked to hydrophilicity/hydrophobicity, and the second, to the volume of the amino acid residue. The use of these new amino acid descriptors should result in simpler and more readily interpretable models for the enzyme activity (and potentially other functions of interest, e.g., secondary and tertiary structure) of peptides and proteins.
Affiliation(s)
- Mark H Barley
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Nicholas J Turner
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Royston Goodacre
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
15
Guruge I, Taherzadeh G, Zhan J, Zhou Y, Yang Y. B-factor profile prediction for RNA flexibility using support vector machines. J Comput Chem 2017; 39:407-411. [DOI: 10.1002/jcc.25124] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/11/2017] [Accepted: 11/07/2017] [Indexed: 12/12/2022]
Affiliation(s)
- Ivantha Guruge
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Ghazaleh Taherzadeh
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Jian Zhan
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yaoqi Zhou
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yuedong Yang
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China
16
Abstract
The judicious choice of promoter to drive gene expression remains one of the most important considerations for synthetic biology applications. Constitutive promoter sequences isolated from nature are often used in laboratory settings or small-scale commercial production streams, but unconventional microbial chassis for new synthetic biology applications require well-characterized, robust and orthogonal promoters. This review provides an overview of the opportunities and challenges for synthetic promoter discovery and design, including molecular methodologies, such as saturation mutagenesis of flanking regions and mutagenesis by error-prone PCR, as well as the less familiar use of computational and statistical analyses for de novo promoter design.
17
Moses T, Mehrshahi P, Smith AG, Goossens A. Synthetic biology approaches for the production of plant metabolites in unicellular organisms. J Exp Bot 2017; 68:4057-4074. [PMID: 28449101 DOI: 10.1093/jxb/erx119] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 06/07/2023]
Abstract
Synthetic biology is the repurposing of biological systems for novel objectives and applications. Through the co-ordinated and balanced expression of genes, both native and those introduced from other organisms, resources within an industrial chassis can be siphoned for the commercial production of high-value commodities. This developing interdisciplinary field has the potential to revolutionize natural product discovery from higher plants, by providing a diverse array of tools, technologies, and strategies for exploring the large chemically complex space of plant natural products using unicellular organisms. In this review, we emphasize the key features that influence the generation of biorefineries and highlight technologies and strategic solutions that can be used to overcome engineering pitfalls with rational design. Also presented is a succinct guide to assist the selection of unicellular chassis most suited for the engineering and subsequent production of the desired natural product, in order to meet the global demand for plant natural products in a safe and sustainable manner.
Affiliation(s)
- Tessa Moses
- Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium
- VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Payam Mehrshahi
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
- Alison G Smith
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
- Alain Goossens
- Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium
- VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
18
Beier R, Labudde D. Numeric promoter description - A comparative view on concepts and general application. J Mol Graph Model 2015; 63:65-77. [PMID: 26655334 DOI: 10.1016/j.jmgm.2015.11.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2015] [Revised: 11/12/2015] [Accepted: 11/17/2015] [Indexed: 11/25/2022]
Abstract
Nucleic acid molecules play a key role in a variety of biological processes. Beyond storage and transfer tasks, this role also comprises triggering biological processes, exerting regulatory effects and actively influencing targets through binding. Based on the experimental output (in this case promoter sequences), further in silico analyses aid in gaining new insights into these processes and interactions. The numerical description of nucleic acids thereby constitutes a bridge between the concrete biological issues and the analytical methods. Hence, this study compares 26 descriptor sets obtained by applying well-known numerical description concepts to an established dataset of 38 DNA promoter sequences. The suitability of the description sets was evaluated by computing partial least squares regression models and assessing the model accuracy. We conclude that descriptive power depends mainly on positional information rather than on explicitly incorporated physico-chemical information, since a sufficient amount of implicit physico-chemical information is already encoded in the nucleobase classification. The regression models especially benefited from employing the information that is encoded in the sequential and structural neighborhood of the nucleobases. Thus, the analyses of n-grams (short fragments of length n) suggested that they are valuable descriptors for DNA target interactions. A mixed n-gram descriptor set thereby yielded the best description of the promoter sequences. The corresponding regression model was checked and found to be plausible, as it was able to reproduce the characteristic binding motifs of promoter sequences to a reasonable degree. As most functional nucleic acids are based on the principle of molecular recognition, the findings are not restricted to promoter sequences but can be transferred to other kinds of functional nucleic acids.
Thus, the concepts presented in this study could provide advantages for future nucleic acid-based technologies, like biosensoring, therapeutics and molecular imaging.
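A mixed n-gram descriptor of the kind this abstract describes can be sketched as follows. Overlapping n-gram counts are computed for several values of n and concatenated into one vector. This is an illustrative assumption, not the authors' descriptor set. The abstract does not state which n values or normalization were used, and the function names are hypothetical.

```python
from itertools import product


def ngram_counts(seq: str, n: int) -> list[int]:
    """Count overlapping n-grams of a DNA sequence over the fixed alphabet ACGT."""
    seq = seq.upper()
    # Fixed, lexicographically ordered vocabulary so every sequence maps to the same layout.
    vocab = ["".join(p) for p in product("ACGT", repeat=n)]
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0] * len(vocab)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in index:  # skip fragments containing N or other non-ACGT symbols
            counts[index[gram]] += 1
    return counts


def mixed_ngram_descriptor(seq: str, ns: tuple[int, ...] = (1, 2, 3)) -> list[int]:
    """Concatenate n-gram count vectors for several n into one descriptor (hypothetical helper)."""
    descriptor: list[int] = []
    for n in ns:
        descriptor.extend(ngram_counts(seq, n))
    return descriptor


if __name__ == "__main__":
    print(len(mixed_ngram_descriptor("TTGACA")))  # 4 + 16 + 64 = 84 features
```

For n = 1, 2, 3 the vector always has 4 + 16 + 64 = 84 entries, whatever the sequence length, so promoters of different lengths become inputs of equal size for a PLS model. Plain counts discard where each fragment occurs. Given the abstract's emphasis on positional information, a position-resolved variant (for example, counts per sequence window) may be closer to what the study used.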
Affiliation(s)
- Rico Beier
- University of Applied Sciences Mittweida, Technikumplatz 17, 09648 Mittweida, Germany.
- Dirk Labudde
- University of Applied Sciences Mittweida, Technikumplatz 17, 09648 Mittweida, Germany.
19
Shreif Z, Striegel DA, Periwal V. The jigsaw puzzle of sequence phenotype inference: Piecing together Shannon entropy, importance sampling, and Empirical Bayes. J Theor Biol 2015; 380:399-413. [PMID: 26092377 DOI: 10.1016/j.jtbi.2015.06.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/04/2015] [Revised: 04/29/2015] [Accepted: 06/05/2015] [Indexed: 11/24/2022]
Abstract
A nucleotide sequence 35 base pairs long can take 1,180,591,620,717,411,303,424 possible values. Protein binding microarrays, one example of systems biology datasets, contain activity data from about 40,000 such sequences. The discrepancy between the number of possible configurations and the available activities is enormous. Thus, although systems biology datasets are large in absolute terms, they often require methods developed for rare events because of the combinatorial increase in the number of possible configurations of biological systems. A plethora of techniques for handling large datasets, such as Empirical Bayes, or rare events, such as importance sampling, have been developed in the literature, but these cannot always be used simultaneously. Here we introduce a principled approach to Empirical Bayes based on importance sampling, information theory, and theoretical physics in the general context of sequence phenotype model induction. We present the analytical calculations that underlie our approach. We demonstrate the computational efficiency of the approach on concrete examples, and demonstrate its efficacy by applying the theory to publicly available protein binding microarray transcription factor datasets and to data on synthetic cAMP-regulated enhancer sequences. As further demonstrations, we find transcription factor binding motifs, predict the activity of new sequences and extract the locations of transcription factor binding sites. In summary, we present a novel method that is efficient (requiring minimal computational time and reasonable amounts of memory), has high predictive power comparable with that of models with hundreds of parameters, and has a limited number of optimized parameters, proportional to the sequence length.
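The first figure in this abstract is 4^35 (equivalently 2^70), the number of length-35 strings over {A, C, G, T}. Python integers have arbitrary precision, so this can be checked directly. The ~40,000 figure is the one quoted in the abstract, and the helper name is illustrative.

```python
def sequence_space_size(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences of a given length over an alphabet (4 for A, C, G, T)."""
    return alphabet_size ** length


n_possible = sequence_space_size(35)
print(f"{n_possible:,}")  # 1,180,591,620,717,411,303,424

# Fraction of the space covered by ~40,000 measured sequences (figure quoted in the abstract).
coverage = 40_000 / n_possible
print(f"{coverage:.1e}")  # 3.4e-17
```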
Affiliation(s)
- Zeina Shreif
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 12A, 12 South Drive, Bethesda, MD 20892, USA.
- Deborah A Striegel
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 12A, 12 South Drive, Bethesda, MD 20892, USA.
- Vipul Periwal
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 12A, 12 South Drive, Bethesda, MD 20892, USA.
20
Quantitative sequence–activity modeling of antimicrobial hexapeptides using a segmented principal component strategy: an approach to describe and predict activities of peptide drugs containing l/d and unnatural residues. Amino Acids 2014; 47:125-34. [DOI: 10.1007/s00726-014-1850-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/18/2014] [Accepted: 10/03/2014] [Indexed: 12/20/2022]
21
van den Berg BA, Reinders MJ, van der Laan JM, Roubos JA, de Ridder D. Protein redesign by learning from data. Protein Eng Des Sel 2014; 27:281-8. [DOI: 10.1093/protein/gzu031] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/20/2023]
22
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. J Cheminform 2013; 5:42. [PMID: 24059743 PMCID: PMC4015169 DOI: 10.1186/1758-2946-5-42] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set (termed ProtFP, 4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks, which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than using individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last. Conclusions: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar.
Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, from both the ligand and the protein side.
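The two performance measures in this benchmark have standard definitions, which the abstract does not restate. The notation below is ours: $y_i$ is an observed and $\hat{y}_i$ a predicted bioactivity over $N$ data points, and $TP$, $TN$, $FP$, $FN$ are the counts of a binary (active/inactive) confusion matrix. RMSE is expressed in the units of the response (here log units), and MCC ranges from $-1$ to $1$, with $0$ corresponding to chance-level classification.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}

\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
```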
23
van Westen GJ, Swier RF, Wegner JK, Ijzerman AP, van Vlijmen HW, Bender A. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. J Cheminform 2013; 5:41. [PMID: 24059694 PMCID: PMC3848949 DOI: 10.1186/1758-2946-5-41] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA). Results: In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the later principal components capture less variation per component of the original input data. Conclusion: This work compares how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.
Affiliation(s)
- Gerard Jp van Westen
- Division of Medicinal Chemistry, Leiden/Amsterdam Center for Drug Research, Einsteinweg 55, 2333 CC Leiden, The Netherlands.
24
Building better drugs: developing and regulating engineered therapeutic proteins. Trends Pharmacol Sci 2013; 34:534-48. [PMID: 24060103 DOI: 10.1016/j.tips.2013.08.005] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Received: 06/06/2013] [Revised: 08/08/2013] [Accepted: 08/13/2013] [Indexed: 11/22/2022]
Abstract
Most native proteins do not make optimal drugs, and thus second- and third-generation therapeutic proteins, engineered to improve product attributes or to enhance process characteristics, are rapidly becoming the norm. The past decade has seen unprecedented progress in the development of platform technologies that further these ends. Although the advantages of engineered therapeutic proteins are considerable, the alterations can affect the safety and efficacy of the drugs. We discuss both the key technological innovations with respect to engineered therapeutic proteins and advancements in the underlying basic science. The latter would permit the design of science-based criteria for the prediction and assessment of potential risks and the development of appropriate risk management plans. This in turn holds promise for more predictable criteria for the licensure of a class of products that are extremely challenging to develop but represent an increasingly important component of modern medical practice.
25
Quantitative estimation of activity and quality for collections of functional genetic elements. Nat Methods 2013; 10:347-53. [DOI: 10.1038/nmeth.2403] [Citation(s) in RCA: 161] [Impact Index Per Article: 13.4] [Received: 10/24/2012] [Accepted: 02/13/2013] [Indexed: 01/20/2023]
26
Rationally designed families of orthogonal RNA regulators of translation. Nat Chem Biol 2012; 8:447-54. [DOI: 10.1038/nchembio.919] [Citation(s) in RCA: 144] [Impact Index Per Article: 11.1] [Received: 10/12/2011] [Accepted: 01/27/2012] [Indexed: 12/19/2022]
27
Gustafsson C, Minshull J, Govindarajan S, Ness J, Villalobos A, Welch M. Engineering genes for predictable protein expression. Protein Expr Purif 2012; 83:37-46. [PMID: 22425659 DOI: 10.1016/j.pep.2012.02.013] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 12/28/2011] [Revised: 02/27/2012] [Accepted: 02/28/2012] [Indexed: 10/28/2022]
Abstract
The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.
28
New autocorrelation QTMS-based descriptors for use in QSAM of peptides. J Iran Chem Soc 2012. [DOI: 10.1007/s13738-012-0070-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
29
Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol 2012; 30:271-7. [PMID: 22371084 PMCID: PMC3297981 DOI: 10.1038/nbt.2137] [Citation(s) in RCA: 509] [Impact Index Per Article: 39.2] [Received: 08/23/2011] [Accepted: 01/20/2012] [Indexed: 01/22/2023]
Abstract
Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.
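The abstract does not spell out the functional form of its QSAMs. Here is a minimal sketch assuming the simplest additive (independent-site) form, in which each position contributes a weight that depends only on the base present. The weight matrix is made up for illustration and the function names are hypothetical. Under this assumption, the predicted effect of a single-nucleotide variant is just the difference between two weights, which makes such models cheap to apply to many variants.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Encode a DNA sequence as an L x 4 one-hot matrix (columns ordered A, C, G, T)."""
    seq = seq.upper()
    unknown = set(seq) - set(BASES)
    if unknown:
        raise ValueError(f"non-ACGT symbols: {sorted(unknown)}")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def additive_activity(seq: str, weights: list[list[float]], bias: float = 0.0) -> float:
    """Predicted activity = bias + sum over positions of the weight for the observed base."""
    x = one_hot(seq)
    if len(x) != len(weights):
        raise ValueError("sequence length must equal the number of weight rows")
    return bias + sum(
        xi * wi for row_x, row_w in zip(x, weights) for xi, wi in zip(row_x, row_w)
    )


def variant_effect(seq: str, pos: int, alt: str, weights: list[list[float]]) -> float:
    """Predicted change in activity from a single-nucleotide substitution at `pos`."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return additive_activity(mutant, weights) - additive_activity(seq, weights)


# Hypothetical 4-position weight matrix (rows = positions, columns = A, C, G, T).
W = [
    [0.0, 0.5, -0.25, 0.0],
    [0.25, 0.0, 0.0, -0.5],
    [0.0, 0.0, 1.0, 0.0],
    [-0.25, 0.25, 0.0, 0.0],
]
print(additive_activity("CAGC", W))       # 2.0
print(variant_effect("CAGC", 2, "T", W))  # -1.0
```

An additive model ignores interactions between sites, such as cooperative binding of transcription factors. Pairwise terms, or mechanistic models such as thermodynamic ones, are common ways to relax that assumption.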
30
QSAR Study on Insect Neuropeptide Potencies Based on a Novel Set of Parameters of Amino Acids by Using OSC-PLS Method. Int J Pept Res Ther 2011. [DOI: 10.1007/s10989-011-9258-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
31
Ebalunode JO, Jagun C, Zheng W. Informatics approach to the rational design of siRNA libraries. Methods Mol Biol 2011; 672:341-58. [PMID: 20838976 DOI: 10.1007/978-1-60761-839-3_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/13/2023]
Abstract
This chapter surveys the literature for state-of-the-art methods for the rational design of siRNA libraries. It identifies and presents major milestones in the field of computational modeling of siRNA gene-silencing efficacy. Commonly used features of siRNAs are summarized along with the major machine learning techniques employed to build predictive models. It also outlines several web-enabled siRNA design tools. To meet the challenge of modeling and rationally designing chemically modified siRNAs, it also proposes a new cheminformatics approach for the representation and characterization of siRNA molecules. Some preliminary results with this new approach are presented to demonstrate its promising potential for modeling siRNA efficacy. Together with novel delivery technologies and chemical modification techniques, rational siRNA design algorithms will ultimately contribute to chemical biology research and the efficient development of siRNA therapeutics.
Affiliation(s)
- Jerry O Ebalunode
- Department of Pharmaceutical Sciences, BRITE Institute, North Carolina Central University, Durham, NC, USA
32
Ebalunode JO, Zheng W. Cheminformatics Approach to Gene Silencing: Z Descriptors of Nucleotides and SVM Regression Afford Predictive Models for siRNA Potency. Mol Inform 2010; 29:871-81. [PMID: 27464351 DOI: 10.1002/minf.201000091] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/16/2010] [Accepted: 11/07/2010] [Indexed: 01/01/2023]
Abstract
Short interfering RNA mediated gene silencing technology has been through tremendous development over the past decade, and has found broad applications in both basic biomedical research and pharmaceutical development. Critical to the effective use of this technology is the development of reliable algorithms to predict the potency and selectivity of siRNAs under study. Existing algorithms are mostly built upon sequence information of siRNAs and then employ statistical pattern recognition or machine learning techniques to derive rules or models. However, sequence-based features have limited ability to characterize siRNAs, especially chemically modified ones. In this study, we proposed a cheminformatics approach to describe siRNAs. Principal component scores (z1, z2, z3, z4) have been derived for each of the 5 nucleotides (A, U, G, C, T) from the descriptor matrix computed by the MOE program. Descriptors of a given siRNA sequence are simply the concatenation of the z values of its composing nucleotides. Thus, for each of the 2431 siRNA sequences in the Huesken dataset, 76 descriptors were generated for the 19-NT representation, and 84 descriptors were generated for the 21-NT representation of siRNAs. Support Vector Machine regression (SVMR) was employed to develop predictive models. In all cases, the models achieved Pearson correlation coefficient r and R about 0.84 and 0.65 for the training sets and test sets, respectively. A minimum of 25 % of the whole dataset was needed to obtain predictive models that could accurately predict 75 % of the remaining siRNAs. Thus, for the first time, a cheminformatics approach has been developed to successfully model the structure-potency relationship in siRNA-based gene silencing data, which has laid a solid foundation for quantitative modeling of chemically modified siRNAs.
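The descriptor construction described in this abstract is a plain concatenation, so its dimensions follow directly: 4 z-scores x 19 nucleotides = 76 and 4 x 21 = 84. The sketch below shows only that encoding step, under the assumption that nucleotides are concatenated in sequence order. The z-values are placeholders, not the MOE-derived scores, the sequence and names are hypothetical, and the SVM regression stage is not shown.

```python
# PLACEHOLDER z-scores (z1..z4) per nucleotide. The published values were derived by
# PCA of MOE descriptors and are NOT reproduced here; substitute real values before use.
Z_SCALES: dict[str, tuple[float, float, float, float]] = {
    "A": (0.11, -0.42, 0.37, 0.05),
    "C": (-0.63, 0.18, -0.09, 0.44),
    "G": (0.52, 0.27, -0.31, -0.12),
    "U": (-0.20, -0.35, 0.14, -0.28),
    "T": (-0.18, -0.33, 0.10, -0.30),
}


def z_descriptor(seq: str) -> list[float]:
    """Concatenate the four z-scores of each nucleotide, in sequence order."""
    vector: list[float] = []
    for nt in seq.upper():
        if nt not in Z_SCALES:
            raise ValueError(f"unsupported nucleotide: {nt!r}")
        vector.extend(Z_SCALES[nt])
    return vector


sirna_19 = "AUGC" * 4 + "AUG"  # hypothetical 19-nt sequence
print(len(z_descriptor(sirna_19)))         # 76
print(len(z_descriptor(sirna_19 + "UU")))  # 84
```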
Affiliation(s)
- Jerry O Ebalunode
- Department of Pharmaceutical Sciences and BRITE Institute, North Carolina Central University, 1801 Fayetteville Street, Durham, NC 27707, USA tel: (+1) 919-530-6652; fax: (+1) 919-530-6600
- Weifan Zheng
- Department of Pharmaceutical Sciences and BRITE Institute, North Carolina Central University, 1801 Fayetteville Street, Durham, NC 27707, USA tel: (+1) 919-530-6652; fax: (+1) 919-530-6600.
33
Tian F, Zhang C, Fan X, Yang X, Wang X, Liang H. Predicting the Flexibility Profile of Ribosomal RNAs. Mol Inform 2010; 29:707-15. [PMID: 27464014 DOI: 10.1002/minf.201000092] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/17/2010] [Accepted: 09/28/2010] [Indexed: 11/06/2022]
Abstract
Flexibility in biomolecules is an important determinant of biological function and can be measured quantitatively by the atomic Debye-Waller factor, or B-factor. Although numerous studies have addressed the theoretical and computational analysis of protein B-factor profiles, methods for predicting B-factor values of nucleic acids, especially the complicated ribosomal RNAs (rRNAs), which are functionally very similar to proteins in providing matrix structures and catalyzing biochemical reactions, remain largely unexplored. In this article, we present a quantitative structure-flexibility relationship (QSFR) study aimed at the quantitative prediction of rRNA B-factors from primary sequences (sequence-based) and higher-order structures (structure-based), using both linear and nonlinear machine learning approaches, including partial least squares regression (PLS), least squares support vector machine (LSSVM), and Gaussian process (GP). By rigorously examining the performance and reliability of the constructed statistical models and by comparing our models in detail with those developed previously for protein B-factors, we demonstrate that (i) rRNA B-factors can be predicted with a level of accuracy similar to that for proteins, (ii) a structure-based approach performed much better than sequence-based methods in modeling rRNA B-factors, and (iii) rRNA flexibility is primarily governed by local features of the nonbonding potential landscape, such as electrostatic and van der Waals forces.
Affiliation(s)
- Feifei Tian
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China phone: +86 23 68757411, fax: +86 23 68757404
- College of Bioengineering, Chongqing University, Chongqing 400044, China
- Chun Zhang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China phone: +86 23 68757411, fax: +86 23 68757404
- Xia Fan
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China phone: +86 23 68757411, fax: +86 23 68757404
- Xue Yang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China phone: +86 23 68757411, fax: +86 23 68757404
- Xi Wang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China phone: +86 23 68757411, fax: +86 23 68757404
- Huaping Liang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China phone: +86 23 68757411, fax: +86 23 68757404
34
Maertens J, Vanrolleghem PA. Modeling with a view to target identification in metabolic engineering: a critical evaluation of the available tools. Biotechnol Prog 2010; 26:313-31. [PMID: 20052739 DOI: 10.1002/btpr.349] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
The state-of-the-art tools for modeling metabolism, typically used in metabolic engineering, were reviewed. The tools considered are stoichiometric network analysis (elementary modes and extreme pathways), stoichiometric modeling (metabolic flux analysis, flux balance analysis, and carbon modeling), mechanistic and approximative modeling, cybernetic modeling, and multivariate statistics. In the context of metabolic engineering, one should be aware that the usefulness of these tools for optimizing microbial metabolism to overproduce a target compound depends predominantly on the characteristic properties of that compound. Because of their shortcomings, not all tools are suitable for every kind of optimization. Several issues should play a role when choosing the optimization strategy: whether the target compound's synthesis depends on severe (redox) constraints, the characteristics of its formation pathway, and the achievable or desired flux towards the target compound.
Affiliation(s)
- Jo Maertens
- BIOMATH, Dept. of Applied Mathematics, Biometrics, and Process Control, Ghent University, Ghent 9000, Belgium.
35
Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 2009; 38:199-212. [DOI: 10.1007/s00726-008-0228-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 08/01/2008] [Accepted: 12/18/2008] [Indexed: 10/21/2022]
36
Liang G, Li Z. Scores of generalized base properties for quantitative sequence-activity modelings for E. coli promoters based on support vector machine. J Mol Graph Model 2007; 26:269-81. [PMID: 17291800 DOI: 10.1016/j.jmgm.2006.12.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 06/30/2006] [Revised: 11/18/2006] [Accepted: 12/10/2006] [Indexed: 10/23/2022]
Abstract
A novel base sequence representation technique, SGBP (scores of generalized base properties), was derived from principal component analysis of a matrix of 1209 property parameters, including 0D, 1D, 2D and 3D information, for the five bases A, C, G, T and U. It was then employed to represent the sequences of E. coli promoters. The variables used as inputs to partial least squares (PLS) and support vector machine (SVM) models were selected by genetic algorithm-partial least squares (GA-PLS). Following a D-optimal design, all samples were divided into a training set, used to develop quantitative sequence-activity models (QSAMs), and a test set, used to validate the predictive power of the resulting models. The PLS-based QSAM indicated that base properties at positions -42, -34, -31, -33, -41, -46 and -29 may have greater influence on promoter strength, which points toward the design of strong promoters. SVM parameters were determined by response surface methodology. The SVM-based QSAM showed better fitting (internal samples) and predictive (external samples) performance than PLS. These results show that SGBP is a useful structural representation for QSAMs, owing to its rich structural information, easy manipulation and high characterization capability. Moreover, the SGBP-GA-SVM route could be further applied to sequence design and activity prediction for DNA or RNA.
Affiliation(s)
- Guizhao Liang
- College of Bioengineering, Chongqing University, Chongqing 400030, PR China
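The basic SGBP idea is to replace each base with a short vector of scores and concatenate these vectors position by position to form the model input. It can be sketched as follows. This is a hedged illustration with several assumptions:

- The score values below are invented placeholders, not the published SGBP scores derived from the 1209-property matrix.
- The choice of three components is an assumption.
- `select_positions` is a hypothetical stand-in for the GA-PLS variable-selection step.
- Indices are 0-based positions in the input string, not promoter coordinates such as -42.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-base score vectors: placeholders, NOT published SGBP values.
# In SGBP each base is described by principal-component scores of a large
# base-property matrix; here three components are simply assumed.
BASE_SCORES: Dict[str, Tuple[float, float, float]] = {
    "A": (0.9, -0.2, 0.1),
    "C": (-0.5, 0.7, -0.3),
    "G": (1.1, 0.4, -0.6),
    "T": (-0.8, -0.6, 0.5),
    "U": (-0.7, -0.5, 0.6),
}


def encode_sequence(seq: str,
                    scores: Dict[str, Tuple[float, ...]] = BASE_SCORES) -> List[float]:
    """Concatenate the score vector of every base, position by position."""
    features: List[float] = []
    for position, base in enumerate(seq.upper()):
        if base not in scores:
            raise ValueError(f"unsupported base {base!r} at index {position}")
        features.extend(scores[base])
    return features


def select_positions(features: Sequence[float], n_components: int,
                     positions: Sequence[int]) -> List[float]:
    """Keep only the score columns of the chosen 0-based positions
    (a hypothetical stand-in for a variable-selection step such as GA-PLS)."""
    selected: List[float] = []
    for p in positions:
        start = p * n_components
        selected.extend(features[start:start + n_components])
    return selected


if __name__ == "__main__":
    x = encode_sequence("ACGT")
    print(len(x), select_positions(x, 3, [1, 3]))
```

The resulting fixed-length vectors are what a PLS or SVM regressor would consume. Each sequence in a data set must have the same length for the columns to line up.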
37
Liao J, Warmuth MK, Govindarajan S, Ness JE, Wang RP, Gustafsson C, Minshull J. Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol 2007; 7:16. [PMID: 17386103 PMCID: PMC1847811 DOI: 10.1186/1472-6750-7-16] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.3] [Received: 09/06/2006] [Accepted: 03/26/2007] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Altering a protein's function by changing its sequence allows natural proteins to be converted into useful molecular tools. Current protein engineering methods are limited by a lack of high-throughput physical or computational tests that can accurately predict protein activity under conditions relevant to its final application. Here we describe a new synthetic biology approach to protein engineering that avoids these limitations by combining high-throughput gene synthesis with machine learning-based design algorithms.
Results: We selected 24 amino acid substitutions to make in proteinase K from alignments of homologous sequences. We then designed and synthesized 59 specific proteinase K variants containing different combinations of the selected substitutions. The 59 variants were tested for their ability to hydrolyze a tetrapeptide substrate after the enzyme was first heated to 68°C for 5 minutes. Sequence and activity data were analyzed using machine learning algorithms. This analysis was used to design a new set of variants predicted to have higher activity than the training set, which were then synthesized and tested. By performing two cycles of machine learning analysis and variant design, we obtained 20-fold improved proteinase K variants while testing only 95 variant enzymes in total.
Conclusion: The number of protein variants that must be tested to obtain significant functional improvements determines the type of tests that can be performed. Some protein engineers wish to modify the properties of a protein to shrink tumours or to catalyze chemical reactions under industrial conditions. Until now, they have been forced to accept high-throughput surrogate screens that measure protein properties they hope will correlate with the functionalities they intend to modify. By reducing the number of variants that must be tested to fewer than 100, machine learning algorithms make it possible to use more complex and expensive tests, so that only protein properties directly relevant to the desired application need to be measured. Protein design algorithms that require testing only a small number of variants represent a significant step towards a generic, resource-optimized protein engineering process.
Affiliation(s)
- Jun Liao
- Department of Computer Science, University of California, Santa Cruz, CA 95064 USA
- Manfred K Warmuth
- Department of Computer Science, University of California, Santa Cruz, CA 95064 USA
- Jon E Ness
- DNA 2.0, 1430 O'Brien Drive, Suite E, Menlo Park, CA 94025, USA
- Rebecca P Wang
- DNA 2.0, 1430 O'Brien Drive, Suite E, Menlo Park, CA 94025, USA
- Jeremy Minshull
- DNA 2.0, 1430 O'Brien Drive, Suite E, Menlo Park, CA 94025, USA
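One simple model class can illustrate the learn-then-design cycle described in this abstract: an additive sequence-activity model fitted by ridge regression on the presence or absence of each substitution. The fitted model is then used to rank untested combinations. This is a sketch under explicit assumptions, not the algorithms used in the study. The substitution labels, activity values, ridge penalty and function names are all hypothetical, and a purely additive model ignores interactions between substitutions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Variant = FrozenSet[str]  # a variant = the set of substitutions it carries


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit_additive_model(variants: Sequence[Variant], activities: Sequence[float],
                       substitutions: Sequence[str], ridge: float = 1e-3
                       ) -> Tuple[float, Dict[str, float]]:
    """Ridge fit of activity ~ intercept + sum of per-substitution effects.
    Features and targets are centred so the intercept is not penalised."""
    p = len(substitutions)
    X = [[1.0 if s in v else 0.0 for s in substitutions] for v in variants]
    n = len(X)
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(activities) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [y - y_mean for y in activities]
    # Normal equations: (Xc^T Xc + ridge * I) w = Xc^T yc
    xtx = [[sum(r[i] * r[j] for r in Xc) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(xtx, xty)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, col_means))
    return intercept, dict(zip(substitutions, w))


def predict(intercept: float, weights: Dict[str, float], variant: Variant) -> float:
    return intercept + sum(weights[s] for s in variant)


def rank_candidates(intercept: float, weights: Dict[str, float],
                    substitutions: Sequence[str], max_size: int) -> List[Variant]:
    """Enumerate all combinations of 1..max_size substitutions, best predicted first."""
    candidates = [frozenset(c) for k in range(1, max_size + 1)
                  for c in combinations(substitutions, k)]
    return sorted(candidates, key=lambda v: predict(intercept, weights, v), reverse=True)
```

In practice, the top-ranked untested combinations would be synthesized and assayed. The model would then be refitted with the new measurements, which mirrors the two design cycles reported in the abstract.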
38
A new descriptor of amino acids based on the three-dimensional vector of atomic interaction field. ACTA ACUST UNITED AC 2006. [DOI: 10.1007/s11434-006-0524-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 10/24/2022]
39
Mao PL, Liu TF, Kueh K, Wu P. Predicting the efficiency of UAG translational stop signal through studies of physicochemical properties of its composite mono- and dinucleotides. Comput Biol Chem 2005; 28:245-56. [PMID: 15548451 DOI: 10.1016/j.compbiolchem.2004.05.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/04/2004] [Revised: 05/27/2004] [Accepted: 05/29/2004] [Indexed: 12/01/2022]
Abstract
In this study, we explored the problem of predicting UAG stop-codon read-through efficiency. The reported nucleotide sequences were first converted into physicochemical property vectors before being presented to a machine learning algorithm. Two sets of physicochemical properties were applied: one for mononucleosides (in terms of steric bulk, hydrophobicity and electronics) and another for dinucleotides. To the best of our knowledge, this is the first report of converting dinucleotides into principal components derived from NMR chemical shift data. Several efficiency prediction models were then derived, and the mononucleoside-based and dinucleotide-based models were compared. In the derived models, the coefficients of the property-based predictors lend themselves to biophysical interpretation. This study demonstrates that advantage with a prediction model based on the steric bulk factor. Although simple, the steric bulk factor model explained well the effect of sequence variations surrounding the amber stop codon and the tRNA bearing the UCCU anticodon. We further proposed new alternatives at positions -1 and +4 of a UAG stop codon sequence to enhance read-through efficiency. This research may contribute to a better understanding of read-through mechanisms and may also help in studying the normal translation termination process.
Affiliation(s)
- Pei-Lin Mao
- Institute of Bioengineering and Nanotechnology, 51 Science Park Road, #01-01/10, The Aries, Singapore 117586, Singapore
40
Minshull J, Govindarajan S, Cox T, Ness JE, Gustafsson C. Engineered protein function by selective amino acid diversification. Methods 2005; 32:416-27. [PMID: 15003604 DOI: 10.1016/j.ymeth.2003.10.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Accepted: 10/06/2003] [Indexed: 11/16/2022] [Open Access]
Abstract
Almost all protein engineering methods rely upon making changes to naturally occurring proteins that already possess some of the desired properties. This will probably remain the case as long as we lack a complete understanding of how an amino acid sequence gives rise to a protein with a precisely defined biological function. Common to all methods for altering an existing protein is the selection of a subset of amino acids in the protein for variation and a choice of which substitutions to make at each position. Variants are then tested empirically, and further variants are created based upon their performance. Protein engineering methods differ in how amino acids are chosen for variation, in the protocols followed for creating the variants, and in how information about variant properties is used to create subsequent variants. In this article, we describe these differences and provide examples of how the experimental parameters of specific projects determine which method is most suitable.
41
Gustafsson C, Govindarajan S, Minshull J. Putting engineering back into protein engineering: bioinformatic approaches to catalyst design. Curr Opin Biotechnol 2003; 14:366-70. [PMID: 12943844 DOI: 10.1016/s0958-1669(03)00101-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Indexed: 11/26/2022]
Abstract
Complex multivariate engineering problems are commonplace and not unique to protein engineering. Mathematical and data-mining tools developed in other fields of engineering have now been applied to analyze sequence-activity relationships of peptides and proteins and to assist in the design of proteins and peptides with specified properties. Decreasing costs of DNA sequencing in conjunction with methods to quickly synthesize statistically representative sets of proteins allow modern heuristic statistics to be applied to protein engineering. This provides an alternative approach to expensive assays or unreliable high-throughput surrogate screens.
42
Gustafsson C, Govindarajan S, Emig R. Exploration of sequence space for protein engineering. J Mol Recognit 2001; 14:308-14. [PMID: 11746951 DOI: 10.1002/jmr.543] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
The process of protein engineering is currently evolving towards a heuristic understanding of the sequence-function relationship. Improved DNA sequencing capacity, efficient protein function characterization and improved quality of data points in conjunction with well-established statistical tools from other industries are changing the protein engineering field. Algorithms capturing the heuristic sequence-function relationships will have a drastic impact on the field of protein engineering. In this review, several alternative approaches to quantitatively assess sequence space are discussed and the relatively few examples of wet-lab validation of statistical sequence-function characterization/correlation are described.
Affiliation(s)
- C Gustafsson
- Maxygen Inc., Galveston Drive 515, Redwood City, CA 94063, USA.
43
Ponomarenko JV, Furman DP, Frolov AS, Podkolodny NL, Orlova GV, Ponomarenko MP, Kolchanov NA, Sarai A. ACTIVITY: a database on DNA/RNA sites activity adapted to apply sequence-activity relationships from one system to another. Nucleic Acids Res 2001; 29:284-7. [PMID: 11125114 PMCID: PMC29829 DOI: 10.1093/nar/29.1.284] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] [Open Access]
Abstract
ACTIVITY is a database of DNA/RNA site sequences with known activity magnitudes. It also records the measurement systems, the sequence-activity relationships under fixed experimental conditions, and procedures for adapting these relationships from one measurement system to another. The database stores information on several kinds of activity: DNA/RNA affinities to proteins and cell nuclear extracts, cutting efficiencies, gene transcription activity, mRNA translation efficiencies, mutability and other biological activities. It covers natural sites occurring within promoters, mRNA leaders and other regulatory regions in prokaryotic and eukaryotic genomes, as well as their mutant forms and synthetic analogues. Because activity magnitudes are heavily system-dependent, the current version of ACTIVITY is supplemented by three novel sub-databases: (i) SYSTEM, measurement systems; (ii) KNOWLEDGE, sequence-activity relationships under fixed experimental conditions; and (iii) CROSS_TEST, procedures adapting a relationship from one measurement system to another. These databases are useful in molecular biology, pharmacogenetics, metabolic engineering, drug design and biotechnology. The databases can be queried using SRS and are available through the Web at http://wwwmgs.bionet.nsc.ru/systems/Activity/.
Affiliation(s)
- J V Ponomarenko
- Institute of Cytology and Genetics, 10 Lavrentyev Avenue, Novosibirsk, 630090, Russia.
44
Sandberg M, Eriksson L, Jonsson J, Sjöström M, Wold S. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. J Med Chem 1998; 41:2481-91. [PMID: 9651153 DOI: 10.1021/jm9700575] [Citation(s) in RCA: 461] [Impact Index Per Article: 17.1] [Indexed: 02/08/2023]
Abstract
In this study, 87 amino acids (AAs) have been characterized by 26 physicochemical descriptor variables. These include experimentally determined retention values in seven thin-layer chromatography (TLC) systems, three nuclear magnetic resonance (NMR) shift variables, and 16 calculated variables. The calculated variables are six semiempirical molecular orbital indices; total, polar, and nonpolar surface area; van der Waals volume of the side chain; log P; molecular weight; and four indicator variables describing hydrogen bond donor and acceptor properties and side chain charge. In the present study, the data from a previous characterization of 55 AAs from our laboratory have been extended with data for 32 additional AAs and 14 new descriptor variables. The 32 new AAs were selected to represent both intermediate and more extreme physicochemical properties compared with the 20 coded AAs. The extended and updated principal property scales, the z-scales, were calculated and aligned to the previously reported z(old)-scales. The appropriateness of the extended z-scales was validated by using them in quantitative sequence-activity modeling (QSAM) of 89 elastase substrate analogues and of 29 neurotensin analogues.
Affiliation(s)
- M Sandberg
- Research Group for Chemometrics, Department of Organic Chemistry, Umeå University, S-901 87 Umeå, Sweden
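Principal property scales such as the z-scales are principal component scores computed from an autoscaled matrix of amino acids × descriptor variables. The sketch below shows that computation for the first component only, using power iteration on the correlation matrix. Several choices are assumptions for illustration: the method (power iteration rather than, for example, NIPALS or an SVD), the function names, and the toy data. The output does not reproduce the published z-scales.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def autoscale(data: Sequence[Sequence[float]]) -> Matrix:
    """Centre each descriptor column and scale it to unit (sample) variance."""
    n, p = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in data) / (n - 1)) for j in range(p)]
    sds = [sd if sd > 0 else 1.0 for sd in sds]  # guard against constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in data]


def covariance(x: Matrix) -> Matrix:
    """Sample covariance of centred data (a correlation matrix after autoscaling)."""
    n, p = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)] for i in range(p)]


def first_component(cov: Matrix, iterations: int = 500, tol: float = 1e-12) -> List[float]:
    """Leading eigenvector by power iteration. A start vector orthogonal to
    that eigenvector would stall; this is acceptable for a sketch."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate all-zero covariance
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


def z1_scores(data: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (loading vector, first principal component score of each row)."""
    x = autoscale(data)
    loading = first_component(covariance(x))
    return loading, [sum(a * b for a, b in zip(row, loading)) for row in x]


if __name__ == "__main__":
    # Made-up descriptor table: 4 hypothetical residues x 3 descriptors.
    toy = [[1.0, 2.0, 10.0], [2.0, 4.0, 9.0], [3.0, 6.0, 8.0], [4.0, 8.0, 7.0]]
    print(z1_scores(toy))
```

Further scales (z2, z3, ...) would follow by deflating the correlation matrix with each extracted component and repeating the iteration. The sign of each component is arbitrary, which is one reason published scales are aligned to earlier versions.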
45
Ponomarenko MP, Kolchanova AN, Kolchanov NA. Generating programs for predicting the activity of functional sites. J Comput Biol 1997; 4:83-90. [PMID: 9109039 DOI: 10.1089/cmb.1997.4.83] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023] [Open Access]
Abstract
The computer system ACTIVITY is intended for generating programs that predict the activity of functional sites from their nucleotide sequences. ACTIVITY analyzes a basis set of nucleotide sequences with known activity. The novelty of this approach is that Zadeh's fuzzy logic and decision-making theory are employed to determine the best "sequence-->activity" regression. The best regression thus determined is then transformed into the text of a program that predicts the activity of any nucleotide sequence. Testing with independent data showed these predictions to be reliable. When compared on identical data sets with two commonly used approaches, the ACTIVITY-generated programs proved quite competitive.
Affiliation(s)
- M P Ponomarenko
- Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Novosibirsk, Russia.
46
Wieslander A, Rilfors L, Dahlqvist A, Jonsson J, Hellberg S, Rännar S, Sjöström M, Lindblom G. Similar regulatory mechanisms despite differences in membrane lipid composition in Acholeplasma laidlawii strains A-EF22 and B-PG9. A multivariate data analysis. Biochim Biophys Acta 1994; 1191:331-42. [PMID: 8172919 DOI: 10.1016/0005-2736(94)90184-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 01/29/2023]
Abstract
Mycoplasmas are small, cell wall-deficient bacteria. In the species Acholeplasma laidlawii, strains A-EF22 and B-JU, the metabolic regulation of membrane lipid composition is governed mainly by the balance between the potential formation of lamellar and nonlamellar phase structures. However, these regulatory features have not been consistently observed in the B-PG9 strain. The membrane lipid compositions of strains A-EF22 and B-PG9 were therefore compared. Eight experimental conditions known to affect the regulation and packing properties of the A-EF22 lipids were varied simultaneously. Multiple regression and partial least-squares discriminant analyses of many variables showed the following. (i) The two strains differed quantitatively in membrane lipid and protein composition and in membrane protein molecular masses. (ii) They differed in the molar fractions of the major polar lipids monoglucosyldiacylglycerol (nonlamellar) and diglucosyldiacylglycerol (lamellar). These differences were caused by differences in lipid acyl chain length and unsaturation inherent in the strains and by the type of growth medium used. (iii) Under most conditions, similar regulatory mechanisms governed changes in lipid composition, responding to the experimentally varied bilayer and nonbilayer properties of the lipid matrix. These regulatory principles are probably valid in other bacteria as well.
Affiliation(s)
- A Wieslander
- Department of Biochemistry, University of Umeå, Sweden