1. Nguyen TD, Saito Y, Kameda T. CodonAdjust: a software for in silico design of a mutagenesis library with specific amino acid profiles. Protein Eng Des Sel 2020; 32:503-511. [PMID: 32705123 DOI: 10.1093/protein/gzaa013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2019] [Revised: 03/27/2020] [Accepted: 06/19/2020] [Indexed: 11/12/2022]
Abstract
In protein engineering, generating mutagenesis libraries is a key step in studying the functions of mutants. To generate mutants with a desired composition of amino acids (AAs), a codon consisting of a mixture of nucleotides is widely applied. Several computational methods have been proposed to calculate a codon nucleotide composition that generates a given amino acid profile based on mathematical optimization. However, these methods require manual tuning of the amino acid weights in their objective functions, which is time-consuming, and, more importantly, they lack publicly available software implementations. Here, we develop CodonAdjust, a software tool that adjusts a codon nucleotide composition to mimic a given amino acid profile. We propose several options for CodonAdjust that provide customizations for practical scenarios, such as setting a guaranteed threshold for the frequencies of amino acids without any manual tasks. We demonstrate the capability of CodonAdjust in experiments on the complementarity-determining regions (CDRs) of antibodies and T-cell receptors (TCRs) as well as millions of amino acid profiles from Pfam. These results suggest that CodonAdjust is a productive tool for codon design and may accelerate library generation. CodonAdjust is freely available at https://github.com/tiffany-nguyen/CodonAdjust. Paper edited by Dr. Jeffery Saven, Board Member for PEDS.
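The link between a codon's per-position nucleotide composition and the amino acid profile it encodes can be illustrated with a minimal Python sketch. This is not CodonAdjust's implementation: the function name `amino_acid_profile` and the NNK example are illustrative assumptions, and the three codon positions are assumed to be sampled independently.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_profile(position_probs):
    """Return amino acid frequencies for one randomized codon.

    position_probs: three dicts (base -> probability), one per codon position.
    Positions are assumed independent (an assumption of this sketch).
    """
    profile = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for base, probs in zip(codon, position_probs):
            p *= probs.get(base, 0.0)
        profile[aa] = profile.get(aa, 0.0) + p
    return profile


# Hypothetical example: the NNK codon (N = A/C/G/T equally, K = G/T equally).
uniform = {b: 0.25 for b in BASES}
nnk = [uniform, uniform, {"G": 0.5, "T": 0.5}]
profile = amino_acid_profile(nnk)
for aa in sorted(profile):
    print(aa, round(profile[aa], 4))
```

An optimizer such as the one the abstract describes would search over the twelve nucleotide probabilities so that this computed profile approaches a target profile; the sketch covers only the forward calculation.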
Affiliation(s)
- Thuy Duong Nguyen
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Yutaka Saito
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Tomoshi Kameda
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan

2. Suchsland R, Appel B, Müller S. Preparation of trinucleotide phosphoramidites as synthons for the synthesis of gene libraries. Beilstein J Org Chem 2018. [PMID: 29520304 PMCID: PMC5827815 DOI: 10.3762/bjoc.14.28] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/01/2023]
Abstract
The preparation of protein libraries is a key issue in protein engineering and biotechnology. Such libraries can be prepared by a variety of methods, starting from the respective gene library. The challenge in gene library preparation is to achieve controlled total or partial randomization at any predefined number and position of codons of a given gene, in order to obtain a library with a maximum number of potentially successful candidates. This purpose is best achieved by the usage of trinucleotide synthons for codon-based gene synthesis. We here review the strategies for the preparation of fully protected trinucleotides, emphasizing more recent developments for their synthesis on solid phase and on soluble polymers, and their use as synthons in standard DNA synthesis.
Affiliation(s)
- Ruth Suchsland
  - Institut für Biochemie, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 4, D-17489 Greifswald, Germany
- Bettina Appel
  - Institut für Biochemie, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 4, D-17489 Greifswald, Germany
- Sabine Müller
  - Institut für Biochemie, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 4, D-17489 Greifswald, Germany

3. Jacobs TM, Yumerefendi H, Kuhlman B, Leaver-Fay A. SwiftLib: rapid degenerate-codon-library optimization through dynamic programming. Nucleic Acids Res 2014; 43:e34. [PMID: 25539925 PMCID: PMC4357694 DOI: 10.1093/nar/gku1323] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022]
Abstract
Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error prone and time consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second.
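The quantities this abstract refers to, namely the amino acids a degenerate codon encodes and the ratio of amino-acid sequences to nucleic-acid sequences, can be made concrete with a short Python sketch. It is not SwiftLib's dynamic programming algorithm; the helper names `expand`, `encoded_amino_acids` and `library_sizes` are hypothetical, and only the standard IUPAC nucleotide codes and the standard genetic code are assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate_codon):
    """List every concrete codon covered by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate_codon))]


def encoded_amino_acids(degenerate_codon):
    """Set of amino acids encoded by a degenerate codon (stop codons excluded)."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)} - {"*"}


def library_sizes(degenerate_codons):
    """Return (DNA-level size, amino-acid-level size) for one DC per position."""
    dna_size, aa_size = 1, 1
    for dc in degenerate_codons:
        dna_size *= len(expand(dc))
        aa_size *= len(encoded_amino_acids(dc))
    return dna_size, aa_size


print(library_sizes(["NNK"]))         # (32, 20)
print(library_sizes(["NDT", "NDT"]))  # (144, 144)
```

In this sketch NDT is an example of a codon whose amino-acid-to-DNA ratio is exactly 1 (12 codons, 12 amino acids, no stop), whereas NNK spends 32 codons on 20 amino acids and one stop.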
Affiliation(s)
- Timothy M Jacobs
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hayretin Yumerefendi
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brian Kuhlman
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Andrew Leaver-Fay
  - Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

4. Hidalgo A, Schließmann A, Bornscheuer UT. One-pot Simple methodology for CAssette Randomization and Recombination for focused directed evolution (OSCARR). Methods Mol Biol 2014; 1179:207-212. [PMID: 25055780 DOI: 10.1007/978-1-4939-1053-3_14] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
The OSCARR methodology (One-pot Simple methodology for CAssette Randomization and Recombination) bridges the gap between site-directed mutagenesis and full randomization by making use of carefully designed mutagenic cassettes and an optimized one-pot megaprimer PCR. The method is especially suited to construct libraries of up to ten randomized codons for focused directed evolution, exhibits up to 97% efficiency in the amplification of mutated over wild-type products, and is sufficiently versatile to allow mutagenesis and recombination of several cassettes within the same gene.

5. Probabilistic methods in directed evolution: library size, mutation rate, and diversity. Methods Mol Biol 2014; 1179:261-78. [PMID: 25055784 DOI: 10.1007/978-1-4939-1053-3_18] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/03/2023]
Abstract
Directed evolution has emerged as an important tool for engineering proteins with improved or novel properties. Because of their inherent reliance on randomness, directed evolution protocols are amenable to probabilistic modeling and analysis. This chapter summarizes and reviews in a nonmathematical way some of the probabilistic works related to directed evolution, with particular focus on three of the most widely used methods: saturation mutagenesis, error-prone PCR, and in vitro recombination. The ultimate aim is to provide the reader with practical information to guide the planning and design of directed evolution studies. Importantly, the applications and locations of freely available computational resources to assist with this process are described in detail.
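One library-size question of the kind this chapter addresses can be sketched in Python. The sketch uses a common textbook approximation and is not taken from the chapter itself: it assumes V equally likely variants and L independently sampled clones, so the expected fraction of variants seen at least once is roughly 1 - exp(-L/V). The function names are hypothetical.

```python
import math


def expected_coverage(num_variants, num_clones):
    """Expected fraction of equiprobable variants sampled at least once."""
    return 1.0 - math.exp(-num_clones / num_variants)


def clones_for_coverage(num_variants, coverage):
    """Clones to screen so that the expected coverage reaches the given fraction."""
    return math.ceil(-num_variants * math.log(1.0 - coverage))


# Hypothetical example: one NNK-randomized site has 32 codon variants.
needed = clones_for_coverage(32, 0.95)
print(needed)                                   # 96
print(round(expected_coverage(32, needed), 4))  # 0.9502
```

Under these assumptions, about three-fold oversampling gives roughly 95% expected coverage. Real libraries are rarely equiprobable, so this figure is best read as a lower bound on screening effort.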

6. Optimal codon randomization via mathematical programming. J Theor Biol 2013; 335:147-52. [PMID: 23792109 DOI: 10.1016/j.jtbi.2013.05.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/12/2013] [Accepted: 05/28/2013] [Indexed: 01/21/2023]
Abstract
Codon randomization via degenerate oligonucleotides is a widely used approach for generating protein libraries. We use integer programming methodology to model and solve the problem of computing the minimal mixture of oligonucleotides required to induce an arbitrary target probability over the 20 standard amino acids. We consider both randomization via conventional degenerate oligonucleotides, which incorporate at each position of the randomized codon certain nucleotides in equal probabilities, and randomization via spiked oligonucleotides, which admit arbitrary nucleotide distribution at each of the codon's positions. Existing methods for computing such mixtures rely on various heuristics.

7. Arunachalam TS, Wichert C, Appel B, Müller S. Mixed oligonucleotides for random mutagenesis: best way of making them. Org Biomol Chem 2012; 10:4641-50. [PMID: 22552713 DOI: 10.1039/c2ob25328c] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
The generation of proteins, especially enzymes, with predetermined, novel properties is a major challenge in the field of protein engineering. Over the years, this aim has been critically facilitated by newly emerging combinatorial and evolutionary techniques, such as combinatorial gene synthesis followed by functional screening of many structural variants generated in parallel (a library). Libraries can be generated by a large number of available methods. Among these, the use of mixtures of pre-formed trinucleotide blocks representing codons for the 20 canonical amino acids in oligonucleotide synthesis stands out, as it allows fully controlled partial (or total) randomization individually at any number of arbitrarily chosen codon positions of a given gene. This has created substantial demand for fully protected trinucleotide synthons with good reactivity in standard oligonucleotide synthesis. We here review methods for the preparation of oligonucleotide mixtures with a strong focus on codon-specific trinucleotide blocks.
Affiliation(s)
- Tamil Selvi Arunachalam
  - Institut für Biochemie, Ernst Moritz Arndt Universität, Felix Hausdorff Strasse 4, Greifswald, D-17487, Germany

8. Wittrup Larsen M, Zielinska DF, Martinelle M, Hidalgo A, Jensen LJ, Bornscheuer UT, Hult K. Suppression of Water as a Nucleophile in Candida antarctica Lipase B Catalysis. Chembiochem 2010; 11:796-801. [DOI: 10.1002/cbic.200900743] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/09/2022]

9. Craig RA, Lu J, Luo J, Shi L, Liao L. Optimizing nucleotide sequence ensembles for combinatorial protein libraries using a genetic algorithm. Nucleic Acids Res 2009; 38:e10. [PMID: 19889723 PMCID: PMC2811015 DOI: 10.1093/nar/gkp906] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Protein libraries are essential to the field of protein engineering. Increasingly, probabilistic protein design is being used to synthesize combinatorial protein libraries, which allow the protein engineer to explore a vast space of amino acid sequences, while at the same time placing restrictions on the amino acid distributions. To this end, if site-specific amino acid probabilities are input as the target, then the codon nucleotide distributions that match this target distribution can be used to generate a partially randomized gene library. However, it turns out to be a highly nontrivial computational task to find the codon nucleotide distributions that exactly match a given target distribution of amino acids. We first showed that for any given target distribution an exact solution may not exist at all. Formulating the task as a constrained optimization problem, we then developed a genetic algorithm-based approach to find codon nucleotide distributions that match the target amino acid distribution as closely as possible. As compared with the previous gradient descent method on various objective functions, the new method consistently gave more optimized distributions as measured by the relative entropy between the calculated and the target distributions. To simulate the actual lab solutions, new objective functions were designed to allow for two separate sets of codons in seeking a better match to the target amino acid distribution.
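The relative entropy used here as a quality measure can be illustrated with a small Python sketch. The direction of the divergence (target relative to calculated) and the function name `relative_entropy` are assumptions of this sketch; the abstract does not specify which convention the authors use, and the two example distributions are made up.

```python
import math


def relative_entropy(target, calculated):
    """Kullback-Leibler divergence D(target || calculated), in nats.

    Both arguments map amino acid symbols to probabilities. Returns infinity
    when the calculated distribution gives zero weight to an amino acid that
    the target requires.
    """
    total = 0.0
    for aa, t in target.items():
        if t == 0.0:
            continue  # terms with zero target probability contribute nothing
        c = calculated.get(aa, 0.0)
        if c == 0.0:
            return math.inf
        total += t * math.log(t / c)
    return total


# Hypothetical target and calculated profiles for one position.
target = {"A": 0.5, "G": 0.3, "S": 0.2}
calculated = {"A": 0.4, "G": 0.4, "S": 0.2}
print(relative_entropy(target, target))      # 0.0
print(relative_entropy(target, calculated))  # small positive value
```

A value of zero means the calculated profile reproduces the target exactly, which, as the abstract notes, may not be achievable for every target.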
Affiliation(s)
- Roger A Craig
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA

10. Hidalgo A, Schliessmann A, Molina R, Hermoso J, Bornscheuer UT. A one-pot, simple methodology for cassette randomisation and recombination for focused directed evolution. Protein Eng Des Sel 2008; 21:567-76. [PMID: 18559369 DOI: 10.1093/protein/gzn034] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
Protein engineering is currently performed either by rational design, focusing in most cases on only a few positions modified by site-directed mutagenesis, or by directed molecular evolution, in which the entire protein-encoding gene is subjected to random mutagenesis followed by screening or selection of desired phenotypes. A novel alternative is focused directed evolution, in which only fragments of a protein are randomised while the overall scaffold of a protein remains unchanged. For this purpose, we developed a PCR technique using long, spiked oligonucleotides, which allow randomising of one or several cassettes in any given position of a gene. This method allows over 95% incorporation of mutations independently of their position within the gene, yielding sufficient product to generate large libraries, and the possibility of simultaneously randomising more than one locus at a time, thus originating recombination. The high efficiency of this method was verified by creating focused mutant libraries of Pseudomonas fluorescens esterase I (PFEI), screening for altered substrate selectivity and validating against libraries created by error-prone PCR. This led to the identification of two mutants within the OSCARR library with a 10-fold higher catalytic efficiency towards p-nitrophenyl dodecanoate. These PFEI variants were also modelled in order to explain the observed effects.
Affiliation(s)
- Aurelio Hidalgo
  - Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, Ernst-Moritz-Arndt University Greifswald, Felix-Hausdorff-Str. 4, D-17487 Greifswald, Germany

11. Volles MJ, Lansbury PT. A computer program for the estimation of protein and nucleic acid sequence diversity in random point mutagenesis libraries. Nucleic Acids Res 2005; 33:3667-77. [PMID: 15990391 PMCID: PMC1166583 DOI: 10.1093/nar/gki669] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
Abstract
A computer program for the generation and analysis of in silico random point mutagenesis libraries is described. The program operates by mutagenizing an input nucleic acid sequence according to mutation parameters specified by the user for each sequence position and type of point mutation. The program can mimic almost any type of random mutagenesis library, including those produced via error-prone PCR (ep-PCR), mutator Escherichia coli strains, chemical mutagenesis, and doped or random oligonucleotide synthesis. The program analyzes the generated nucleic acid sequences and/or the associated protein library to produce several estimates of library diversity (number of unique sequences, point mutations, and single point mutants) and the rate of saturation of these diversities during experimental screening or selection of clones. This information allows one to select the optimal screen size for a given mutagenesis library, necessary to efficiently obtain a certain coverage of the sequence-space. The program also reports the abundance of each specific protein mutation at each sequence position, which is useful as a measure of the level and type of mutation bias in the library. Alternatively, one can use the program to evaluate the relative merits of preexisting libraries, or to examine various hypothetical mutation schemes to determine the optimal method for creating a library that serves the screen/selection of interest. Simulated libraries of at least 10^9 sequences are accessible by the numerical algorithm with currently available personal computers; an analytical algorithm is also available which can rapidly calculate a subset of the numerical statistics in libraries of arbitrarily large size. A multi-type double-strand stochastic model of ep-PCR is developed in an appendix to demonstrate the applicability of the algorithm to amplifying mutagenesis procedures. Estimators of DNA polymerase mutation-type-specific error rates are derived using the model. Analyses of an alpha-synuclein ep-PCR library and NNS synthetic oligonucleotide libraries are given as examples.
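The general idea of mutagenizing an input sequence in silico and counting unique library members can be sketched with a toy Python simulation. This is not the program described in the abstract: it assumes a single uniform per-base mutation rate with all substitutions equally likely, whereas the real program accepts position- and type-specific parameters. The function names and the parent sequence are arbitrary placeholders.

```python
import random


def mutagenize(seq, rate, rng):
    """Return a copy of seq in which each base is substituted with probability rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_library(seq, rate, n_clones, seed=0):
    """Simulate n_clones independent mutants of seq (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [mutagenize(seq, rate, rng) for _ in range(n_clones)]


parent = "ATGGCTAGCAAAGGAGAAGAA"  # arbitrary placeholder sequence
library = simulate_library(parent, rate=0.05, n_clones=1000)
unique = len(set(library))
print(f"{unique} unique sequences among {len(library)} simulated clones")
```

Repeating the count for increasing numbers of clones would give a rough saturation curve of the kind the abstract mentions for choosing a screen size.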
Affiliation(s)
- Michael J Volles
  - Center for Neurologic Diseases, Brigham and Women's Hospital and Department of Neurology, Harvard Medical School 65 Landsdowne Street, Cambridge, MA 02139, USA.

12. Park S, Kono H, Wang W, Boder ET, Saven JG. Progress in the development and application of computational methods for probabilistic protein design. Comput Chem Eng 2005. [DOI: 10.1016/j.compchemeng.2004.07.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]

13. Wang W, Saven JG. Designing gene libraries from protein profiles for combinatorial protein experiments. Nucleic Acids Res 2002; 30:e120. [PMID: 12409479 PMCID: PMC135844 DOI: 10.1093/nar/gnf119] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Protein combinatorial libraries provide new ways to probe the determinants of folding and to discover novel proteins. Such libraries are often constructed by expressing an ensemble of partially random gene sequences. Given the intractably large number of possible sequences, some limitation on diversity must be imposed. A non-uniform distribution of nucleotides can be used to reduce the number of possible sequences and encode peptide sequences having a predetermined set of amino acid probabilities at each residue position, i.e., the amino acid sequence profile. Such profiles can be determined by inspection, multiple sequence alignment or physically-based computational methods. Here we present a computational method that takes as input a desired sequence profile and calculates the individual nucleotide probabilities among partially random genes. The calculated gene library can be readily used in the context of standard DNA synthesis to generate a protein library with essentially the desired profile. The fidelity between the desired profile and the calculated one coded by these partially random genes is quantitatively evaluated using the linear correlation coefficient and a relative entropy, each of which provides a measure of profile agreement at each position of the sequence. On average, this method of identifying such codon frequencies performs as well as or better than other methods with regard to fidelity to the original profile. Importantly, the method presented here provides much better yields of complete sequences that do not contain stop codons, a feature that is particularly important when all or large fractions of a gene are subject to combinatorial mutation.
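The importance of stop-codon-free yield when many positions are randomized can be shown with a brief Python sketch. It is not the authors' method: it assumes nucleotides are drawn independently at every codon position and that codon positions are independent of one another, and the function names are hypothetical.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_probability(position_probs):
    """Probability that one randomized codon is a stop codon.

    position_probs: three dicts (base -> probability), one per codon position.
    """
    return sum(
        position_probs[0].get(c[0], 0.0)
        * position_probs[1].get(c[1], 0.0)
        * position_probs[2].get(c[2], 0.0)
        for c in STOP_CODONS
    )


def stop_free_yield(codons):
    """Fraction of full-length sequences with no stop codon at any position."""
    result = 1.0
    for position_probs in codons:
        result *= 1.0 - stop_probability(position_probs)
    return result


uniform = {b: 0.25 for b in "ACGT"}
nnn = [uniform, uniform, uniform]
nnk = [uniform, uniform, {"G": 0.5, "T": 0.5}]

# Hypothetical example: ten fully randomized positions.
print(round(stop_free_yield([nnn] * 10), 3))  # (61/64)**10, about 0.619
print(round(stop_free_yield([nnk] * 10), 3))  # (31/32)**10, about 0.728
```

Because the yield is a product over positions, even a small per-codon stop probability compounds quickly as more of the gene is randomized.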
Affiliation(s)
- Wei Wang
  - Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323, USA

14.
Affiliation(s)
- S Brakmann
  - Max Planck Institute for Biophysical Chemistry, Am Fassberg, 37077 Göttingen, Germany.

15.
Abstract
The serine protease subtilisin is an important industrial enzyme as well as a model for understanding the enormous rate enhancements effected by enzymes. For these reasons, along with the timely cloning of the gene, ease of expression and purification, and availability of atomic resolution structures, subtilisin became a model system for protein engineering studies in the 1980s. Fifteen years later, mutations in well over 50% of the 275 amino acids of subtilisin have been reported in the scientific literature. Most subtilisin engineering has involved catalytic amino acids, substrate binding regions and stabilizing mutations. Stability has been the property of subtilisin which has been most amenable to enhancement, yet perhaps least understood. This review will give a brief overview of the subtilisin engineering field, critically review what has been learned about subtilisin stability from protein engineering experiments and conclude with some speculation about the prospects for future subtilisin engineering.
Affiliation(s)
- P N Bryan
  - Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, 20850, Rockville, MD, USA.