1
Nguyen TD, Saito Y, Kameda T. CodonAdjust: a software for in silico design of a mutagenesis library with specific amino acid profiles. Protein Eng Des Sel 2020; 32:503-511. [PMID: 32705123 DOI: 10.1093/protein/gzaa013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2019] [Revised: 03/27/2020] [Accepted: 06/19/2020] [Indexed: 11/12/2022] Open
Abstract
In protein engineering, generation of mutagenesis libraries is a key step to study the functions of mutants. To generate mutants with a desired composition of amino acids (AAs), a codon consisting of a mixture of nucleotides is widely applied. Several computational methods have been proposed to calculate a codon nucleotide composition for generating a given amino acid profile based on mathematical optimization. However, these previous methods need to manually tune weights of amino acids in objective functions, which are time-consuming and, more importantly, lack publicly available software implementations. Here, we develop CodonAdjust, a software to adjust a codon nucleotide composition for mimicking a given amino acid profile. We propose different options of CodonAdjust, which provide various customizations in practical scenarios such as setting a guaranteeing threshold for the frequencies of amino acids without any manual tasks. We demonstrate the capability of CodonAdjust in the experiments on the complementarity-determining regions (CDRs) of antibodies and T-cell receptors (TCRs) as well as millions of amino acid profiles from Pfam. These results suggest that CodonAdjust is a productive software for codon design and may accelerate library generation. CodonAdjust is freely available at https://github.com/tiffany-nguyen/CodonAdjust. Paper edited by Dr. Jeffery Saven, Board Member for PEDS.
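The forward calculation that underlies this kind of codon design, going from a per-position nucleotide mixture to the amino acid profile it encodes, can be sketched as follows. This is a minimal illustration and not CodonAdjust's code or interface: the function name `amino_acid_profile`, the three-dictionary input format, and the NNK example are assumptions made here; only the standard genetic code is taken as given.

```python
# Illustrative sketch only; not taken from the CodonAdjust implementation.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def amino_acid_profile(codon_mix):
    """Return {amino acid: probability} for a mixed codon.

    codon_mix is a hypothetical input format: three dicts, one per codon
    position, each mapping a base to its fraction at that position.
    """
    profile = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for position, base in enumerate(codon):
            p *= codon_mix[position].get(base, 0.0)
        if p > 0.0:
            profile[aa] = profile.get(aa, 0.0) + p
    return profile


# Example: an NNK-style codon (N = A/C/G/T in equal parts, K = G/T in equal parts).
uniform = {base: 0.25 for base in BASES}
nnk = [uniform, uniform, {"G": 0.5, "T": 0.5}]
nnk_profile = amino_acid_profile(nnk)
for aa in sorted(nnk_profile):
    print(aa, round(nnk_profile[aa], 4))
```

A design tool of this kind would search over the three per-position mixtures so that the resulting profile approaches a target; the sketch covers only the forward step that such a search would evaluate repeatedly.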
Affiliation(s)
- Thuy Duong Nguyen
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan; AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan; Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
2
Suchsland R, Appel B, Müller S. Preparation of trinucleotide phosphoramidites as synthons for the synthesis of gene libraries. Beilstein J Org Chem 2018. [PMID: 29520304 PMCID: PMC5827815 DOI: 10.3762/bjoc.14.28] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023] Open
Abstract
The preparation of protein libraries is a key issue in protein engineering and biotechnology. Such libraries can be prepared by a variety of methods, starting from the respective gene library. The challenge in gene library preparation is to achieve controlled total or partial randomization at any predefined number and position of codons of a given gene, in order to obtain a library with a maximum number of potentially successful candidates. This purpose is best achieved by the usage of trinucleotide synthons for codon-based gene synthesis. We here review the strategies for the preparation of fully protected trinucleotides, emphasizing more recent developments for their synthesis on solid phase and on soluble polymers, and their use as synthons in standard DNA synthesis.
Affiliation(s)
- Ruth Suchsland
- Institut für Biochemie, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 4, D-17489 Greifswald, Germany
- Bettina Appel
- Institut für Biochemie, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 4, D-17489 Greifswald, Germany
- Sabine Müller
- Institut für Biochemie, Ernst-Moritz-Arndt-Universität Greifswald, Felix-Hausdorff-Str. 4, D-17489 Greifswald, Germany
3
Probabilistic methods in directed evolution: library size, mutation rate, and diversity. Methods Mol Biol 2014; 1179:261-78. [PMID: 25055784 DOI: 10.1007/978-1-4939-1053-3_18] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/03/2023]
Abstract
Directed evolution has emerged as an important tool for engineering proteins with improved or novel properties. Because of their inherent reliance on randomness, directed evolution protocols are amenable to probabilistic modeling and analysis. This chapter summarizes and reviews in a nonmathematical way some of the probabilistic works related to directed evolution, with particular focus on three of the most widely used methods: saturation mutagenesis, error-prone PCR, and in vitro recombination. The ultimate aim is to provide the reader with practical information to guide the planning and design of directed evolution studies. Importantly, the applications and locations of freely available computational resources to assist with this process are described in detail.
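One calculation commonly used when planning library size can be sketched as follows. It is not taken from the chapter itself; it rests on the simplifying assumption, stated here, that all variants are equally likely and sampled independently, and the function names are hypothetical.

```python
import math


def expected_coverage(library_size, n_variants):
    """Expected fraction of distinct variants sampled at least once.

    Assumes every variant is equally likely (a simplification).
    """
    return 1.0 - (1.0 - 1.0 / n_variants) ** library_size


def required_library_size(n_variants, completeness):
    """Smallest number of clones whose expected coverage reaches `completeness`."""
    if n_variants < 2:
        raise ValueError("n_variants must be at least 2")
    if not 0.0 < completeness < 1.0:
        raise ValueError("completeness must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - completeness) / math.log(1.0 - 1.0 / n_variants))


# Example: three NNK-randomized positions give 32**3 codon combinations.
n_variants = 32 ** 3
clones = required_library_size(n_variants, 0.95)
print(clones, round(expected_coverage(clones, n_variants), 4))
```

Under these assumptions, 95% expected coverage needs roughly three times as many clones as there are variants, because -ln(0.05) is about 3. Real libraries with biased codon or mutation frequencies need more.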
4
Optimal codon randomization via mathematical programming. J Theor Biol 2013; 335:147-52. [PMID: 23792109 DOI: 10.1016/j.jtbi.2013.05.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/12/2013] [Accepted: 05/28/2013] [Indexed: 01/21/2023]
Abstract
Codon randomization via degenerate oligonucleotides is a widely used approach for generating protein libraries. We use integer programming methodology to model and solve the problem of computing the minimal mixture of oligonucleotides required to induce an arbitrary target probability over the 20 standard amino acids. We consider both randomization via conventional degenerate oligonucleotides, which incorporate at each position of the randomized codon certain nucleotides in equal probabilities, and randomization via spiked oligonucleotides, which admit arbitrary nucleotide distribution at each of the codon's positions. Existing methods for computing such mixtures rely on various heuristics.
5
Arunachalam TS, Wichert C, Appel B, Müller S. Mixed oligonucleotides for random mutagenesis: best way of making them. Org Biomol Chem 2012; 10:4641-50. [PMID: 22552713 DOI: 10.1039/c2ob25328c] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
The generation of proteins, especially enzymes, with pre-deliberated, novel properties is a big challenge in the field of protein engineering. This aim, over the years was critically facilitated by newly emerging methods of combinatorial and evolutionary techniques, such as combinatorial gene synthesis followed by functional screening of many structural variants generated in parallel (library). Libraries can be generated by a large number of available methods. Therein the use of mixtures of pre-formed trinucleotide blocks representing codons for the 20 canonical amino acids for oligonucleotide synthesis stands out as allowing fully controlled partial (or total) randomization individually at any number of arbitrarily chosen codon positions of a given gene. This has created substantial demand of fully protected trinucleotide synthons of good reactivity in standard oligonucleotide synthesis. We here review methods for the preparation of oligonucleotide mixtures with a strong focus on codon-specific trinucleotide blocks.
Affiliation(s)
- Tamil Selvi Arunachalam
- Institut für Biochemie, Ernst Moritz Arndt Universität, Felix Hausdorff Strasse 4, Greifswald, D-17487, Germany
6
Hidalgo A, Schliessmann A, Molina R, Hermoso J, Bornscheuer UT. A one-pot, simple methodology for cassette randomisation and recombination for focused directed evolution. Protein Eng Des Sel 2008; 21:567-76. [PMID: 18559369 DOI: 10.1093/protein/gzn034] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
Protein engineering is currently performed either by rational design, focusing in most cases on only a few positions modified by site-directed mutagenesis, or by directed molecular evolution, in which the entire protein-encoding gene is subjected to random mutagenesis followed by screening or selection of desired phenotypes. A novel alternative is focused directed evolution, in which only fragments of a protein are randomised while the overall scaffold of a protein remains unchanged. For this purpose, we developed a PCR technique using long, spiked oligonucleotides, which allow randomising of one or several cassettes in any given position of a gene. This method allows over 95% incorporation of mutations independently of their position within the gene, yielding sufficient product to generate large libraries, and the possibility of simultaneously randomising more than one locus at a time, thus originating recombination. The high efficiency of this method was verified by creating focused mutant libraries of Pseudomonas fluorescens esterase I (PFEI), screening for altered substrate selectivity and validating against libraries created by error-prone PCR. This led to the identification of two mutants within the OSCARR library with a 10-fold higher catalytic efficiency towards p-nitrophenyl dodecanoate. These PFEI variants were also modelled in order to explain the observed effects.
Affiliation(s)
- Aurelio Hidalgo
- Department of Biotechnology and Enzyme Catalysis, Institute of Biochemistry, Ernst-Moritz-Arndt University Greifswald, Felix-Hausdorff-Str. 4, D-17487 Greifswald, Germany
7
Volles MJ, Lansbury PT. A computer program for the estimation of protein and nucleic acid sequence diversity in random point mutagenesis libraries. Nucleic Acids Res 2005; 33:3667-77. [PMID: 15990391 PMCID: PMC1166583 DOI: 10.1093/nar/gki669] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022] Open
Abstract
A computer program for the generation and analysis of in silico random point mutagenesis libraries is described. The program operates by mutagenizing an input nucleic acid sequence according to mutation parameters specified by the user for each sequence position and type of point mutation. The program can mimic almost any type of random mutagenesis library, including those produced via error-prone PCR (ep-PCR), mutator Escherichia coli strains, chemical mutagenesis, and doped or random oligonucleotide synthesis. The program analyzes the generated nucleic acid sequences and/or the associated protein library to produce several estimates of library diversity (number of unique sequences, point mutations, and single point mutants) and the rate of saturation of these diversities during experimental screening or selection of clones. This information allows one to select the optimal screen size for a given mutagenesis library, necessary to efficiently obtain a certain coverage of the sequence-space. The program also reports the abundance of each specific protein mutation at each sequence position, which is useful as a measure of the level and type of mutation bias in the library. Alternatively, one can use the program to evaluate the relative merits of preexisting libraries, or to examine various hypothetical mutation schemes to determine the optimal method for creating a library that serves the screen/selection of interest. Simulated libraries of at least 10^9 sequences are accessible by the numerical algorithm with currently available personal computers; an analytical algorithm is also available which can rapidly calculate a subset of the numerical statistics in libraries of arbitrarily large size. A multi-type double-strand stochastic model of ep-PCR is developed in an appendix to demonstrate the applicability of the algorithm to amplifying mutagenesis procedures. Estimators of DNA polymerase mutation-type-specific error rates are derived using the model. Analyses of an alpha-synuclein ep-PCR library and NNS synthetic oligonucleotide libraries are given as examples.
Affiliation(s)
- Michael J Volles
- Center for Neurologic Diseases, Brigham and Women's Hospital and Department of Neurology, Harvard Medical School, 65 Landsdowne Street, Cambridge, MA 02139, USA.
8
Tabuchi I, Soramoto S, Ueno S, Husimi Y. Multi-line split DNA synthesis: a novel combinatorial method to make high quality peptide libraries. BMC Biotechnol 2004; 4:19. [PMID: 15341664 PMCID: PMC520752 DOI: 10.1186/1472-6750-4-19] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Received: 03/31/2004] [Accepted: 09/01/2004] [Indexed: 11/30/2022] Open
Abstract
Background: We developed a method to make various high-quality random peptide libraries for evolutionary protein engineering based on combinatorial DNA synthesis. Results: A split synthesis in codon units was performed with mixtures of bases optimally designed by using a Genetic Algorithm program. It required only standard DNA synthetic reagents and standard DNA synthesizers in three lines. This multi-line split DNA synthesis (MLSDS) is simply realized by adding a mix-and-split process to the normal DNA synthesis protocol. The superiority of the MLSDS method over other methods was shown. We demonstrated the synthesis of oligonucleotide libraries with 10^16 diversity, and the construction of a library with random sequences coding for 120 amino acids containing few stop codons. Conclusions: Owing to the flexibility of the MLSDS method, it will be possible to design various "rational" libraries by using bioinformatics databases.
Affiliation(s)
- Ichiro Tabuchi
- Tokyo Evolution Research Center, 1-1-45-504, Okubo, Shinjuku-ku, Tokyo 169-0072, Japan
- Department of Functional Materials Science, Saitama University, 255 Shimo-Okubo, Saitama 338-8570, Japan
- Sayaka Soramoto
- Department of Functional Materials Science, Saitama University, 255 Shimo-Okubo, Saitama 338-8570, Japan
- Shingo Ueno
- Department of Functional Materials Science, Saitama University, 255 Shimo-Okubo, Saitama 338-8570, Japan
- Yuzuru Husimi
- Department of Functional Materials Science, Saitama University, 255 Shimo-Okubo, Saitama 338-8570, Japan
9
Wirsching F, Keller M, Hildmann C, Riester D, Schwienhorst A. Directed evolution towards protease-resistant hirudin variants. Mol Genet Metab 2003; 80:451-62. [PMID: 14654359 DOI: 10.1016/j.ymgme.2003.09.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Hirudin, a thrombin-specific inhibitor, is efficiently digested and inactivated by proteases with pepsin- and chymotrypsin-like specificity. Using a combination of phage display selection and high-throughput screening methods, several variants of recombinant hirudin were generated. Only very few variants comprising amino acid substitutions in the amino-terminal domain (residues 1-5) and in the carboxyl-terminal tail (residues 49, 50, and/or 56, 57, 62-64) were identified that showed thrombin inhibition activities similar to those of the wild-type polypeptide. Analysis of protease susceptibility, however, revealed that mutations, which conferred protease resistance, simultaneously diminish thrombin inhibition activity. This is particularly apparent for substitutions in the region of residues 56-64, which forms a large number of electrostatic and hydrophobic interactions with thrombin in the crystal structure of the complex. Unlike wild-type hirudin, the variant comprising Pro(50)- ...-His(56)-Asp(57)- ...-Pro(62)-Pro(63)-His(64) is completely resistant to pepsin and chymotrypsin cleavage; however, this is at the expense of thrombin inhibition activity where there is a 100-fold increase in the IC50 value. The frequent replacement of wild-type amino acids by proline at major protease cleavage sites indicates that at least pepsin- and chymotrypsin-like enzymes may exhibit a (conformational) specificity concerning the P1 and P2 positions. On the basis of these results, proline substitutions appear to be a general strategy to design polypeptides that are not susceptible to digestion by a broader range of different proteases.
Affiliation(s)
- Frank Wirsching
- Abteilung fuer Molekulare Genetik und Praeparative Molekularbiologie, Institut fuer Mikrobiologie und Genetik, Grisebachstr. 8, 37077, Goettingen, Germany
10
Wang W, Saven JG. Designing gene libraries from protein profiles for combinatorial protein experiments. Nucleic Acids Res 2002; 30:e120. [PMID: 12409479 PMCID: PMC135844 DOI: 10.1093/nar/gnf119] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
Abstract
Protein combinatorial libraries provide new ways to probe the determinants of folding and to discover novel proteins. Such libraries are often constructed by expressing an ensemble of partially random gene sequences. Given the intractably large number of possible sequences, some limitation on diversity must be imposed. A non-uniform distribution of nucleotides can be used to reduce the number of possible sequences and encode peptide sequences having a predetermined set of amino acid probabilities at each residue position, i.e., the amino acid sequence profile. Such profiles can be determined by inspection, multiple sequence alignment or physically-based computational methods. Here we present a computational method that takes as input a desired sequence profile and calculates the individual nucleotide probabilities among partially random genes. The calculated gene library can be readily used in the context of standard DNA synthesis to generate a protein library with essentially the desired profile. The fidelity between the desired profile and the calculated one coded by these partially random genes is quantitatively evaluated using the linear correlation coefficient and a relative entropy, each of which provides a measure of profile agreement at each position of the sequence. On average, this method of identifying such codon frequencies performs as well or better than other methods with regard to fidelity to the original profile. Importantly, the method presented here provides much better yields of complete sequences that do not contain stop codons, a feature that is particularly important when all or large fractions of a gene are subject to combinatorial mutation.
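The two fidelity measures named in this abstract, and the yield of stop-free sequences, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and example profiles are hypothetical, the correlation is taken over the 20 standard amino acids, and the relative entropy is computed in the direction D(target || encoded), which the abstract does not specify.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pearson_correlation(target, encoded):
    """Linear correlation between two profiles over the 20 amino acids."""
    x = [target.get(aa, 0.0) for aa in AMINO_ACIDS]
    y = [encoded.get(aa, 0.0) for aa in AMINO_ACIDS]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def relative_entropy(target, encoded):
    """D(target || encoded) in nats (direction is an assumption made here)."""
    total = 0.0
    for aa, t in target.items():
        if t == 0.0:
            continue
        e = encoded.get(aa, 0.0)
        if e == 0.0:
            return math.inf  # a wanted residue is never encoded
        total += t * math.log(t / e)
    return total


def stop_free_yield(stop_probabilities):
    """Fraction of sequences without any stop codon, given per-position stop rates."""
    result = 1.0
    for p in stop_probabilities:
        result *= 1.0 - p
    return result


# Hypothetical profiles at one position, and ten positions with an NNK-like stop rate.
target = {"A": 0.5, "G": 0.3, "S": 0.2}
encoded = {"A": 0.45, "G": 0.30, "S": 0.15, "T": 0.10}
print(pearson_correlation(target, encoded))
print(relative_entropy(target, encoded))
print(stop_free_yield([1 / 32] * 10))
```

Because the stop-free yield is a product over positions, even a small per-position stop probability compounds quickly when many positions are randomized, which is why the abstract singles it out.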
Affiliation(s)
- Wei Wang
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323, USA
11
Kamphausen S, Höltge N, Wirsching F, Morys-Wortmann C, Riester D, Goetz R, Thürk M, Schwienhorst A. Genetic algorithm for the design of molecules with desired properties. J Comput Aided Mol Des 2002; 16:551-67. [PMID: 12602950 DOI: 10.1023/a:1021928016359] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022]
Abstract
The design of molecules with desired properties is still a challenge because of the largely unpredictable end results. Computational methods can be used to assist and speed up this process. In particular, genetic algorithms have proved to be powerful tools with a wide range of applications, e.g. in the field of drug development. Here, we propose a new genetic algorithm that has been tailored to meet the demands of de novo drug design, i.e. efficient optimization based on small training sets that are analyzed in only a small number of design cycles. The efficiency of the design algorithm was demonstrated in the context of several different applications. First, RNA molecules were optimized with respect to folding energy. Second, a spinglass was optimized as a model system for the optimization of multiletter alphabet biopolymers such as peptides. Finally, the feasibility of the computer-assisted molecular design approach was demonstrated for the de novo construction of peptidic thrombin inhibitors using an iterative process of 4 design cycles of computer-guided optimization. Synthesis and experimental fitness determination of only 600 different compounds from a virtual library of more than 10^17 molecules was necessary to achieve this goal.
Affiliation(s)
- Stefan Kamphausen
- Abteilung fuer Molekulare Genetik und Praeparative Molekularbiologie, Institut für Mikrobiologie und Genetik, Grisebachstr. 8, 37077 Goettingen, Germany
12
Affiliation(s)
- S Brakmann
- Max Planck Institute for Biophysical Chemistry, Am Fassberg, 37077 Göttingen, Germany.
13
Kappen C. Analysis of a complete homeobox gene repertoire: implications for the evolution of diversity. Proc Natl Acad Sci U S A 2000; 97:4481-6. [PMID: 10781048 PMCID: PMC18260 DOI: 10.1073/pnas.97.9.4481] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022] Open
Abstract
The completion of sequencing projects for various organisms has already advanced our insight into the evolution of entire genomes and the role of gene duplications. One multigene family that has served as a paradigm for the study of gene duplications and molecular evolution is the family of homeodomain-encoding genes. I present here an analysis of the homeodomain repertoire of an entire genome, that of the nematode Caenorhabditis elegans, in relation to our current knowledge of these genes in plants, arthropods, and mammals. A methodological framework is developed that proposes approaches for the analysis of homeodomain repertoires and multigene families in general.
Affiliation(s)
- C Kappen
- S. C. Johnson Medical Research Center, Mayo Clinic Scottsdale, 13400 East Shea Boulevard, Scottsdale, AZ 85259, USA.
17
Jensen LJ, Andersen KV, Svendsen A, Kretzschmar T. Scoring functions for computational algorithms applicable to the design of spiked oligonucleotides. Nucleic Acids Res 1998; 26:697-702. [PMID: 9443959 PMCID: PMC147326 DOI: 10.1093/nar/26.3.697] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 02/05/2023] Open
Abstract
Protein engineering by inserting stretches of random DNA sequences into target genes in combination with adequate screening or selection methods is a versatile technique to elucidate and improve protein functions. Established compounds for generating semi-random DNA sequences are spiked oligonucleotides which are synthesised by interspersing wild type (wt) nucleotides of the target sequence with certain amounts of other nucleotides. Directed spiking strategies reduce the complexity of a library to a manageable format compared with completely random libraries. Computational algorithms render feasible the calculation of appropriate nucleotide mixtures to encode specified amino acid subpopulations. The crucial element in the ranking of spiked codons generated during an iterative algorithm is the scoring function. In this report three scoring functions are analysed: the sum-of-square-differences function s, a modified cubic function c, and a scoring function m derived from maximum likelihood considerations. The impact of these scoring functions on calculated amino acid distributions is demonstrated by an example of mutagenising a domain surrounding the active site serine of subtilisin-like proteases. At default weight settings of one for each amino acid, the new scoring function m is superior to functions s and c in finding matches to a given amino acid population.
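A weighted sum-of-square-differences score of the kind the abstract calls s can be sketched as follows. The exact definitions used in the paper are not given in the abstract, so this is one plausible form with hypothetical names and example profiles; the cubic function c and the likelihood-derived function m are not shown because their forms are not stated here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sum_of_squares_score(target, encoded, weights=None):
    """Weighted sum of squared differences; lower means a closer match.

    Weights default to one for every amino acid.
    """
    weights = weights or {}
    return sum(
        weights.get(aa, 1.0) * (target.get(aa, 0.0) - encoded.get(aa, 0.0)) ** 2
        for aa in AMINO_ACIDS
    )


def best_candidate(target, candidates, weights=None):
    """Return the name of the candidate profile with the lowest score."""
    return min(
        candidates,
        key=lambda name: sum_of_squares_score(target, candidates[name], weights),
    )


# Hypothetical target population and two candidate spiked-codon profiles.
target = {"S": 0.7, "T": 0.2, "A": 0.1}
candidates = {
    "mix_1": {"S": 0.6, "T": 0.2, "A": 0.2},
    "mix_2": {"S": 0.5, "T": 0.5},
}
print(best_candidate(target, candidates))
```

Raising the weight of one amino acid makes mismatches at that residue cost more, which is the manual tuning step that default weights of one avoid.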
Affiliation(s)
- L J Jensen
- Department of Enzyme Design, Novo Nordisk A/S, DK-2880 Bagsvaerd, Denmark