1
Burghardt M, Tuller T. Modeling coding sequence design for virus-based expression in tobacco. Synth Syst Biotechnol 2025; 10:337-345. [PMID: 39802156 PMCID: PMC11718241 DOI: 10.1016/j.synbio.2024.12.002] [Received: 08/23/2024] [Revised: 12/07/2024] [Accepted: 12/09/2024] [Indexed: 01/16/2025] [Open access]
Abstract
Transient expression in tobacco is a popular way to produce recombinant proteins in plants. The design of various expression vectors, delivered into the plant by Agrobacterium, has enabled high production levels of some proteins. To further enhance expression, researchers often adapt the coding sequence of heterologous genes to the host, but this strategy has produced mixed results in tobacco. To study the effects of different sequence features on protein yield, we compile a dataset of yields and coding sequences from previously published expression studies covering more than 200 coding sequences. We evaluate various established gene expression models on a subset of these studies. We find that the use of tobacco codons is only moderately predictive of protein yield, as informative sequence features likely extend over multiple codons. Additionally, we show that the codon usage of organisms that, like the synthetic system, use tobacco as a host to express their proteins (such as viruses and agrobacteria) can be used to predict heterologous expression. Other predictive features are related to tRNA supply and demand, the inclusion of a translational ramp of codons with lower adaptation to the tRNA pool at the beginning of the coding region, and the amino acid composition of the recombinant protein. A model based on all the features achieved a correlation of 0.57 with protein yield. We believe that our study provides a practical guideline for coding sequence design for efficient expression in tobacco.
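One feature named above is a translational ramp: codons at the 5' end of the coding region that are less adapted to the tRNA pool than those further downstream. A minimal sketch of how such a ramp could be quantified, assuming a table of per-codon adaptation weights in (0, 1] (for example tAI-style weights for tobacco, which are not given here, so the values below are hypothetical) and an arbitrary ramp length of 50 codons. This is an illustration of the concept, not the authors' model.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, as used by CAI/tAI-style indices."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ramp_ratio(cds, weights, ramp_len=50):
    """Adaptation of the first `ramp_len` codons relative to the rest of the CDS.

    `weights` maps codon -> adaptation weight in (0, 1] (hypothetical values);
    codons absent from it (e.g. stop codons) are skipped. A ratio below 1.0
    means the 5' region uses less-adapted codons, i.e. a ramp is present.
    """
    n_full = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, n_full, 3)]
    head = [weights[c] for c in codons[:ramp_len] if c in weights]
    tail = [weights[c] for c in codons[ramp_len:] if c in weights]
    if not head or not tail:
        raise ValueError("CDS too short for the chosen ramp length")
    return geometric_mean(head) / geometric_mean(tail)


# Hypothetical weights: GCT is a poorly adapted alanine codon, GCC a well-adapted one.
toy_weights = {"ATG": 1.0, "GCT": 0.25, "GCC": 1.0}
toy_cds = "ATG" + "GCT" * 4 + "GCC" * 10
print(round(ramp_ratio(toy_cds, toy_weights, ramp_len=5), 3))  # prints 0.33
```

The geometric mean is used so that one very poorly adapted codon pulls the score down more than an arithmetic mean would; whether the study used a ratio, a difference, or a per-position profile is not stated in the abstract.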
Affiliation(s)
- Moritz Burghardt
- Department of Biomedical Engineering, The Iby and Aladar Fleischman Faculty of Engineering, Tel Aviv, Israel
- Tamir Tuller
- Department of Biomedical Engineering, The Iby and Aladar Fleischman Faculty of Engineering, Tel Aviv, Israel
- The Segol School of Neuroscience, Tel-Aviv University, Tel Aviv, Israel
- Center for Physics and Chemistry of Living Systems, Israel
2
Demissie EA, Park SY, Moon JH, Lee DY. Comparative Analysis of Codon Optimization Tools: Advancing toward a Multi-Criteria Framework for Synthetic Gene Design. J Microbiol Biotechnol 2025; 35:e2411066. [PMID: 40223268 PMCID: PMC12010093 DOI: 10.4014/jmb.2411.11066] [Received: 11/27/2024] [Revised: 02/13/2025] [Accepted: 02/24/2025] [Indexed: 04/15/2025]
Abstract
Codon optimization is an essential technique in synthetic biology and biopharmaceutical production, enhancing recombinant protein expression by fine-tuning genetic sequences to match the translational machinery and codon usage preferences of specific host organisms. This study presents a comprehensive comparative analysis of widely used codon optimization tools, focusing on their capacity to reflect host-specific codon biases, design principles, and parameters. Industrially relevant target proteins were evaluated in Escherichia coli, Saccharomyces cerevisiae, and CHO cells, uncovering significant variability in sequence design and clustering patterns across tools. Tools such as JCat, OPTIMIZER, ATGme, and GeneOptimizer demonstrated strong alignment with genome-wide and highly expressed gene-level codon usage, achieving high codon adaptation index (CAI) values and efficient codon-pair utilization. Conversely, tools such as TISIGNER and IDT employed different optimization strategies that frequently produced divergent results. Other key parameters, including GC content, mRNA secondary structure stability (ΔG), and codon-pair bias (CPB), were analyzed to elucidate their influence on translational efficiency. Increased GC content enhanced mRNA stability in E. coli, whereas A/T-rich codons in S. cerevisiae minimized secondary structure formation, and moderate GC content in CHO cells balanced mRNA stability and translation efficiency. Our findings highlight the limitations of single-metric approaches and advocate for a multi-criteria framework that integrates CAI, GC content, mRNA folding energy, and codon-pair considerations. This integrative strategy enables the design of tailored genetic sequences that meet host-specific requirements, advancing synthetic gene design for biotechnological innovation and precision biopharmaceutical applications.
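Two of the metrics compared in this study, CAI and GC content, have simple standard definitions. CAI is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w: its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. Codons for amino acids with a single codon (Met, Trp) carry no choice and are excluded. The sketch below assumes a toy reference table covering only three amino acids and floors zero weights at 0.01; both are illustrative choices, not the settings of any tool named above.

```python
import math

# Synonymous codon families for a few amino acids only (toy subset of the
# standard genetic code); a real implementation would cover all 61 sense codons.
SYNONYMS = {
    "M": ["ATG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "K": ["AAA", "AAG"],
}


def relative_adaptiveness(ref_counts):
    """w(codon) = count(codon) / count(most frequent synonymous codon)."""
    w = {}
    for family in SYNONYMS.values():
        top = max(ref_counts.get(c, 0) for c in family)
        if top == 0:
            continue  # family never observed in the reference set
        for c in family:
            w[c] = ref_counts.get(c, 0) / top
    return w


def cai(cds, w, floor=0.01):
    """Geometric mean of w over codons whose amino acid has >1 synonymous codon."""
    informative = {c for fam in SYNONYMS.values() if len(fam) > 1 for c in fam}
    logs = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in informative:
            # The floor avoids log(0) for unseen codons; conventions vary by tool.
            logs.append(math.log(max(w.get(codon, 0.0), floor)))
    if not logs:
        raise ValueError("no informative codons in sequence")
    return math.exp(sum(logs) / len(logs))


def gc_content(seq):
    """Fraction of G and C nucleotides."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical codon counts from a reference set of highly expressed genes.
ref = {"GCT": 10, "GCC": 40, "GCA": 20, "GCG": 30, "AAA": 30, "AAG": 60}
w = relative_adaptiveness(ref)
print(round(cai("ATGGCCAAG", w), 3), round(cai("ATGGCTAAA", w), 3))  # prints 1.0 0.354
print(round(gc_content("ATGGCCAAG"), 3))  # prints 0.556
```

As the abstract argues, a CAI of 1.0 only says every codon is the host's favourite; it says nothing about folding energy or codon-pair bias, which is why a multi-criteria score is proposed.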
Affiliation(s)
- Eden A. Demissie
- School of Chemical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Seo-Young Park
- School of Chemical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- School of Medicine, Kyungpook National University, Daegu 41944, Republic of Korea
- Je Hun Moon
- School of Chemical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Dong-Yup Lee
- School of Chemical Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
3
Fallahpour A, Gureghian V, Filion GJ, Lindner AB, Pandi A. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Nat Commun 2025; 16:3205. [PMID: 40180930 PMCID: PMC11968976 DOI: 10.1038/s41467-025-58588-7] [Received: 09/16/2024] [Accepted: 03/24/2025] [Indexed: 04/05/2025] [Open access]
Abstract
Degeneracy in the genetic code allows many possible DNA sequences to encode the same protein. Optimizing codon usage within a sequence to meet organism-specific preferences faces a combinatorial explosion of candidate sequences. Nevertheless, natural sequences optimized through evolution provide a rich source of data for machine learning algorithms to explore the underlying rules. Here, we introduce CodonTransformer, a multispecies deep learning model trained on over 1 million DNA-protein pairs from 164 organisms spanning all domains of life. The model demonstrates context-awareness thanks to its Transformer architecture and to our sequence representation strategy that combines organism, amino acid, and codon encodings. CodonTransformer generates host-specific DNA sequences with natural-like codon distribution profiles and with a minimal number of negative cis-regulatory elements. This work introduces the strategy of Shared Token Representation and Encoding with Aligned Multi-masking (STREAM) and provides a codon optimization framework with a customizable open-access model and a user-friendly Google Colab interface.
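The abstract describes a representation that combines amino acid and codon information in a single token stream. A minimal sketch of that general idea, assuming a hypothetical token format `AA_CODON` (for example `M_ATG`) and an `AA_UNK` token for positions whose codon is hidden or still to be predicted. The model's actual vocabulary, organism encoding, and masking schedule are not reproduced here; organism information is omitted entirely.

```python
def merged_tokens(protein, cds=None):
    """Pair each residue with its codon as one token, e.g. 'M_ATG'.

    With cds=None (the design setting, where only the protein is known) every
    codon is unknown and the token becomes e.g. 'M_UNK'.
    """
    if cds is not None and len(cds) != 3 * len(protein):
        raise ValueError("CDS length must be exactly 3 x protein length (omit the stop codon)")
    tokens = []
    for i, aa in enumerate(protein):
        codon = cds[3 * i:3 * i + 3] if cds is not None else "UNK"
        tokens.append(f"{aa}_{codon}")
    return tokens


def mask_codons(tokens, positions):
    """Hide the codon at the given positions while keeping the amino acid visible."""
    masked = list(tokens)
    for p in positions:
        aa = masked[p].split("_", 1)[0]
        masked[p] = f"{aa}_UNK"
    return masked


train = merged_tokens("MKA", "ATGAAAGCC")
print(train)                    # ['M_ATG', 'K_AAA', 'A_GCC']
print(mask_codons(train, [1]))  # ['M_ATG', 'K_UNK', 'A_GCC']
print(merged_tokens("MKA"))     # ['M_UNK', 'K_UNK', 'A_UNK']
```

Keeping the amino acid in every token means a masked position still tells the model which synonymous family it must choose from, so training-time masking and design-time inference use the same kind of input.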
Affiliation(s)
- Adibvafa Fallahpour
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- University of Toronto Scarborough; Department of Biological Science, Scarborough, ON, Canada
- Vincent Gureghian
- Sorbonne Université, CNRS, ERL U1338 Inserm, Department of Computational, Quantitative and Synthetic Biology, Paris, France
- Sorbonne Université, CNRS, Inserm, Institut de Biologie Paris-Seine, Paris, France
- Guillaume J Filion
- University of Toronto Scarborough; Department of Biological Science, Scarborough, ON, Canada
- Ariel B Lindner
- Sorbonne Université, CNRS, ERL U1338 Inserm, Department of Computational, Quantitative and Synthetic Biology, Paris, France
- Sorbonne Université, CNRS, Inserm, Institut de Biologie Paris-Seine, Paris, France
- Sorbonne Université, CNRS, Université de Technologie de Compiègne, Inserm, Biofoundry Alliance Sorbonne Université, Paris, France
- Amir Pandi
- Sorbonne Université, CNRS, ERL U1338 Inserm, Department of Computational, Quantitative and Synthetic Biology, Paris, France
- Sorbonne Université, CNRS, Inserm, Institut de Biologie Paris-Seine, Paris, France
- Sorbonne Université, CNRS, Université de Technologie de Compiègne, Inserm, Biofoundry Alliance Sorbonne Université, Paris, France
4
Siegall WB, Lyon RB, Kelman Z. An important consideration when expressing mAbs in Escherichia coli. Protein Expr Purif 2024; 220:106499. [PMID: 38703798 DOI: 10.1016/j.pep.2024.106499] [Received: 02/12/2024] [Revised: 04/19/2024] [Accepted: 05/02/2024] [Indexed: 05/06/2024]
Abstract
Monoclonal antibodies (mAbs) are a driving force in the biopharmaceutical industry. Therapeutic mAbs are usually produced in mammalian cells, but there has been a push towards the use of alternative production hosts, such as Escherichia coli. When the genes encoding a mAb's heavy and light chains are codon-optimized for E. coli expression, a truncated form of the heavy chain can form along with the full-length product. In this work, the role of codon optimization in the formation of this truncated product was investigated. This study used the amino acid sequences of several therapeutic mAbs and multiple optimization algorithms. Several algorithms were found to incorporate sequences that lead to a truncated product. Approaches to avoid this truncated form are discussed.
Affiliation(s)
- William B Siegall
- Institute for Bioscience and Biotechnology Research (IBBR), The University of Maryland (UMD), 9600 Gudelsky Drive, Rockville, MD, 20850, USA
- Rachel B Lyon
- Institute for Bioscience and Biotechnology Research (IBBR), The University of Maryland (UMD), 9600 Gudelsky Drive, Rockville, MD, 20850, USA
- Biomolecular Labeling Laboratory, IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850, USA
- Zvi Kelman
- Institute for Bioscience and Biotechnology Research (IBBR), The University of Maryland (UMD), 9600 Gudelsky Drive, Rockville, MD, 20850, USA
- National Institute of Standards and Technology (NIST), 9600 Gudelsky Drive, Rockville, MD, 20850, USA
- Biomolecular Labeling Laboratory, IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850, USA
5
Mukherjee S, Douglas N, Jimenez R. Influence of Fluorescence Lifetime Selections and Conformational Flexibility on Brightness of FusionRed Variants. J Phys Chem Lett 2024; 15:1644-1651. [PMID: 38315162 DOI: 10.1021/acs.jpclett.3c02765] [Indexed: 02/07/2024]
Abstract
Fluorescent proteins (FPs) for bioimaging are typically developed by screening mutant libraries for clones with improved photophysical properties. This approach has resulted in FPs with high brightness, but the mechanistic origins of the improvements are often unclear. We focused on improving the molecular brightness in the FusionRed family of FPs with fluorescence lifetime selections on targeted libraries, with the aim of reducing nonradiative decay rates. Our new variants show fluorescence quantum yields of up to 75% and lifetimes >3.5 ns. We present a comprehensive analysis of these new FPs, including trends in spectral shifts, photophysical data, photostability, and cellular brightness resulting from codon optimization. We also performed all-atom molecular dynamics simulations to investigate the impact of side chain mutations. The trajectories reveal that individual mutations reduce the flexibility of the chromophore and side chains, leading to an overall reduction in nonradiative rates.
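The link between the quantities in this abstract follows from the standard relations for a single-exponential decay: quantum yield Φ = k_r / (k_r + k_nr) and lifetime τ = 1 / (k_r + k_nr), so k_r = Φ / τ and k_nr = (1 - Φ) / τ. A small sketch applying them; pairing the quoted maximum quantum yield (75%) with a 3.5 ns lifetime is only an illustration, since the abstract does not say both values belong to the same variant.

```python
def decay_rates(quantum_yield, lifetime_ns):
    """Return (k_r, k_nr) in 1/ns from Φ = k_r/(k_r + k_nr) and τ = 1/(k_r + k_nr)."""
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie in [0, 1]")
    if lifetime_ns <= 0:
        raise ValueError("lifetime must be positive")
    k_total = 1.0 / lifetime_ns  # total decay rate = 1/τ
    return quantum_yield * k_total, (1.0 - quantum_yield) * k_total


k_r, k_nr = decay_rates(0.75, 3.5)
print(f"k_r = {k_r:.3f} /ns, k_nr = {k_nr:.3f} /ns")  # k_r = 0.214 /ns, k_nr = 0.071 /ns
```

At a fixed radiative rate, lowering k_nr raises both Φ and τ, which is why selecting for longer lifetimes can enrich for variants with reduced nonradiative decay.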
Affiliation(s)
- Srijit Mukherjee
- JILA, University of Colorado, Boulder, and National Institute of Standards and Technology, 440 UCB, Boulder, Colorado 80309, United States
- Department of Chemistry, University of Colorado, Boulder, 215 UCB, Boulder, Colorado 80309, United States
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Nancy Douglas
- Department of Chemistry, University of Colorado, Boulder, 215 UCB, Boulder, Colorado 80309, United States
- Ralph Jimenez
- JILA, University of Colorado, Boulder, and National Institute of Standards and Technology, 440 UCB, Boulder, Colorado 80309, United States
- Department of Chemistry, University of Colorado, Boulder, 215 UCB, Boulder, Colorado 80309, United States
6
Kramps T. Introduction to RNA Vaccines Post COVID-19. Methods Mol Biol 2024; 2786:1-22. [PMID: 38814388 DOI: 10.1007/978-1-0716-3770-8_1] [Indexed: 05/31/2024]
Abstract
Available prophylactic vaccines help prevent many infectious diseases that burden humanity. Future vaccinology will likely extend these benefits by more effectively countering newly emerging pathogens, fighting currently intractable infections, or even generating novel treatment modalities for non-infectious diseases. Instead of applying protein antigen directly, RNA vaccines contain short-lived genetic information that guides the expression of protein antigen in the vaccinee, similar to infection with a recombinant viral vector. After decades of research, messenger RNA-lipid nanoparticle (mRNA-LNP) vaccines have proven their clinical value in addressing the COVID-19 pandemic, as they combine benefits of killed subunit vaccines and live-attenuated vectors, including flexible production, self-adjuvanting effects, and stimulation of humoral and cellular immunity. RNA vaccines remain subject to continued development, raising high hopes for broader future application. Their mechanistic versatility promises to make them a key tool of vaccinology and immunotherapy going forward. Here, I briefly review key developments in RNA vaccines and outline the contents of this volume of Methods in Molecular Biology.
7
Willems T, Hectors W, Rombaut J, De Rop AS, Goegebeur S, Delmulle T, De Mol ML, De Maeseneire SL, Soetaert WK. An exploratory in silico comparison of open-source codon harmonization tools. Microb Cell Fact 2023; 22:227. [PMID: 37932726 PMCID: PMC10626681 DOI: 10.1186/s12934-023-02230-y] [Received: 07/14/2023] [Accepted: 10/14/2023] [Indexed: 11/08/2023] [Open access]
Abstract
BACKGROUND Expressing genes in a heterologous host without adapting their native sequence can affect the amount of protein synthesized as well as its folding, hampering protein activity and even cell viability. Over the past decades, several strategies have been developed to optimize the translation of heterologous genes by accommodating the difference in codon usage between species. While there have been a handful of studies assessing various codon optimization strategies, to the best of our knowledge, no research has been performed on the evaluation and comparison of codon harmonization algorithms. To highlight their importance and encourage meaningful discussion, we compared the in silico performance of different open-source codon harmonization tools, and we investigated the influence of different gene-specific factors. RESULTS In total, 27 genes were harmonized with four tools toward two different heterologous hosts. The difference in %MinMax values between the harmonized and the original sequences was calculated (ΔMinMax), and statistical analysis of the obtained results was carried out. It became clear that not all tools perform similarly, and the choice of tool should depend on the intended application. Almost all biological factors under investigation (GC content, RNA secondary structures and choice of heterologous host) had a significant influence on the harmonization results and thus must be taken into account. These findings were substantiated using a validation dataset consisting of 8 strategically chosen genes. CONCLUSIONS Due to the size of the dataset, no complex models could be developed. However, this initial study showcases significant differences between the results of various codon harmonization tools.
Although more elaborate investigation is needed, it is clear that biological factors such as GC content, RNA secondary structures and heterologous hosts must be taken into account when selecting the codon harmonization tool.
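The ΔMinMax comparison above builds on the %MinMax metric (Clarke and Clark, 2008), which scores a window of codons by where the host frequencies of the codons actually used sit between the rarest (-100) and the most common (+100) synonymous choices, with 0 meaning the average of the synonyms. A minimal sketch, assuming a host codon-usage table `usage` (codon to frequency) and a synonym map; the 18-codon default window is an assumption here, and the toy table below is hypothetical. Harmonization compares such profiles for the original gene in its native host and the redesigned gene in the new host; how the study aggregates those differences into ΔMinMax is not specified in the abstract.

```python
from statistics import mean


def percent_minmax(codons, usage, synonyms, window=18):
    """%MinMax profile over sliding windows of `window` codons.

    usage:    codon -> usage frequency in the host (any consistent unit)
    synonyms: codon -> list of all codons for the same amino acid (incl. itself)
    Returns one value per window, each in [-100, 100].
    """
    profile = []
    for start in range(len(codons) - window + 1):
        win = codons[start:start + window]
        actual = mean(usage[c] for c in win)
        x_max = mean(max(usage[s] for s in synonyms[c]) for c in win)
        x_min = mean(min(usage[s] for s in synonyms[c]) for c in win)
        x_avg = mean(mean(usage[s] for s in synonyms[c]) for c in win)
        if actual >= x_avg:
            span = x_max - x_avg  # zero only if every family has a single frequency
            profile.append(100.0 * (actual - x_avg) / span if span > 0 else 0.0)
        else:
            # actual < x_avg implies x_avg - x_min > 0, so this cannot divide by zero
            profile.append(-100.0 * (x_avg - actual) / (x_avg - x_min))
    return profile


# Hypothetical alanine usage table for one host.
ala_usage = {"GCT": 0.1, "GCC": 0.4, "GCA": 0.2, "GCG": 0.3}
ala_syn = {c: list(ala_usage) for c in ala_usage}
print(percent_minmax(["GCC", "GCC", "GCT", "GCT"], ala_usage, ala_syn, window=2))
# [100.0, 0.0, -100.0]
```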
Affiliation(s)
- Thomas Willems
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Wim Hectors
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Jeltien Rombaut
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Anne-Sofie De Rop
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Stijn Goegebeur
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Tom Delmulle
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Maarten L De Mol
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Sofie L De Maeseneire
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Wim K Soetaert
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
8
Lewin LE, Daniels KG, Hurst LD. Genes for highly abundant proteins in Escherichia coli avoid 5' codons that promote ribosomal initiation. PLoS Comput Biol 2023; 19:e1011581. [PMID: 37878567 PMCID: PMC10599525 DOI: 10.1371/journal.pcbi.1011581] [Received: 07/10/2023] [Accepted: 10/09/2023] [Indexed: 10/27/2023] [Open access]
Abstract
In many species, highly expressed genes (HEGs) over-employ the synonymous codons that match the more abundant iso-acceptor tRNAs. Bacterial transgene codon randomization experiments report, however, that enrichment with such "translationally optimal" codons has little to no effect on the resultant protein level. By contrast, consistent with the view that ribosomal initiation is rate-limiting, synonymous codon usage following the 5' ATG greatly influences protein levels, at least in part by modifying RNA stability. For the design of bacterial transgenes, for simple codon-based in silico inference of protein levels and for understanding selection on synonymous mutations, it would be valuable to computationally determine initiation optimality (IO) scores for codons for any given species. One attractive approach is to characterize the 5' codon enrichment of HEGs compared with the most lowly expressed genes, just as translational optimality scores of codons have been defined using the full gene body. Here we determine the viability of this approach employing a unique opportunity: for Escherichia coli there is both the most extensive protein abundance data for native genes and a unique large-scale transgene codon randomization experiment enabling objective definition of the 5' codons that cause, rather than just correlate with, high protein abundance (that we equate with initiation optimality, broadly defined). Surprisingly, the 5' ends of native genes that specify highly abundant proteins avoid such initiation-optimal codons. We find that this is probably owing to conflicting selection pressures particular to native HEGs, including selection favouring low initiation rates, which potentially enables high efficiency of ribosome usage and low noise.
While the classical HEG enrichment approach does not work, rendering simple prediction of native protein abundance from 5' codon content futile, we report evidence that initiation optimality scores derived from the transgene experiment may hold relevance for in silico transgene design for a broad spectrum of bacteria.
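The "classical HEG enrichment approach" described above scores each codon by how over-represented it is near the 5' end of highly expressed genes relative to lowly expressed ones. A minimal sketch of that calculation, assuming a log2 ratio with a pseudocount and a hypothetical 5' window of codons 2 to 6 (the start codon is skipped). The study reports that this native-gene approach fails for E. coli and favours scores derived from the transgene randomization experiment, so the sketch only makes the definition concrete; it is not a recommended predictor.

```python
import math
from collections import Counter


def five_prime_counts(cdss, n_codons=5):
    """Count codons at positions 2..n_codons+1 of each CDS (start codon skipped)."""
    counts = Counter()
    for cds in cdss:
        for i in range(3, 3 + 3 * n_codons, 3):
            codon = cds[i:i + 3]
            if len(codon) == 3:  # skip truncated codons at the end of short CDSs
                counts[codon] += 1
    return counts


def enrichment_scores(high_genes, low_genes, n_codons=5, pseudocount=1.0):
    """log2(5' codon frequency in highly expressed / lowly expressed genes)."""
    high = five_prime_counts(high_genes, n_codons)
    low = five_prime_counts(low_genes, n_codons)
    codons = set(high) | set(low)
    high_total = sum(high.values()) + pseudocount * len(codons)
    low_total = sum(low.values()) + pseudocount * len(codons)
    return {
        c: math.log2(((high[c] + pseudocount) / high_total)
                     / ((low[c] + pseudocount) / low_total))
        for c in codons
    }


# Hypothetical toy gene sets.
scores = enrichment_scores(["ATGAAAAAA"], ["ATGGCTGCT"], n_codons=2)
print({c: round(s, 3) for c, s in sorted(scores.items())})
```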
Affiliation(s)
- Loveday E. Lewin
- The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Kate G. Daniels
- The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Laurence D. Hurst
- The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
9
Imon RR, Samad A, Alam R, Alsaiari AA, Talukder MEK, Almehmadi M, Ahammad F, Mohammad F. Computational formulation of a multiepitope vaccine unveils an exceptional prophylactic candidate against Merkel cell polyomavirus. Front Immunol 2023; 14:1160260. [PMID: 37441076 PMCID: PMC10333698 DOI: 10.3389/fimmu.2023.1160260] [Received: 02/07/2023] [Accepted: 05/30/2023] [Indexed: 07/15/2023] [Open access]
Abstract
Merkel cell carcinoma (MCC) is a rare neuroendocrine skin malignancy caused by human Merkel cell polyomavirus (MCV) and is the most aggressive skin cancer in humans. MCV has been identified in approximately 43%-100% of MCC cases, contributing to the highly aggressive nature of this primary cutaneous carcinoma and to its notable mortality rate. Currently, no existing vaccines or drug candidates have shown efficacy against the disease caused by this pathogen. Therefore, this study aimed to design a novel multiepitope vaccine candidate against the virus using integrated immunoinformatics and vaccinomics approaches. Initially, the most antigenic, immunogenic, and non-allergenic cytotoxic T lymphocyte, helper T lymphocyte, and linear B lymphocyte epitopes in the virus's whole protein sequences were identified and retrieved for vaccine construction. Subsequently, the selected epitopes were joined with appropriate linkers, and an adjuvant was added to the front of the construct to enhance the immunogenicity of the vaccine candidates. Additionally, molecular docking and dynamics simulations identified strong and stable binding interactions between the vaccine candidates and human Toll-like receptor 4. Furthermore, computer-aided immune simulation predicted a realistic immune response to the vaccine candidates upon administration to the human body. Finally, codon optimization was conducted on the vaccine candidates to facilitate in silico cloning of the vaccine into the pET-28a(+) cloning vector. In conclusion, the vaccine candidate developed in this study is anticipated to augment the immune response in humans and effectively combat the virus. Nevertheless, in vitro and in vivo assays are needed to evaluate the efficacy of these vaccine candidates thoroughly. These evaluations will provide critical insights into the vaccine's effectiveness and potential for further development.
Affiliation(s)
- Raihan Rahman Imon
- Laboratory of Computational Biology, Biological Solution Centre (BioSol Centre), Jashore, Bangladesh
- Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh
- Abdus Samad
- Laboratory of Computational Biology, Biological Solution Centre (BioSol Centre), Jashore, Bangladesh
- Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh
- Rahat Alam
- Laboratory of Computational Biology, Biological Solution Centre (BioSol Centre), Jashore, Bangladesh
- Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh
- Ahad Amer Alsaiari
- Clinical Laboratories Science Department, College of Applied Medical Science, Taif University, Taif, Saudi Arabia
- Md. Enamul Kabir Talukder
- Laboratory of Computational Biology, Biological Solution Centre (BioSol Centre), Jashore, Bangladesh
- Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh
- Mazen Almehmadi
- Clinical Laboratories Science Department, College of Applied Medical Science, Taif University, Taif, Saudi Arabia
- Foysal Ahammad
- Laboratory of Computational Biology, Biological Solution Centre (BioSol Centre), Jashore, Bangladesh
- Division of Biological and Biomedical Sciences (BBS), College of Health and Life Sciences (CHLS), Hamad Bin Khalifa University (HBKU), Doha, Qatar
- Farhan Mohammad
- Division of Biological and Biomedical Sciences (BBS), College of Health and Life Sciences (CHLS), Hamad Bin Khalifa University (HBKU), Doha, Qatar
10
Hernandez-Alias X, Benisty H, Radusky LG, Serrano L, Schaefer MH. Using protein-per-mRNA differences among human tissues in codon optimization. Genome Biol 2023; 24:34. [PMID: 36829202 PMCID: PMC9951436 DOI: 10.1186/s13059-023-02868-2] [Received: 04/26/2022] [Accepted: 02/07/2023] [Indexed: 02/26/2023] [Open access]
Abstract
BACKGROUND Codon usage and nucleotide composition of coding sequences have profound effects on protein expression. However, while it is recognized that different tissues have distinct tRNA profiles and codon usages in their transcriptomes, the effect of tissue-specific codon optimality on protein synthesis remains elusive. RESULTS We leverage existing state-of-the-art transcriptomics and proteomics datasets from the GTEx project and the Human Protein Atlas to compute the protein-to-mRNA ratios of 36 human tissues. Using this as a proxy of translational efficiency, we build a machine learning model that identifies codons enriched or depleted in specific tissues. We detect two clusters of tissues with an opposite pattern of codon preferences. We then use these identified patterns for the development of CUSTOM, a codon optimizer algorithm which suggests a synonymous codon design in order to optimize protein production in a tissue-specific manner. In human cell-line models, we provide evidence that codon optimization should take into account particularities of the translational machinery of the tissues in which the target proteins are expressed and that our approach can design genes with tissue-optimized expression profiles. CONCLUSIONS We provide proof-of-concept evidence that codon preferences exist in tissue-specific protein synthesis and demonstrate its application to synthetic gene design. We show that CUSTOM can be of benefit in biological and biotechnological applications, such as in the design of tissue-targeted therapies and vaccines.
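Two steps in this abstract can be sketched briefly: the protein-to-mRNA ratio used as a proxy for translational efficiency, and generating synonymous designs biased toward tissue-preferred codons. The sketch below assumes a log10 ratio and draws codons independently with probabilities proportional to hypothetical per-tissue preference weights. It is not the CUSTOM algorithm itself, whose model and selection criteria are not described in the abstract.

```python
import math
import random


def protein_to_mrna_ratio(protein_abundance, mrna_abundance):
    """log10(protein / mRNA), a per-gene proxy for translational efficiency."""
    return math.log10(protein_abundance) - math.log10(mrna_abundance)


def sample_design(protein, prefs, rng):
    """Pick one codon per residue, weighted by prefs[aa][codon] (hypothetical weights)."""
    out = []
    for aa in protein:
        options = prefs[aa]
        codons = list(options)
        out.append(rng.choices(codons, weights=[options[c] for c in codons], k=1)[0])
    return "".join(out)


# Hypothetical preferences for one tissue: AAG favoured over AAA for lysine.
tissue_prefs = {"M": {"ATG": 1.0}, "K": {"AAA": 0.2, "AAG": 0.8}}
rng = random.Random(42)  # seeded so the pool is reproducible
pool = [sample_design("MKK", tissue_prefs, rng) for _ in range(5)]
print(protein_to_mrna_ratio(1000.0, 10.0))
print(pool)
```

Sampling rather than always taking the top codon yields a pool of varied candidates that could then be filtered on other criteria (GC content, folding energy); which criteria CUSTOM applies is not stated here.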
Affiliation(s)
- Xavier Hernandez-Alias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Hannah Benisty
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Leandro G Radusky
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08002, Barcelona, Spain
- ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain
- Martin H Schaefer
- IEO European Institute of Oncology IRCCS, Department of Experimental Oncology, Via Adamello 16, 20139, Milan, Italy
11
Arnesen JA, Belmonte Del Ama A, Jayachandran S, Dahlin J, Rago D, Andersen AJC, Borodina I. Engineering of Yarrowia lipolytica for the production of plant triterpenoids: Asiatic, madecassic, and arjunolic acids. Metab Eng Commun 2022; 14:e00197. [PMID: 35433265 PMCID: PMC9011116 DOI: 10.1016/j.mec.2022.e00197] [Received: 01/07/2022] [Revised: 03/23/2022] [Accepted: 03/24/2022] [Indexed: 12/13/2022] [Open access]
Abstract
Several plant triterpenoids have valuable pharmaceutical properties, but their production and use are limited because extraction from plants can burden natural resources and result in low yields and purity. Here, we engineered the oleaginous yeast Yarrowia lipolytica to produce three valuable plant triterpenoids (asiatic, madecassic, and arjunolic acids) by fermentation. First, we established the recombinant production of the precursors ursolic and oleanolic acids by expressing plant enzymes, in free or fused versions, in a Y. lipolytica strain previously optimized for squalene production. Engineered strains produced up to 11.6 mg/g DCW ursolic acid or 10.2 mg/g DCW oleanolic acid. The biosynthetic pathway from ursolic acid was extended by expressing the Centella asiatica cytochrome P450 monooxygenases CaCYP716C11p, CaCYP714E19p, and CaCYP716E41p, resulting in the production of trace amounts of asiatic acid and 0.12 mg/g DCW madecassic acid. Expressing the same C. asiatica cytochromes P450 in the oleanolic acid-producing strain resulted in the production of oleanane triterpenoids. Expression of CaCYP716C11p in the oleanolic acid-producing strain yielded 8.9 mg/g DCW maslinic acid. Further expression of a codon-optimized CaCYP714E19p resulted in 4.4 mg/g DCW arjunolic acid. Lastly, arjunolic acid production was increased to 9.1 mg/g DCW by swapping the N-terminal domain of CaCYP714E19p with the N-terminal domain from a Kalopanax septemlobus cytochrome P450. In summary, we have demonstrated the production of asiatic, madecassic, and arjunolic acids in a microbial cell factory. The strains and fermentation processes need to be further improved before the production of these molecules by fermentation can be industrialized.
Affiliation(s)
- Jonathan Asmund Arnesen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800, Kgs. Lyngby, Denmark
- Arian Belmonte Del Ama
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800, Kgs. Lyngby, Denmark
- Sidharth Jayachandran
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800, Kgs. Lyngby, Denmark
- Jonathan Dahlin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800, Kgs. Lyngby, Denmark
- Daniela Rago
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800, Kgs. Lyngby, Denmark
- Aaron John Christian Andersen
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Søltofts plads 221, 2800, Kgs. Lyngby, Denmark
- Irina Borodina
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800, Kgs. Lyngby, Denmark
12
Rosenberg AA, Marx A, Bronstein AM. Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon. Nat Commun 2022; 13:2815. [PMID: 35595777 PMCID: PMC9123026 DOI: 10.1038/s41467-022-30390-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/17/2021] [Accepted: 04/28/2022] [Indexed: 12/27/2022]
Abstract
Synonymous codons translate into chemically identical amino acids. Although codon choice was once considered inconsequential to the formation of the protein product, there is evidence that codon usage affects co-translational protein folding and the final structure of the expressed protein. Here we develop a method for computing and comparing codon-specific Ramachandran plots and demonstrate that, for some secondary structures, the backbone dihedral angle distributions of some synonymous codons are distinguishable with statistical significance. This shows that there is a dependence between codon identity and the backbone torsion of the translated amino acid. Although these findings cannot pinpoint the causal direction of this dependence, we discuss the vast biological implications should coding be shown to directly shape protein conformation, and demonstrate the usefulness of this method as a tool for probing associations between codon usage and protein structure. Finally, we urge the inclusion of exact genetic information in structural databases.
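As a rough illustration of comparing angle distributions between synonymous codons, the Python sketch below reduces the problem to one dimension: it compares the circular means of backbone phi angles observed for residues translated from two codons and estimates significance with a permutation test. This is a simplified stand-in, not the authors' method (which compares full two-dimensional Ramachandran distributions); the function names and the toy angle lists are hypothetical.

```python
import math
import random


def circular_mean_deg(angles):
    """Circular mean of angles in degrees, returned in [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def angular_difference_deg(a, b):
    """Smallest absolute difference between two angles, in [0, 180]."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def permutation_test(phi_a, phi_b, n_perm=2000, seed=0):
    """Difference of circular means between two codon groups, plus a
    permutation p-value obtained by shuffling the group labels."""
    rng = random.Random(seed)
    observed = angular_difference_deg(circular_mean_deg(phi_a),
                                      circular_mean_deg(phi_b))
    pooled = list(phi_a) + list(phi_b)
    n_a = len(phi_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = angular_difference_deg(circular_mean_deg(pooled[:n_a]),
                                      circular_mean_deg(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical phi angles (degrees) for residues encoded by two synonymous codons
phi_codon_a = [-60, -62, -58, -65, -61]
phi_codon_b = [-70, -72, -68, -75, -71]
diff, p_value = permutation_test(phi_codon_a, phi_codon_b)
print(f"mean phi difference: {diff:.1f} deg, p = {p_value:.3f}")
```

Angles are circular quantities, so the sketch averages unit vectors rather than raw degrees; a naive arithmetic mean of 179° and -179° would give 0° instead of ±180°.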
Affiliation(s)
- Aviv A Rosenberg
- Computer Science, Technion - Israel Institute of Technology, Haifa, 3200003, Israel
- Ailie Marx
- Computer Science, Technion - Israel Institute of Technology, Haifa, 3200003, Israel
- Alex M Bronstein
- Computer Science, Technion - Israel Institute of Technology, Haifa, 3200003, Israel
13
Rossi R, Fang M, Zhu L, Jiang C, Yu C, Flesia C, Nie C, Li W, Ferlini A. Calculating and comparing codon usage values in rare disease genes highlights codon clustering with disease-and tissue- specific hierarchy. PLoS One 2022; 17:e0265469. [PMID: 35358230 PMCID: PMC8970475 DOI: 10.1371/journal.pone.0265469] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2021] [Accepted: 03/02/2022] [Indexed: 11/19/2022]
Abstract
We designed a novel strategy to define codon usage bias (CUB) in 6 specific small cohorts of human genes. We calculated codon usage (CU) values in 29 non-disease-causing (NDC) and 31 disease-causing (DC) human genes that are highly expressed in 3 distinct tissues: kidney, muscle, and skin. We applied our strategy to the same selected genes annotated in 15 mammalian species. We obtained CUB hierarchical clusters for each gene cohort, which showed tissue-specific and disease-specific CUB fingerprints. We showed that DC genes (especially those expressed in muscle) display a low CUB, clearly recognizable in codon hierarchical clustering. We defined the extremely biased codons as "zero codons" and found that their number is significantly higher in all DC genes, in all tissues, and that this trend is conserved across mammals. Based on this calculation in different gene cohorts, we identified 5 codons that are more differentially used across genes and mammals, underlining that some genes have favorite synonymous codons. Because the muscle genes form clear clusters and, among these, the dystrophin gene surprisingly does not show any "zero codon", we adopted a novel approach to studying CUB, which we called "mapping-on-codons". We positioned 2828 dystrophin missense and nonsense pathogenic variations on their respective codons, showing that their frequency and occurrence do not depend on the CU values. We conclude that our strategy makes it possible to identify hierarchical clustering of CU values into gene cohort-specific fingerprints, with a recognizable trend across mammals. In DC muscle genes, a disease-related fingerprint can also be observed, allowing discrimination between DC and NDC genes. We propose that our strategy, which studies CU in specific gene cohorts such as rare disease genes and tissue-specific genes, may provide novel information about the role of CUB in human and medical genetics, with implications for the interpretation of synonymous variants and for codon optimization algorithms.
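Codon usage values are often summarized as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, i.e. RSCU = X_ij / ((1/n_i) Σ_j X_ij) for codon j of amino acid i with n_i synonymous codons. The Python sketch below computes RSCU for a single coding sequence under the standard genetic code. It is a generic illustration, not the authors' pipeline, and the `unused_codons` helper is only one hypothetical reading of "zero codons" (codons with an RSCU of 0 for an amino acid that is present in the gene); the abstract does not give the exact definition.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, e.g. "F" -> ["TTT", "TTC"]
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def codon_counts(cds):
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))


def rscu(cds):
    """RSCU per codon for amino acids that occur in the sequence."""
    counts = codon_counts(cds)
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def unused_codons(cds):
    """Hypothetical 'zero codon' reading: RSCU == 0 for a present amino acid."""
    return sorted(c for c, v in rscu(cds).items() if v == 0)


gene = "ATGTTTTTTTTCAAATAA"  # toy sequence: Met, Phe x3, Lys, stop
print({c: round(v, 2) for c, v in sorted(rscu(gene).items())})
# {'AAA': 2.0, 'AAG': 0.0, 'TTC': 0.67, 'TTT': 1.33}
print(unused_codons(gene))  # ['AAG']
```

Met and Trp are skipped because, with a single codon each, their RSCU carries no information about synonymous choice.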
Affiliation(s)
- Rachele Rossi
- Unit of Medical Genetics, Department of Medical Sciences, University of Ferrara, Ferrara, Italy
- Dubowitz Neuromuscular Unit, Institute of Child Health, University College London, London, United Kingdom
- Lin Zhu
- BGI-Shenzhen, Shenzhen, China
- BGI College & Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China
- Cong Yu
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Cristina Flesia
- Department of Earth and Environment Science, University of Milano-Bicocca, Milano, Italy
- Alessandra Ferlini
- Unit of Medical Genetics, Department of Medical Sciences, University of Ferrara, Ferrara, Italy
- Dubowitz Neuromuscular Unit, Institute of Child Health, University College London, London, United Kingdom
14
Wright G, Rodriguez A, Li J, Milenkovic T, Emrich SJ, Clark PL. CHARMING: Harmonizing synonymous codon usage to replicate a desired codon usage pattern. Protein Sci 2022; 31:221-231. [PMID: 34738275 PMCID: PMC8740841 DOI: 10.1002/pro.4223] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/31/2021] [Revised: 10/31/2021] [Accepted: 11/02/2021] [Indexed: 01/03/2023]
Abstract
There is a growing appreciation that synonymous codon usage, although historically regarded as phenotypically silent, can instead alter a wide range of mechanisms related to functional protein production, a term we use here to describe the net effect of transcription (mRNA synthesis), mRNA half-life, translation (protein synthesis) and the probability of a protein folding correctly to its active, functional structure. In particular, recent discoveries have highlighted the important role that sub-optimal codons can play in modifying co-translational protein folding. These results have drawn increased attention to the patterns of synonymous codon usage within coding sequences, particularly in light of the discovery that these patterns can be conserved across evolution for homologous proteins. Because synonymous codon usage differs between organisms, for heterologous gene expression it can be desirable to make synonymous codon substitutions to match the codon usage pattern from the original organism in the heterologous expression host. Here we present CHARMING (for Codon HARMonizING), a robust and versatile algorithm to design mRNA sequences for heterologous gene expression and other related codon harmonization tasks. CHARMING can be run as a downloadable Python script or via a web portal at http://www.codons.org.
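One simple way to match a source organism's codon usage pattern in a heterologous host is rank-based harmonization: within each synonymous family, a codon that is the k-th most frequent in the source organism is replaced by the k-th most frequent codon in the host. The Python sketch below implements that rank-matching idea under the standard genetic code. It is a simplified, hypothetical illustration and not the CHARMING algorithm; the usage tables passed in (codon-to-frequency dictionaries) are toy values, and codons missing from a table are treated as frequency 0.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def rank_map(source_usage, host_usage):
    """Map each codon to the host codon with the same frequency rank within
    its synonymous family (ties broken alphabetically for determinism)."""
    mapping = {}
    for codons in SYNONYMS.values():
        src = sorted(codons, key=lambda c: (-source_usage.get(c, 0.0), c))
        dst = sorted(codons, key=lambda c: (-host_usage.get(c, 0.0), c))
        mapping.update(zip(src, dst))
    return mapping


def harmonize(cds, source_usage, host_usage):
    """Rewrite an in-frame DNA coding sequence codon by codon."""
    mapping = rank_map(source_usage, host_usage)
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return "".join(mapping[c] for c in codons)


# Toy usage tables (hypothetical values): Phe preference is reversed in the host
source_usage = {"TTT": 0.8, "TTC": 0.2}
host_usage = {"TTT": 0.3, "TTC": 0.7}
print(harmonize("ATGTTTTTCAAA", source_usage, host_usage))  # ATGTTCTTTAAA
```

Because each codon is only swapped for another member of its own synonymous family, the encoded protein is unchanged; a codon containing non-ACGT characters (for example an RNA `U`) raises a `KeyError`, which a real tool would need to handle.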
Affiliation(s)
- Gabriel Wright
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana, USA; present address: Department of Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, USA
- Anabel Rodriguez
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
- Jun Li
- Department of Applied and Computational Mathematics & Statistics, University of Notre Dame, Notre Dame, Indiana, USA
- Tijana Milenkovic
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana, USA
- Scott J. Emrich
- Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, Tennessee, USA
- Patricia L. Clark
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
15
Cantoia A, Aguilar Lucero D, Ceccarelli EA, Rosano GL. From the notebook to recombinant protein production in Escherichia coli: Design of expression vectors and gene cloning. Methods Enzymol 2021; 659:19-35. [PMID: 34752286 DOI: 10.1016/bs.mie.2021.07.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Research in recombinant protein expression in microorganism hosts spans half a century. The field has evolved from mostly trial-and-error approaches to more rational strategies, including careful design of the expression vectors and of the coding sequence for the protein of interest. It is important to consider many aspects of vector construction, such as codon usage, integration site, coding sequence mutagenesis and many others. In this chapter, we review methods and considerations for generating a suitable construct and anticipating possible experimental roadblocks.
Affiliation(s)
- Alejo Cantoia
- Instituto de Biología Molecular y Celular de Rosario (IBR), CONICET, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
- Dianela Aguilar Lucero
- Instituto de Biología Molecular y Celular de Rosario (IBR), CONICET, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
- Eduardo A Ceccarelli
- Instituto de Biología Molecular y Celular de Rosario (IBR), CONICET, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina
- Germán L Rosano
- Instituto de Biología Molecular y Celular de Rosario (IBR), CONICET, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Rosario, Argentina