1
Willems T, Hectors W, Rombaut J, De Rop AS, Goegebeur S, Delmulle T, De Mol ML, De Maeseneire SL, Soetaert WK. An exploratory in silico comparison of open-source codon harmonization tools. Microb Cell Fact 2023; 22:227. [PMID: 37932726; PMCID: PMC10626681; DOI: 10.1186/s12934-023-02230-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/14/2023] [Accepted: 10/14/2023] [Indexed: 11/08/2023]
Abstract
BACKGROUND: Not changing the native constitution of genes prior to their expression by a heterologous host can affect the amount of proteins synthesized as well as their folding, hampering their activity and even cell viability. Over the past decades, several strategies have been developed to optimize the translation of heterologous genes by accommodating the difference in codon usage between species. While there have been a handful of studies assessing various codon optimization strategies, to the best of our knowledge, no research has been performed towards the evaluation and comparison of codon harmonization algorithms. To highlight their importance and encourage meaningful discussion, we compared different open-source codon harmonization tools pertaining to their in silico performance, and we investigated the influence of different gene-specific factors.
RESULTS: In total, 27 genes were harmonized with four tools toward two different heterologous hosts. The difference in %MinMax values between the harmonized and the original sequences was calculated (ΔMinMax), and statistical analysis of the obtained results was carried out. It became clear that not all tools perform similarly, and the choice of tool should depend on the intended application. Almost all biological factors under investigation (GC content, RNA secondary structures and choice of heterologous host) had a significant influence on the harmonization results and thus must be taken into account. These findings were substantiated using a validation dataset consisting of 8 strategically chosen genes.
CONCLUSIONS: Due to the size of the dataset, no complex models could be developed. However, this initial study showcases significant differences between the results of various codon harmonization tools. Although more elaborate investigation is needed, it is clear that biological factors such as GC content, RNA secondary structures and heterologous hosts must be taken into account when selecting the codon harmonization tool.
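The ΔMinMax comparison in this abstract rests on sliding-window %MinMax profiles. The sketch below assumes one commonly used formulation of %MinMax (codon frequencies summed over a window and compared against the most common, rarest and average synonyms) and, as a further assumption, reduces ΔMinMax to the mean absolute difference between the original profile scored with the native organism's table and the harmonized profile scored with the host's table. The function names, the table format and the 18-codon default window are illustrative choices, not the procedure used by Willems et al.

```python
from statistics import mean

# Assumed table format: amino acid -> {codon: relative usage frequency}.
Usage = dict[str, dict[str, float]]


def amino_acid_of(codon: str, usage: Usage) -> str:
    """Return the amino acid whose synonym set in `usage` contains `codon`."""
    for aa, table in usage.items():
        if codon in table:
            return aa
    raise KeyError(f"codon {codon!r} is not in the usage table")


def percent_minmax(codons: list[str], usage: Usage, window: int = 18) -> list[float]:
    """Sliding-window %MinMax profile, one value per complete window.

    Positive values (%Max): the window favours codons more common than the
    synonymous average. Negative values (-%Min): it favours rarer codons.
    """
    profile = []
    for start in range(len(codons) - window + 1):
        x = x_max = x_min = x_avg = 0.0
        for codon in codons[start:start + window]:
            synonyms = usage[amino_acid_of(codon, usage)]
            freqs = list(synonyms.values())
            x += synonyms[codon]   # frequency of the codon actually used
            x_max += max(freqs)    # most common synonym
            x_min += min(freqs)    # rarest synonym
            x_avg += mean(freqs)   # average over all synonyms
        if x >= x_avg:
            span = x_max - x_avg  # zero when every synonym set in the window is uniform
            profile.append(100.0 * (x - x_avg) / span if span else 0.0)
        else:
            # x >= x_min always holds, so x < x_avg implies x_avg - x_min > 0.
            profile.append(-100.0 * (x_avg - x) / (x_avg - x_min))
    return profile


def delta_minmax(original: list[str], harmonized: list[str],
                 native: Usage, host: Usage, window: int = 18) -> float:
    """Assumed summary: mean absolute difference between the two profiles."""
    before = percent_minmax(original, native, window)
    after = percent_minmax(harmonized, host, window)
    return mean(abs(a - b) for a, b in zip(before, after))
```

With a toy table such as `{"L": {"CTG": 0.5, "TTA": 0.1, "CTT": 0.4}, "K": {"AAA": 0.7, "AAG": 0.3}}` and a window of 2, the pair `["CTG", "AAA"]` (both most common synonyms) scores 100 and `["TTA", "AAG"]` (both rarest) scores -100. `delta_minmax` raises `statistics.StatisticsError` when a sequence is shorter than the window, since there is then no complete window to compare.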
Affiliation(s)
- Thomas Willems
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Wim Hectors
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Jeltien Rombaut
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Anne-Sofie De Rop
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Stijn Goegebeur
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Tom Delmulle
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Maarten L De Mol
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Sofie L De Maeseneire
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Wim K Soetaert
- Centre for Industrial Biotechnology and Biocatalysis (InBio.be), Department of Biotechnology, Faculty of Bioscience Engineering, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
2
Karaşan O, Şen A, Tiryaki B, Cicek AE. A unifying network modeling approach for codon optimization. Bioinformatics 2022; 38:3935-3941. [PMID: 35762943; DOI: 10.1093/bioinformatics/btac428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2021] [Revised: 05/01/2022] [Accepted: 06/27/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Synthesizing genes to be expressed in other organisms is an essential tool in biotechnology. While the many-to-one mapping from codons to amino acids makes the genetic code degenerate, codon usage in a particular organism is not random either. This bias in codon use may have a remarkable effect on the level of gene expression. A number of measures have been developed to quantify a given codon sequence's strength to express a gene in a host organism. Codon optimization aims to find a codon sequence that will optimize one or more of these measures. Efficient computational approaches are needed since the possible number of codon sequences grows exponentially as the number of amino acids increases.
RESULTS: We develop a unifying modeling approach for codon optimization. With our mathematical formulations based on graph/network representations of amino acid sequences, any combination of measures can be optimized in the same framework by finding a path satisfying additional limitations in an acyclic layered network. We tested our approach on bi-objective problems commonly used in the literature, namely Codon Pair Bias versus Codon Adaptation Index and Relative Codon Pair Bias versus Relative Codon Bias. However, our framework is general enough to handle any number of objectives concurrently with certain restrictions or preferences on the use of specific nucleotide sequences. We implemented our models using Python's Gurobi interface and showed the efficacy of our approach even for the largest proteins available. We also provided experimentation showing that highly expressed genes have objective values close to the optimized values in the bi-objective codon design problem.
AVAILABILITY AND IMPLEMENTATION: http://alpersen.bilkent.edu.tr/NetworkCodon.zip.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
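The layered-network idea can be illustrated on a much smaller problem. The sketch below is not the authors' Gurobi model: it collapses two objectives into one hypothetical weighted sum (the log relative adaptiveness of each codon, as used in the Codon Adaptation Index, plus `alpha` times a codon-pair score) and finds the best path through the acyclic layered network by dynamic programming, with no side constraints. `weight`, `pair_score` and `alpha` are assumed inputs supplied by the caller.

```python
import math


def best_codon_path(protein, synonyms, weight, pair_score, alpha=1.0):
    """Highest-scoring codon sequence via dynamic programming over a layered DAG.

    Layer i holds the synonymous codons for protein[i]; every codon in layer i
    has an edge to every codon in layer i + 1. A path scores
    sum(log weight[codon]) + alpha * sum(pair_score[(prev, codon)]).
    Pairs missing from `pair_score` are assumed to score 0.
    """
    # best[c] = (score, path) of the best partial path ending at codon c.
    best = {c: (math.log(weight[c]), [c]) for c in synonyms[protein[0]]}
    for aa in protein[1:]:
        layer = {}
        for c in synonyms[aa]:
            def via(p, c=c):
                # Score of extending the best path ending at p with codon c.
                return best[p][0] + alpha * pair_score.get((p, c), 0.0)
            prev = max(best, key=via)
            layer[c] = (via(prev) + math.log(weight[c]), best[prev][1] + [c])
        best = layer
    score, path = max(best.values(), key=lambda sp: sp[0])
    return path, score
```

Because the sum of log weights is maximized, the returned path has maximal CAI (the exponential of the mean log weight) when `alpha` is 0; raising `alpha` trades CAI for pair score. Restrictions on specific nucleotide sequences, which the abstract mentions, are not modelled here; handling them is where a solver-based formulation becomes useful.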
Affiliation(s)
- Oya Karaşan
- Department of Industrial Engineering, Bilkent University, Ankara 06800, Turkey
- Alper Şen
- Department of Industrial Engineering, Bilkent University, Ankara 06800, Turkey
- Banu Tiryaki
- Department of Industrial Engineering, Bilkent University, Ankara 06800, Turkey
- A Ercument Cicek
- Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
3
Wright G, Rodriguez A, Li J, Milenkovic T, Emrich SJ, Clark PL. CHARMING: Harmonizing synonymous codon usage to replicate a desired codon usage pattern. Protein Sci 2022; 31:221-231. [PMID: 34738275; PMCID: PMC8740841; DOI: 10.1002/pro.4223] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/31/2021] [Revised: 10/31/2021] [Accepted: 11/02/2021] [Indexed: 01/03/2023]
Abstract
There is a growing appreciation that synonymous codon usage, although historically regarded as phenotypically silent, can instead alter a wide range of mechanisms related to functional protein production, a term we use here to describe the net effect of transcription (mRNA synthesis), mRNA half-life, translation (protein synthesis) and the probability of a protein folding correctly to its active, functional structure. In particular, recent discoveries have highlighted the important role that sub-optimal codons can play in modifying co-translational protein folding. These results have drawn increased attention to the patterns of synonymous codon usage within coding sequences, particularly in light of the discovery that these patterns can be conserved across evolution for homologous proteins. Because synonymous codon usage differs between organisms, for heterologous gene expression it can be desirable to make synonymous codon substitutions to match the codon usage pattern from the original organism in the heterologous expression host. Here we present CHARMING (for Codon HARMonizING), a robust and versatile algorithm to design mRNA sequences for heterologous gene expression and other related codon harmonization tasks. CHARMING can be run as a downloadable Python script or via a web portal at http://www.codons.org.
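CHARMING's own algorithm is described in the paper and is not reproduced here. As a point of reference, the sketch below shows one simple, commonly described form of codon harmonization: each codon is replaced by the synonymous host codon whose usage frequency is closest to the original codon's frequency in the native organism. The table format and the `harmonize` name are assumptions for illustration.

```python
# Assumed table format: amino acid -> {codon: relative usage frequency};
# both tables are assumed to list the same synonymous codons per amino acid.
Usage = dict[str, dict[str, float]]


def harmonize(codons: list[str], native: Usage, host: Usage) -> list[str]:
    """Frequency-matching harmonization: map each native codon to the host
    synonym whose usage frequency is closest to the native codon's frequency."""
    result = []
    for codon in codons:
        # Find the amino acid (and hence the synonym set) for this codon;
        # raises StopIteration if the codon is missing from the native table.
        aa = next(a for a, table in native.items() if codon in table)
        target = native[aa][codon]
        result.append(min(host[aa], key=lambda c: abs(host[aa][c] - target)))
    return result
```

This greedy rule looks at one codon at a time; it does not consider windowed usage patterns or other sequence-level constraints.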
Affiliation(s)
- Gabriel Wright
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana, USA; present address: Department of Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, USA
- Anabel Rodriguez
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
- Jun Li
- Department of Applied and Computational Mathematics & Statistics, University of Notre Dame, Notre Dame, Indiana, USA
- Tijana Milenkovic
- Department of Computer Science & Engineering, University of Notre Dame, Notre Dame, Indiana, USA
- Scott J. Emrich
- Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, Tennessee, USA
- Patricia L. Clark
- Department of Chemistry & Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA
4
Hia F, Takeuchi O. The effects of codon bias and optimality on mRNA and protein regulation. Cell Mol Life Sci 2021; 78:1909-1928. [PMID: 33128106; PMCID: PMC11072601; DOI: 10.1007/s00018-020-03685-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/08/2020] [Revised: 10/05/2020] [Accepted: 10/12/2020] [Indexed: 12/25/2022]
Abstract
The central dogma of molecular biology entails that genetic information is transferred from nucleic acid to proteins. Notwithstanding retro-transcribing genetic elements, DNA is transcribed to RNA, which in turn is translated into proteins. Recent advancements have shown that each stage is regulated to control protein abundances for a variety of essential physiological processes. In this regard, mRNA regulation is essential in fine-tuning or calibrating protein abundances. In this review, we would like to discuss one of several mRNA-intrinsic features of mRNA regulation that has been gaining traction recently: codon bias and optimality. Specifically, we address the effects of codon bias with regard to codon optimality in several biological processes centred on translation, such as mRNA stability and protein folding, among others. Finally, we examine how different organisms or cell types, through this system, are able to coordinate physiological pathways to respond to a variety of stress or growth conditions.
Affiliation(s)
- Fabian Hia
- Department of Medical Chemistry, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Osamu Takeuchi
- Department of Medical Chemistry, Graduate School of Medicine, Kyoto University, Kyoto, Japan
5
Newaz K, Wright G, Piland J, Li J, Clark PL, Emrich SJ, Milenković T. Network analysis of synonymous codon usage. Bioinformatics 2020; 36:4876-4884. [PMID: 32609328; DOI: 10.1093/bioinformatics/btaa603] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/06/2019] [Revised: 05/05/2020] [Accepted: 06/22/2020] [Indexed: 12/25/2022]
Abstract
MOTIVATION: Most amino acids are encoded by multiple synonymous codons, some of which are used more rarely than others. Analyses of positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding.
RESULTS: We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the structural core are network-central, and those on the surface are not. Then, we study potential differences between network centralities and thus structural positions of amino acids encoded by conserved rare, non-conserved rare and commonly used codons. We find that in 84% of proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different codon centrality trends, i.e. different relationships between structural positions of the three codon categories. We see several cases of all proteins from our data with some structural or functional property being in the same group. Also, we see a case of all proteins in some group having the same property. Our work shows that codon usage is linked to the final protein structure and thus possibly to co-translational protein folding.
AVAILABILITY AND IMPLEMENTATION: https://nd.edu/~cone/CodonUsage/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
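To make "network centrality" concrete, the sketch below builds a residue contact network from 3D coordinates (an edge joins two residues closer than a cutoff) and computes closeness centrality with breadth-first search. The 8 Å cutoff, the use of one coordinate per residue and the choice of closeness among the possible centrality measures are assumptions for illustration; the paper's exact network construction and centralities may differ.

```python
import math
from collections import deque

Coord = tuple[float, float, float]


def contact_network(coords: list[Coord], cutoff: float = 8.0) -> list[list[int]]:
    """Adjacency lists: residues i and j are linked if within `cutoff` angstroms."""
    adj: list[list[int]] = [[] for _ in coords]
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness(adj: list[list[int]]) -> list[float]:
    """Closeness centrality of every node from BFS shortest-path lengths,
    computed over the nodes reachable from it."""
    scores = []
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores.append((len(dist) - 1) / total if total else 0.0)
    return scores
```

For a connected network the score reduces to (n - 1) divided by the sum of shortest-path lengths, so residues in the middle of the contact network score highest; for networks that fall apart into pieces, other normalizations exist (NetworkX, for example, offers a rescaled variant).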
Affiliation(s)
- Khalique Newaz
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
- Gabriel Wright
- Department of Computer Science and Engineering; Eck Institute for Global Health
- Jacob Piland
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
- Jun Li
- Department of Applied and Computational Mathematics and Statistics
- Patricia L Clark
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996, USA
- Tijana Milenković
- Department of Computer Science and Engineering; Center for Network and Data Science; Eck Institute for Global Health
6
Basarslan MS, Kayaalp F. Sentiment Analysis with Machine Learning Methods on Social Media. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal 2020. [DOI: 10.14201/adcaij202093515] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
Abstract
Social media has become an important part of everyday life owing to the widespread use of the Internet. Among social media services, Twitter is one of the most used around the world. People share their opinions by writing tweets on numerous subjects, such as politics, sports and the economy. Millions of tweets per day create a huge dataset, which has drawn the attention of data scientists to sentiment analysis of these data. Sentiment analysis aims to identify users' social media posts about a specific topic and categorize them as positive, negative or neutral. This study investigates the effect of the type of text representation on the performance of sentiment analysis. Two datasets were used in the experiments: user reviews of movies from IMDB, labeled by Kotzias, and English-language tweets by users about health topics in 2019, collected using the Twitter API. The Python programming language was used both to implement classification models based on the Naïve Bayes (NB), Support Vector Machine (SVM) and Artificial Neural Network (ANN) algorithms and to categorize the sentiments as positive, negative or neutral. Features were extracted from the datasets using Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec (W2V) modeling techniques, and the success percentages of the classification algorithms were then compared. According to the experimental results, the Artificial Neural Network achieved the best accuracy on both datasets.
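TF-IDF, one of the two text representations compared, weights each term by how often it occurs in a document and how rare it is across the collection. The sketch below assumes the plain variant tf(t, d) = count / document length and idf(t) = ln(N / df(t)) with lowercase whitespace tokenization; libraries such as scikit-learn use smoothed and normalized variants by default, so exact values would differ from the study's pipeline.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Plain TF-IDF: tf = count / doc length, idf = ln(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document, such as "movie" in `["good movie", "bad movie"]`, gets weight 0, while "good" gets 0.5 × ln 2; the resulting sparse vectors are what a classifier such as Naïve Bayes or an SVM would then be trained on.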