1
Paget‐Bailly P, Helpiquet A, Decourcelle M, Bories R, Bravo IG. Translation of the downstream ORF from bicistronic mRNAs by human cells: Impact of codon usage and splicing in the upstream ORF. Protein Sci 2025; 34:e70036. [PMID: 39840808 PMCID: PMC11751868 DOI: 10.1002/pro.70036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 11/19/2024] [Accepted: 01/03/2025] [Indexed: 01/23/2025]
Abstract
Biochemistry textbooks describe eukaryotic mRNAs as monocistronic. However, increasing evidence reveals the widespread presence and translation of upstream open reading frames preceding the "main" ORF. DNA and RNA viruses infecting eukaryotes often produce polycistronic mRNAs, and viruses have evolved multiple ways of manipulating the host's translation machinery. Here, we introduce an experimental model to study gene expression regulation from virus-like bicistronic mRNAs in human cells. The model consists of a short upstream ORF and a reporter downstream ORF encoding a fluorescent protein. We have engineered synonymous variants of the upstream ORF to explore a large parameter space, including codon usage preferences, mRNA folding features, and splicing propensity. We show that the human translation machinery can translate the downstream ORF from bicistronic mRNAs, although reporter protein levels are a thousand times lower than those from the upstream ORF. Furthermore, synonymous recoding of the upstream ORF exclusively during elongation significantly influences its own translation efficiency, reveals cryptic splice signals, and modulates the probability of downstream ORF translation. Our results are consistent with a leaky scanning mechanism facilitating downstream ORF translation from bicistronic mRNAs in human cells, offering new insights into the role of upstream ORFs in translation regulation.
Affiliation(s)
- Philippe Paget‐Bailly
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
- Alexandre Helpiquet
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
- Mathilde Decourcelle
  - Functional Proteomics Platform, BioCampus Montpellier (University of Montpellier, CNRS, INSERM), Montpellier, France
- Roxane Bories
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
- Ignacio G. Bravo
  - Laboratory MIVEGEC (Univ. Montpellier, CNRS, IRD), French National Center for Scientific Research (CNRS), Montpellier, France
2
Lu M, Wan W, Li Y, Li H, Sun B, Yu K, Zhao J, Franzo G, Su S. Codon usage bias analysis of the spike protein of human coronavirus 229E and its host adaptability. Int J Biol Macromol 2023; 253:127319. [PMID: 37820917 DOI: 10.1016/j.ijbiomac.2023.127319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 09/28/2023] [Accepted: 10/06/2023] [Indexed: 10/13/2023]
Abstract
Human coronavirus 229E (HCoV-229E) is one of the known coronaviruses capable of infecting humans and causes mild respiratory symptoms. It is also considered to have a zoonotic source, originating from animals and being transmitted to humans. In this study, a comprehensive phylogenetic and codon usage analysis of the spike (S) gene of HCoV-229E was conducted. Utilizing phylogenetic analysis and principal component analysis, HCoV-229E was categorized into four distinct clusters, each demonstrating unique host affiliations. Furthermore, it was observed that the codon usage bias within the S gene of HCoV-229E is relatively low, primarily influenced by natural selection patterns, with contributions from mutation pressure and dinucleotide abundance. Comparative analysis involving the Codon Adaptation Index (CAI) and the Relative Codon Deoptimization Index (RCDI) revealed that the codon usage pattern of HCoV-229E mirrors more closely that of camels, as opposed to alpacas and humans. The elucidation of the codon usage pattern within HCoV-229E offers valuable insights for a fuller understanding of viral features, history, and evolutionary trajectory.
Affiliation(s)
- Meng Lu
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Wenbo Wan
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Yuxing Li
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Haipeng Li
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Bowen Sun
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Kang Yu
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Jin Zhao
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
- Giovanni Franzo
  - Department of Animal Medicine, Production and Health (MAPS), University of Padua, Viale dell'Università 16, Legnaro 35020, PD, Italy
- Shuo Su
  - Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, 131 Dong'an Road, Shanghai 200032, People's Republic of China
3
Molteni C, Forni D, Cagliani R, Bravo IG, Sironi M. Evolution and diversity of nucleotide and dinucleotide composition in poxviruses. J Gen Virol 2023; 104. [PMID: 37792576 DOI: 10.1099/jgv.0.001897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2023] Open
Abstract
Poxviruses (family Poxviridae) have long dsDNA genomes and infect a wide range of hosts, including insects, birds, reptiles and mammals. These viruses have substantial incidence, prevalence and disease burden in humans and in other animals. Nucleotide and dinucleotide composition, mostly CpG and TpA, have been largely studied in viral genomes because of their evolutionary and functional implications. We analysed here the nucleotide and dinucleotide composition, as well as codon usage bias, of a set of representative poxvirus genomes, with a very diverse host spectrum. After correcting for overall nucleotide composition, entomopoxviruses displayed low overall GC content, no enrichment in TpA and large variation in CpG enrichment, while chordopoxviruses showed large variation in nucleotide composition, no obvious depletion in CpG and a weak trend for TpA depletion in GC-rich genomes. Overall, intergenome variation in dinucleotide composition in poxviruses is largely accounted for by variation in overall genomic GC levels. Nonetheless, using vaccinia virus as a model, we found that genes expressed at the earliest times in infection are more CpG-depleted than genes expressed at later stages. This observation has parallels in betaherpesviruses (also large dsDNA viruses) and suggests an antiviral role for the innate immune system (e.g. via the zinc-finger antiviral protein ZAP) in the early phases of poxvirus infection. We also analysed codon usage bias in poxviruses and we observed that it is mostly determined by genomic GC content, and that stratification after host taxonomy does not contribute to explaining codon usage bias diversity. By analysis of within-species diversity, we show that genomic GC content is the result of mutational biases. Poxvirus genomes that encode a DNA ligase are significantly AT-richer than those that do not, suggesting that DNA repair systems shape mutation biases.
Our data shed light on the evolution of poxviruses and inform strategies for their genetic manipulation for therapeutic purposes.
Affiliation(s)
- Cristian Molteni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Ignacio G Bravo
  - Laboratoire MIVEGEC (Univ Montpellier CNRS, IRD), Centre National de la Recherche Scientifique, Montpellier, France
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
4
Bourret J, Borvető F, Bravo IG. Subfunctionalisation of paralogous genes and evolution of differential codon usage preferences: The showcase of polypyrimidine tract binding proteins. J Evol Biol 2023; 36:1375-1392. [PMID: 37667674 DOI: 10.1111/jeb.14212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/10/2023] [Revised: 07/11/2023] [Accepted: 07/12/2023] [Indexed: 09/06/2023]
Abstract
Gene paralogs are copies of an ancestral gene that appear after gene or full genome duplication. When two sister gene copies are maintained in the genome, redundancy may release certain evolutionary pressures, allowing one of them to access novel functions. Here, we focused our study of gene paralogs on the evolutionary history of the three polypyrimidine tract binding protein genes (PTBP) and their concurrent evolution of differential codon usage preferences (CUPrefs) in vertebrate species. PTBP1-3 show high identity at the amino acid level (up to 80%) but display strongly different nucleotide composition, divergent CUPrefs and, in humans and in many other vertebrates, distinct tissue-specific expression levels. Our phylogenetic inference results show that the duplication events leading to the three extant PTBP1-3 lineages predate the basal diversification within vertebrates, and genomic context analysis illustrates that local synteny has been well preserved over time for the three paralogs. We identify a distinct evolutionary pattern towards GC3-enriching substitutions in PTBP1, concurrent with enrichment in frequently used codons and with a tissue-wide expression. In contrast, PTBP2s are enriched in AT-ending, rare codons, and display tissue-restricted expression. As a result of this substitution trend, CUPrefs sharply differ between mammalian PTBP1s and the rest of PTBPs. Genomic context analysis suggests that GC3-rich nucleotide composition in PTBP1s is driven by local substitution processes, while the evidence in this direction is thinner for PTBP2-3. An actual lack of co-variation between the observed GC composition of PTBP2-3 and that of the surrounding non-coding genomic environment would raise a question about the origin of CUPrefs, warranting further research on a putative tissue-specific translational selection. Finally, we communicate an intriguing trend for the use of the UUG-Leu codon, which matches the trends of AT-ending codons.
Our results are compatible with a scenario in which a combination of directional mutation-selection processes would have differentially shaped CUPrefs of PTBPs in vertebrates: the observed GC-enrichment of PTBP1 in placental mammals may be linked to genomic location and to the strong and broad tissue-expression, while AT-enrichment of PTBP2 and PTBP3 would be associated with rare CUPrefs and thus, possibly to specialized spatio-temporal expression. Our interpretation is coherent with a gene subfunctionalisation process by differential expression regulation associated with the evolution of specific CUPrefs.
Affiliation(s)
- Jérôme Bourret
  - Laboratoire MIVEGEC (CNRS IRD Univ Montpellier), Centre National de la Recherche Scientifique (CNRS), Montpellier, France
- Fanni Borvető
  - Laboratoire MIVEGEC (CNRS IRD Univ Montpellier), Centre National de la Recherche Scientifique (CNRS), Montpellier, France
- Ignacio G Bravo
  - Laboratoire MIVEGEC (CNRS IRD Univ Montpellier), Centre National de la Recherche Scientifique (CNRS), Montpellier, France
5
Saldivar-Espinoza B, Macip G, Garcia-Segura P, Mestres-Truyol J, Puigbò P, Cereto-Massagué A, Pujadas G, Garcia-Vallve S. Prediction of Recurrent Mutations in SARS-CoV-2 Using Artificial Neural Networks. Int J Mol Sci 2022; 23:14683. [PMID: 36499005 PMCID: PMC9736107 DOI: 10.3390/ijms232314683] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/27/2022] [Revised: 11/18/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022] Open
Abstract
Predicting SARS-CoV-2 mutations is difficult, but predicting recurrent mutations driven by the host, such as those caused by host deaminases, is feasible. We used machine learning to predict which positions from the SARS-CoV-2 genome will hold a recurrent mutation and which mutations will be the most recurrent. We used data from April 2021 that we separated into three sets: a training set, a validation set, and an independent test set. For the test set, we obtained a specificity value of 0.69, a sensitivity value of 0.79, and an Area Under the Curve (AUC) of 0.8, showing that the prediction of recurrent SARS-CoV-2 mutations is feasible. Subsequently, we compared our predictions with updated data from January 2022, showing that some of the false positives in our prediction model become true positives later on. The most important variables detected by the model's Shapley Additive exPlanation (SHAP) are the nucleotide that mutates and RNA reactivity. This is consistent with the SARS-CoV-2 mutational bias pattern and the preference of some host deaminases for specific sequences and RNA secondary structures. We extend our investigation by analyzing the mutations from the variants of concern Alpha, Beta, Delta, Gamma, and Omicron. Finally, we analyzed amino acid changes by looking at the predicted recurrent mutations in the M-pro and spike proteins.
Affiliation(s)
- Bryan Saldivar-Espinoza
  - Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Guillem Macip
  - Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pol Garcia-Segura
  - Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Júlia Mestres-Truyol
  - Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pere Puigbò
  - Department of Biology, University of Turku, 20500 Turku, Finland
  - Department of Biochemistry and Biotechnology, Rovira i Virgili University, 43007 Tarragona, Spain
  - Nutrition and Health Unit, Eurecat Technology Centre of Catalonia, 43204 Reus, Spain
- Adrià Cereto-Massagué
  - EURECAT Centre Tecnològic de Catalunya, Centre for Omic Sciences (COS), Joint Unit Universitat Rovira i Virgili-EURECAT, Unique Scientific and Technical Infrastructures (ICTS), 43204 Reus, Spain
- Gerard Pujadas
  - Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Santiago Garcia-Vallve
  - Research Group in Cheminformatics & Nutrition, Departament de Bioquímica i Biotecnologia, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
6
Hassanin A. Variation in synonymous nucleotide composition among genomes of sarbecoviruses and consequences for the origin of COVID-19. Gene X 2022; 835:146641. [PMID: 35700806 PMCID: PMC9200079 DOI: 10.1016/j.gene.2022.146641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Revised: 05/19/2022] [Accepted: 06/02/2022] [Indexed: 11/26/2022] Open
Abstract
The subgenus Sarbecovirus includes two human viruses, SARS-CoV and SARS-CoV-2, respectively responsible for the SARS epidemic and COVID-19 pandemic, as well as many bat viruses and two pangolin viruses. Here, the synonymous nucleotide composition (SNC) of Sarbecovirus genomes was analysed by examining third codon-positions, dinucleotides, and degenerate codons. The results show evidence for the following eight groups: (i) SARS-CoV related coronaviruses (SCoVrC; including many bat viruses from China), (ii) SARS-CoV-2 related coronaviruses (SCoV2rC; including five bat viruses from Cambodia, Thailand and Yunnan), (iii) pangolin sarbecoviruses, (iv) three bat sarbecoviruses showing evidence of recombination between SCoVrC and SCoV2rC genomes, (v) two highly divergent bat sarbecoviruses from Yunnan, (vi) the bat sarbecovirus from Japan, (vii) the bat sarbecovirus from Bulgaria, and (viii) the bat sarbecovirus from Kenya. All these groups can be diagnosed by specific nucleotide compositional features except the one affected by recombination between SCoVrC and SCoV2rC. In particular, SCoV2rC genomes have fewer cytosines and more uracils at third codon-positions than other sarbecoviruses, whereas the genomes of pangolin sarbecoviruses show more adenines at third codon-positions. I suggest that taxonomic differences in the imbalanced nucleotide pools available in host cells during viral replication can explain the eight groups of SNC here detected among Sarbecovirus genomes. A related effect due to hibernating bats and their latitudinal distribution is also discussed. I conclude that the two independent host switches from Rhinolophus bats to pangolins resulted in convergent mutational constraints and that SARS-CoV-2 emerged directly from a horseshoe bat sarbecovirus.
Affiliation(s)
- Alexandre Hassanin
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, EPHE, MNHN, UA, Paris, France