1
|
Wolters JF, LaBella AL, Opulente DA, Rokas A, Hittinger CT. Mitochondrial genome diversity across the subphylum Saccharomycotina. Front Microbiol 2023; 14:1268944. [PMID: 38075892 PMCID: PMC10701893 DOI: 10.3389/fmicb.2023.1268944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2023] [Accepted: 10/31/2023] [Indexed: 12/20/2023] Open
Abstract
Introduction Eukaryotic life depends on the functional elements encoded by both the nuclear genome and organellar genomes, such as those contained within the mitochondria. The content, size, and structure of the mitochondrial genome varies across organisms with potentially large implications for phenotypic variance and resulting evolutionary trajectories. Among yeasts in the subphylum Saccharomycotina, extensive differences have been observed in various species relative to the model yeast Saccharomyces cerevisiae, but mitochondrial genome sampling across many groups has been scarce, even as hundreds of nuclear genomes have become available. Methods By extracting mitochondrial assemblies from existing short-read genome sequence datasets, we have greatly expanded both the number of available genomes and the coverage across sparsely sampled clades. Results Comparison of 353 yeast mitochondrial genomes revealed that, while size and GC content were fairly consistent across species, those in the genera Metschnikowia and Saccharomyces trended larger, while several species in the order Saccharomycetales, which includes S. cerevisiae, exhibited lower GC content. Extreme examples for both size and GC content were scattered throughout the subphylum. All mitochondrial genomes shared a core set of protein-coding genes for Complexes III, IV, and V, but they varied in the presence or absence of mitochondrially-encoded canonical Complex I genes. We traced the loss of Complex I genes to a major event in the ancestor of the orders Saccharomycetales and Saccharomycodales, but we also observed several independent losses in the orders Phaffomycetales, Pichiales, and Dipodascales. In contrast to prior hypotheses based on smaller-scale datasets, comparison of evolutionary rates in protein-coding genes showed no bias towards elevated rates among aerobically fermenting (Crabtree/Warburg-positive) yeasts. Mitochondrial introns were widely distributed, but they were highly enriched in some groups. The majority of mitochondrial introns were poorly conserved within groups, but several were shared within groups, between groups, and even across taxonomic orders, which is consistent with horizontal gene transfer, likely involving homing endonucleases acting as selfish elements. Discussion As the number of available fungal nuclear genomes continues to expand, the methods described here to retrieve mitochondrial genome sequences from these datasets will prove invaluable to ensuring that studies of fungal mitochondrial genomes keep pace with their nuclear counterparts.
Collapse
Affiliation(s)
- John F. Wolters
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI, United States
| | - Abigail L. LaBella
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States
| | - Dana A. Opulente
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI, United States
- Biology Department, Villanova University, Villanova, PA, United States
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States
| | - Chris Todd Hittinger
- Laboratory of Genetics, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, Center for Genomic Science Innovation, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI, United States
| |
Collapse
|
2
|
Pfennig A, Lomsadze A, Borodovsky M. MgCod: Gene Prediction in Phage Genomes with Multiple Genetic Codes. J Mol Biol 2023; 435:168159. [PMID: 37244571 DOI: 10.1016/j.jmb.2023.168159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2022] [Revised: 05/19/2023] [Accepted: 05/21/2023] [Indexed: 05/29/2023]
Abstract
Massive sequencing of microbiomes has led to the discovery of a large number of phage genomes with intermittent stop codon recoding. We have developed a computational tool, MgCod, that identifies genomic regions (blocks) with distinct stop codon recoding simultaneously with the prediction of protein-coding regions. When MgCod was used to scan a large volume of human metagenomic contigs hundreds of viral contigs with intermittent stop codon recoding were revealed. Many of these contigs originated from genomes of known crAssphages. Further analyses had shown that intermittent recoding was associated with subtle patterns in the organization of protein-coding genes, such as 'single-coding' and 'dual-coding'. The dual-coding genes, clustered into blocks, could be translated by two alternative codes producing nearly identical proteins. It was observed that the dual-coded blocks were enriched with the early-stage phage genes, while the late-stage genes were residing in the single-coded blocks. MgCod can identify types of stop codon recoding in novel genomic sequences in parallel with gene prediction. It is available for download from https://github.com/gatech-genemark/MgCod.
Collapse
Affiliation(s)
- Aaron Pfennig
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.
| | - Alexandre Lomsadze
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
| | - Mark Borodovsky
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
| |
Collapse
|
3
|
Lang BF, Beck N, Prince S, Sarrasin M, Rioux P, Burger G. Mitochondrial genome annotation with MFannot: a critical analysis of gene identification and gene model prediction. FRONTIERS IN PLANT SCIENCE 2023; 14:1222186. [PMID: 37469769 PMCID: PMC10352661 DOI: 10.3389/fpls.2023.1222186] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/13/2023] [Accepted: 06/15/2023] [Indexed: 07/21/2023]
Abstract
Compared to nuclear genomes, mitochondrial genomes (mitogenomes) are small and usually code for only a few dozen genes. Still, identifying genes and their structure can be challenging and time-consuming. Even automated tools for mitochondrial genome annotation often require manual analysis and curation by skilled experts. The most difficult steps are (i) the structural modelling of intron-containing genes; (ii) the identification and delineation of Group I and II introns; and (iii) the identification of moderately conserved, non-coding RNA (ncRNA) genes specifying 5S rRNAs, tmRNAs and RNase P RNAs. Additional challenges arise through genetic code evolution which can redefine the translational identity of both start and stop codons, thus obscuring protein-coding genes. Further, RNA editing can render gene identification difficult, if not impossible, without additional RNA sequence data. Current automated mito- and plastid-genome annotators are limited as they are typically tailored to specific eukaryotic groups. The MFannot annotator we developed is unique in its applicability to a broad taxonomic scope, its accuracy in gene model inference, and its capabilities in intron identification and classification. The pipeline leverages curated profile Hidden Markov Models (HMMs), covariance (CMs) and ERPIN models to better capture evolutionarily conserved signatures in the primary sequence (HMMs and CMs) as well as secondary structure (CMs and ERPIN). Here we formally describe MFannot, which has been available as a web-accessible service (https://megasun.bch.umontreal.ca/apps/mfannot/) to the research community for nearly 16 years. Further, we report its performance on particularly intron-rich mitogenomes and describe ongoing and future developments.
Collapse
|
4
|
Shulgina Y, Eddy SR. Codetta: predicting the genetic code from nucleotide sequence. Bioinformatics 2023; 39:6895099. [PMID: 36511586 PMCID: PMC9825746 DOI: 10.1093/bioinformatics/btac802] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Revised: 11/10/2022] [Indexed: 12/15/2022] Open
Abstract
SUMMARY Codetta is a Python program for predicting the genetic code table of an organism from nucleotide sequences. Codetta can analyze an arbitrary nucleotide sequence and needs no sequence annotation or taxonomic placement. The most likely amino acid decoding for each of the 64 codons is inferred from alignments of profile hidden Markov models of conserved proteins to the input sequence. AVAILABILITY AND IMPLEMENTATION Codetta 2.0 is implemented as a Python 3 program for MacOS and Linux and is available from http://eddylab.org/software/codetta/codetta2.tar.gz and at http://github.com/kshulgina/codetta. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yekaterina Shulgina
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
| | | |
Collapse
|
5
|
Shulgina Y, Eddy SR. A computational screen for alternative genetic codes in over 250,000 genomes. eLife 2021; 10:71402. [PMID: 34751130 PMCID: PMC8629427 DOI: 10.7554/elife.71402] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Accepted: 10/26/2021] [Indexed: 11/25/2022] Open
Abstract
The genetic code has been proposed to be a ‘frozen accident,’ but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Since most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and why some codons are affected more frequently. To fill in the diversity of genetic codes, we developed Codetta, a computational method to predict the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved by a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary force that likely helped drive these codons to low frequency and enable their reassignment. All life forms rely on a ‘code’ to translate their genetic information into proteins. This code relies on limited permutations of three nucleotides – the building blocks that form DNA and other types of genetic information. Each ‘triplet’ of nucleotides – or codon – encodes a specific amino acid, the basic component of proteins. Reading the sequence of codons in the right order will let the cell know which amino acid to assemble next on a growing protein. For instance, the codon CGG – formed of the nucleotides guanine (G) and cytosine (C) – codes for the amino acid arginine. From bacteria to humans, most life forms rely on the same genetic code. Yet certain organisms have evolved to use slightly different codes, where one or several codons have an altered meaning. To better understand how alternative genetic codes have evolved, Shulgina and Eddy set out to find more organisms featuring these altered codons, creating a new software called Codetta that can analyze the genome of a microorganism and predict the genetic code it uses. Codetta was then used to sift through the genetic information of 250,000 microorganisms. This was made possible by the sequencing, in recent years, of the genomes of hundreds of thousands of bacteria and other microorganisms – including many never studied before. These analyses revealed five groups of bacteria with alternative genetic codes, all of which had changes in the codons that code for arginine. Amongst these, four had genomes with a low proportion of guanine and cytosine nucleotides. This may have made some guanine and cytosine-rich arginine codons very rare in these organisms and, therefore, easier to be reassigned to encode another amino acid. The work by Shulgina and Eddy demonstrates that Codetta is a new, useful tool that scientists can use to understand how genetic codes evolve. In addition, it can also help to ensure the accuracy of widely used protein databases, which assume which genetic code organisms use to predict protein sequences from their genomes.
Collapse
Affiliation(s)
| | - Sean R Eddy
- Molecular & Cellular Biology, Harvard University, Cambridge, United States
| |
Collapse
|
6
|
Žihala D, Eliáš M. Evolution and Unprecedented Variants of the Mitochondrial Genetic Code in a Lineage of Green Algae. Genome Biol Evol 2020; 11:2992-3007. [PMID: 31617565 PMCID: PMC6821328 DOI: 10.1093/gbe/evz210] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/17/2019] [Indexed: 12/15/2022] Open
Abstract
Mitochondria of diverse eukaryotes have evolved various departures from the standard genetic code, but the breadth of possible modifications and their phylogenetic distribution are known only incompletely. Furthermore, it is possible that some codon reassignments in previously sequenced mitogenomes have been missed, resulting in inaccurate protein sequences in databases. Here we show, considering the distribution of codons at conserved amino acid positions in mitogenome-encoded proteins, that mitochondria of the green algal order Sphaeropleales exhibit a diversity of codon reassignments, including previously missed ones and some that are unprecedented in any translation system examined so far, necessitating redefinition of existing translation tables and creating at least seven new ones. We resolve a previous controversy concerning the meaning the UAG codon in Hydrodictyaceae, which beyond any doubt encodes alanine. We further demonstrate that AGG, sometimes together with AGA, encodes alanine instead of arginine in diverse sphaeroplealeans. Further newly detected changes include Arg-to-Met reassignment of the AGG codon and Arg-to-Leu reassignment of the CGG codon in particular species. Analysis of tRNAs specified by sphaeroplealean mitogenomes provides direct support for and molecular underpinning of the proposed reassignments. Furthermore, we point to unique mutations in the mitochondrial release factor mtRF1a that correlate with changes in the use of termination codons in Sphaeropleales, including the two independent stop-to-sense UAG reassignments, the reintroduction of UGA in some Scenedesmaceae, and the sense-to-stop reassignment of UCA widespread in the group. Codon disappearance seems to be the main drive of the dynamic evolution of the mitochondrial genetic code in Sphaeropleales.
Collapse
Affiliation(s)
- David Žihala
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Czech Republic.,Institute of Environmental Technologies, Faculty of Science, University of Ostrava, Czech Republic
| | - Marek Eliáš
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Czech Republic.,Institute of Environmental Technologies, Faculty of Science, University of Ostrava, Czech Republic
| |
Collapse
|
7
|
McGrath C. Highlight: Recracking the Genetic Code. Genome Biol Evol 2019; 11:2990-2991. [PMID: 31665771 PMCID: PMC6821307 DOI: 10.1093/gbe/evz211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/25/2019] [Indexed: 11/13/2022] Open
|
8
|
Noutahi E, Calderon V, Blanchette M, El-Mabrouk N, Lang BF. Rapid Genetic Code Evolution in Green Algal Mitochondrial Genomes. Mol Biol Evol 2019; 36:766-783. [PMID: 30698742 PMCID: PMC6551751 DOI: 10.1093/molbev/msz016] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Genetic code deviations involving stop codons have been previously reported in mitochondrial genomes of several green plants (Viridiplantae), most notably chlorophyte algae (Chlorophyta). However, as changes in codon recognition from one amino acid to another are more difficult to infer, such changes might have gone unnoticed in particular lineages with high evolutionary rates that are otherwise prone to codon reassignments. To gain further insight into the evolution of the mitochondrial genetic code in green plants, we have conducted an in-depth study across mtDNAs from 51 green plants (32 chlorophytes and 19 streptophytes). Besides confirming known stop-to-sense reassignments, our study documents the first cases of sense-to-sense codon reassignments in Chlorophyta mtDNAs. In several Sphaeropleales, we report the decoding of AGG codons (normally arginine) as alanine, by tRNA(CCU) of various origins that carry the recognition signature for alanine tRNA synthetase. In Chromochloris, we identify tRNA variants decoding AGG as methionine and the synonymous codon CGG as leucine. Finally, we find strong evidence supporting the decoding of AUA codons (normally isoleucine) as methionine in Pycnococcus. Our results rely on a recently developed conceptual framework (CoreTracker) that predicts codon reassignments based on the disparity between DNA sequence (codons) and the derived protein sequence. These predictions are then validated by an evaluation of tRNA phylogeny, to identify the evolution of new tRNAs via gene duplication and loss, and structural modifications that lead to the assignment of new tRNA identities and a change in the genetic code.
Collapse
Affiliation(s)
- Emmanuel Noutahi
- Département d'Informatique et de Recherche opérationnelle (DIRO), Université de Montréal, CP 6128 succursale Centre-Ville, Montreal, QC, Canada
| | - Virginie Calderon
- Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada
| | - Mathieu Blanchette
- School of Computer Science, McGill University, McConnell Engineering Bldg., Montréal, QC H3A 0E9, Canada
- McGill Centre for Bioinformatics, McGill University, Montréal, QC, Canada
| | - Nadia El-Mabrouk
- Département d'Informatique et de Recherche opérationnelle (DIRO), Université de Montréal, CP 6128 succursale Centre-Ville, Montreal, QC, Canada
| | - Bernd Franz Lang
- Département de Biochimie, Centre Robert Cedergren, Université de Montréal, CP 6128 succursale Centre-Ville, Montreal, QC, Canada
| |
Collapse
|
9
|
He J, Fang T, Zhang Z, Huang B, Zhu X, Xiong Y. PseUI: Pseudouridine sites identification based on RNA sequence information. BMC Bioinformatics 2018; 19:306. [PMID: 30157750 PMCID: PMC6114832 DOI: 10.1186/s12859-018-2321-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2018] [Accepted: 08/21/2018] [Indexed: 01/28/2023] Open
Abstract
Background Pseudouridylation is the most prevalent type of posttranscriptional modification in various stable RNAs of all organisms, which significantly affects many cellular processes that are regulated by RNA. Thus, accurate identification of pseudouridine (Ψ) sites in RNA will be of great benefit for understanding these cellular processes. Due to the low efficiency and high cost of current available experimental methods, it is highly desirable to develop computational methods for accurately and efficiently detecting Ψ sites in RNA sequences. However, the predictive accuracy of existing computational methods is not satisfactory and still needs improvement. Results In this study, we developed a new model, PseUI, for Ψ sites identification in three species, which are H. sapiens, S. cerevisiae, and M. musculus. Firstly, five different kinds of features including nucleotide composition (NC), dinucleotide composition (DC), pseudo dinucleotide composition (pseDNC), position-specific nucleotide propensity (PSNP), and position-specific dinucleotide propensity (PSDP) were generated based on RNA segments. Then, a sequential forward feature selection strategy was used to gain an effective feature subset with a compact representation but discriminative prediction power. Based on the selected feature subsets, we built our model by using a support vector machine (SVM). Finally, the generalization of our model was validated by both the jackknife test and independent validation tests on the benchmark datasets. The experimental results showed that our model is more accurate and stable than the previously published models. We have also provided a user-friendly web server for our model at http://zhulab.ahu.edu.cn/PseUI, and a brief instruction for the web server is provided in this paper. By using this instruction, the academic users can conveniently get their desired results without complicated calculations. Conclusion In this study, we proposed a new predictor, PseUI, to detect Ψ sites in RNA sequences. It is shown that our model outperformed the existing state-of-art models. It is expected that our model, PseUI, will become a useful tool for accurate identification of RNA Ψ sites. Electronic supplementary material The online version of this article (10.1186/s12859-018-2321-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jingjing He
- School of Life Sciences, Anhui University, Hefei, 230601, Anhui, China
| | - Ting Fang
- School of Life Sciences, Anhui University, Hefei, 230601, Anhui, China
| | - Zizheng Zhang
- School of Life Sciences, Anhui University, Hefei, 230601, Anhui, China
| | - Bei Huang
- School of Life Sciences, Anhui University, Hefei, 230601, Anhui, China
| | - Xiaolei Zhu
- School of Life Sciences, Anhui University, Hefei, 230601, Anhui, China.
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
| |
Collapse
|
10
|
Alignment-based and alignment-free methods converge with experimental data on amino acids coded by stop codons at split between nuclear and mitochondrial genetic codes. Biosystems 2018; 167:33-46. [DOI: 10.1016/j.biosystems.2018.03.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Revised: 03/18/2018] [Accepted: 03/19/2018] [Indexed: 12/11/2022]
|