1
Baena-Angulo C, Platero AI, Couso JP. Cis to trans: small ORF functions emerging through evolution. Trends Genet 2025; 41:119-131. [PMID: 39603921 DOI: 10.1016/j.tig.2024.10.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 10/14/2024] [Accepted: 10/28/2024] [Indexed: 11/29/2024]
Abstract
Hundreds of thousands of small open reading frames (smORFs) of less than 100 codons exist in every genome, especially in long noncoding RNAs (lncRNAs) and in the 5' leaders of mRNAs. smORFs are often discarded as nonfunctional, but ribosomal profiling (RiboSeq) reveals that thousands are translated, while characterised smORF functions have risen from anecdotal to identifiable trends: smORFs can either have a cis-noncoding regulatory function (involving low translation of nonfunctional peptides) or full coding function mediated by robustly translated peptides, often having cellular and physiological roles as membrane-associated regulators of canonical proteins. The evolutionary context reveals that many smORFs represent new genes emerging de novo from noncoding sequences. We suggest a mechanism for this process, where cis-noncoding smORF functions provide niches for the subsequent evolution of full peptide functions.
Affiliation(s)
- Casimiro Baena-Angulo
- Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera Km1, Sevilla 41013, Spain
- Ana Isabel Platero
- Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera Km1, Sevilla 41013, Spain
- Juan Pablo Couso
- Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera Km1, Sevilla 41013, Spain.
2
Zhao H, Qin L, Deng X, Wang Z, Jiang R, Reitz SR, Wu S, He Z. Nucleotide and dinucleotide preference of segmented viruses are shaped more by segment: In case study of tomato spotted wilt virus. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2024; 122:105608. [PMID: 38796047 DOI: 10.1016/j.meegid.2024.105608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 05/16/2024] [Accepted: 05/21/2024] [Indexed: 05/28/2024]
Abstract
Several studies have shown that the nucleotide and dinucleotide composition of viruses may follow their host species or protein coding region. Nevertheless, the influence of viral segment on viral nucleotide and dinucleotide composition is still unknown. Here, we explored this question using tomato spotted wilt virus (TSWV), a segmented virus that seriously threatens tomato production worldwide. Through nucleotide composition analysis, we found the same over-representation of A across all viral segments at the first and second codon positions, but the pattern differed among segments at the third codon position. Interestingly, protein coding regions, whether encoded by the same or by different segments, exhibited clearly distinct nucleotide preferences. We then found that the dinucleotides UpG and CpU were overrepresented and the dinucleotides UpA, CpG and GpU were underrepresented, not only in the complete genomic sequences but also in different segments, protein coding regions and host species. Notably, 100% of the data investigated here were assigned to the correct viral segment and protein coding region, whereas only 67% were assigned to the correct viral host species. In conclusion, in this case study of TSWV, the nucleotide composition and dinucleotide preference of segmented viruses depend more strongly on segment and protein coding region than on host species. This research provides a novel perspective on the molecular evolutionary mechanisms of TSWV and a reference for future research on the genetic diversity of segmented viruses.
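Dinucleotide over- and under-representation of the kind reported in this abstract is commonly quantified with an odds ratio, rho_xy = f_xy / (f_x * f_y), where f_xy is the observed dinucleotide frequency and f_x, f_y are the mononucleotide frequencies. The sketch below is a minimal Python illustration of that ratio, not the authors' pipeline; the function name and the thresholds in the comments (about 1.23 and 0.78, a widely used convention) are assumptions, since the abstract does not state which cut-offs were applied.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Return rho_xy = f_xy / (f_x * f_y) for each dinucleotide seen in seq.

    Hypothetical helper for illustration only. T is converted to U so that
    DNA and RNA input give the same keys (e.g. "UG", "CG").
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    mono = Counter(seq)                                # single-base counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))   # overlapping pairs

    ratios = {}
    for pair, count in di.items():
        f_xy = count / (n - 1)        # observed dinucleotide frequency
        f_x = mono[pair[0]] / n       # frequency of the first base
        f_y = mono[pair[1]] / n       # frequency of the second base
        ratios[pair] = f_xy / (f_x * f_y)
    return ratios


# By a common convention (assumed here), rho >= 1.23 is read as
# over-represented and rho <= 0.78 as under-represented.
example = dinucleotide_odds_ratios("AUAU")
```

A ratio near 1 means the dinucleotide occurs about as often as its base composition predicts; comparing ratios per segment, per coding region and per host would mirror the groupings named in the abstract.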
Affiliation(s)
- Haiting Zhao
- College of Plant Protection, Yangzhou University, Yangzhou 225009, China
- Lang Qin
- College of Plant Protection, Yangzhou University, Yangzhou 225009, China
- Xiaolong Deng
- College of Plant Protection, Yangzhou University, Yangzhou 225009, China
- Zhilei Wang
- College of Plant Protection, Yangzhou University, Yangzhou 225009, China
- Runzhou Jiang
- College of Plant Protection, Yangzhou University, Yangzhou 225009, China
- Stuart R Reitz
- Malheur Experiment Station, Oregon State University, Ontario, OR, USA
- Shengyong Wu
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China.
- Zhen He
- College of Plant Protection, Yangzhou University, Yangzhou 225009, China; Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou 225009, China.
3
Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK.
4
Ke Z, Zhou K, Hou M, Luo H, Li Z, Pan X, Zhou J, Jing T, Ye H. Characterization of the Complete Mitochondrial Genome of the Elongate Loach and Its Phylogenetic Implications in Cobitidae. Animals (Basel) 2023; 13:3841. [PMID: 38136877 PMCID: PMC10740543 DOI: 10.3390/ani13243841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/01/2023] [Accepted: 12/02/2023] [Indexed: 12/24/2023] Open
Abstract
The elongate loach is an endemic fish in China. Previous studies have provided some insights into the mitochondrial genome composition and the phylogenetic relationships of the elongate loach inferred using protein-coding genes (PCGs), yet detailed information about it remains limited. Therefore, in this study we sequenced the complete mitochondrial genome of the elongate loach and analyzed its structural characteristics. The PCGs and mitochondrial genome were used for selective stress analysis and genomic comparative analysis. The complete mitochondrial genome of the elongate loach, together with those of 35 Cyprinidae species, was used to infer the phylogenetic relationships of the Cobitidae family through maximum likelihood (ML) reconstruction. The results showed that the genome sequence has a full length of 16,591 bp, which includes 13 PCGs, 22 transfer RNA genes (tRNA), 2 ribosomal RNA genes (rRNA), and 2 non-coding regions (CR D-loop and light chain sub-chain replication origin OL). Overall, the elongate loach shared the same gene arrangement and composition of the mitochondrial genes with other teleost fishes. The Ka/Ks ratios of all mitochondrial PCGs were less than 1, indicating that all of the PCGs were evolving under purifying selection. Genome comparison analyses showed a significant sequence homology of species of Leptobotia. A significant identity between L. elongata and the other five Leptobotia species was observed in the visualization result, except for L. mantschurica, which lacked the tRNA-Arg gene and had a shorter tRNA-Asp gene. The phylogenetic tree revealed that the Cobitidae species examined here can be grouped into two clades, with the elongate loach forming a sister relationship with L. microphthalma. This study could provide additional inferences for a better understanding of the phylogenetic relationships among Cobitidae species.
Affiliation(s)
- Zhenlin Ke
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, College of Fisheries, Southwest University, Chongqing 402460, China; (Z.K.); (M.H.); (H.L.); (T.J.)
- Key Laboratory of Aquatic Science of Chongqing, Chongqing 400175, China
- Kangqi Zhou
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fishery Sciences, Nanning 530021, China; (K.Z.); (Z.L.); (X.P.)
- Mengdan Hou
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, College of Fisheries, Southwest University, Chongqing 402460, China; (Z.K.); (M.H.); (H.L.); (T.J.)
- Key Laboratory of Aquatic Science of Chongqing, Chongqing 400175, China
- Hui Luo
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, College of Fisheries, Southwest University, Chongqing 402460, China; (Z.K.); (M.H.); (H.L.); (T.J.)
- Key Laboratory of Aquatic Science of Chongqing, Chongqing 400175, China
- Zhe Li
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fishery Sciences, Nanning 530021, China; (K.Z.); (Z.L.); (X.P.)
- Xianhui Pan
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fishery Sciences, Nanning 530021, China; (K.Z.); (Z.L.); (X.P.)
- Jian Zhou
- Fisheries Institute, Sichuan Academy of Agricultural Sciences, Chengdu 611731, China
- Tingsen Jing
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, College of Fisheries, Southwest University, Chongqing 402460, China; (Z.K.); (M.H.); (H.L.); (T.J.)
- Key Laboratory of Aquatic Science of Chongqing, Chongqing 400175, China
- Hua Ye
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, College of Fisheries, Southwest University, Chongqing 402460, China; (Z.K.); (M.H.); (H.L.); (T.J.)
- Key Laboratory of Aquatic Science of Chongqing, Chongqing 400175, China
5
Lamolle G, Iriarte A, Simón D, Musto H. Amino acid usage and protein expression levels in the flatworm Schistosoma mansoni. Mol Biochem Parasitol 2023; 255:111581. [PMID: 37478919 DOI: 10.1016/j.molbiopara.2023.111581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 07/10/2023] [Accepted: 07/17/2023] [Indexed: 07/23/2023]
Abstract
Schistosoma mansoni is a parasitic flatworm that causes a human disease called schistosomiasis, or bilharzia. At the genomic level, S. mansoni is AT-rich, but has some compositional heterogeneity. Indeed, some regions of its genome are GC-rich, mainly in the regions located near the extreme ends of the chromosomes. Recently, we showed that, despite the strong bias towards A/T ending codons, highly expressed genes tend to use GC-rich codons. Here, we address the following question: are highly expressed sequences biased in their amino acid frequencies? Our analyses show that these sequences in S. mansoni, as in species ranging from bacteria to human, are strongly biased in nucleotide composition. Highly expressed genes tend to use GC-rich codons (in the first and second codon positions), which code the energetically cheapest amino acids. Therefore, we conclude that amino acid usage, at least in highly expressed genes, is strongly shaped by natural selection to avoid energetically expensive residues. Whether this is an adaptation to the parasitic way of life of S. mansoni, is unclear since the same pattern occurs in free-living species.
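The claim about GC-rich first and second codon positions rests on measuring GC content separately at each of the three codon positions (often called GC1, GC2 and GC3). The following is a minimal Python sketch of that measurement, assuming an in-frame coding sequence; the function name is hypothetical and the code is not taken from the study.

```python
def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    Hypothetical helper for illustration; assumes the sequence starts at the
    first base of a codon and has a length that is a multiple of three.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")

    fractions = []
    for pos in range(3):
        bases = cds[pos::3]                       # every third base from pos
        gc = sum(base in "GC" for base in bases)  # count G and C only
        fractions.append(gc / len(bases))
    return fractions[0], fractions[1], fractions[2]


# Toy sequence ATG GCC TAA: positions 1 and 2 each hold one G/C out of three,
# position 3 holds two out of three.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCTAA")
```

Computing these three values per gene and comparing highly expressed genes against the rest is one way to examine the kind of bias the abstract describes.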
Affiliation(s)
- Guillermo Lamolle
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay
- Andrés Iriarte
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay; Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Avenida A. Navarro 3051, 11600 Montevideo, Uruguay
- Diego Simón
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay; Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Universidad de la República, Mataojo 2055, 11400 Montevideo, Uruguay; Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Mataojo 2020, 11400 Montevideo, Uruguay
- Héctor Musto
- Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Iguá 4225, 11400 Montevideo, Uruguay.
6
Lamolle G, Simón D, Iriarte A, Musto H. Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract
The standard genetic code determines that in most species, including viruses, there are 20 amino acids that are coded by 61 codons, while the other three codons are stop triplets. Considering the whole proteome, each species features its own amino acid frequencies; given the slow rate of change, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in the usage of amino acids are evident. In this communication, we summarize some of the most prominent and well-established factors that determine the differences found in amino acid usage, both across evolution and intragenomically.
Affiliation(s)
- Guillermo Lamolle
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Diego Simón
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Andrés Iriarte
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de La República, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de La República, Montevideo, Uruguay.
7
Characterization of mitochondrial genome of Indian Ocean blue-spotted maskray, Neotrygon indica and its phylogenetic relationship within Dasyatidae Family. Int J Biol Macromol 2022; 223:458-467. [PMID: 36347369 DOI: 10.1016/j.ijbiomac.2022.10.277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/12/2022] [Revised: 10/11/2022] [Accepted: 10/28/2022] [Indexed: 11/06/2022]
Abstract
The present study characterized complete mitochondrial genome of Blue-spotted maskray, Neotrygon indica and studied the evolutionary relationship of the species within the Dasyatidae family. The total length of the mitogenome was 17,974 bp including 37 genes and a non-coding control region. The average frequency of nucleotides in protein-coding genes was A: 29.1 %, T: 30.2 %, G: 13.0 % and C: 27.7 % with AT content of 59.3 %. The values of AT and GC skewness were -0.018 and -0.338, respectively. Comparative analyses showed a large number of average synonymous substitutions per synonymous site (Ks) in gene NADH4 (5.07) followed by NADH5 (4.72). High values of average number of non-synonymous substitutions per non-synonymous site (Ka) were observed in genes ATPase8 (0.54) and NADH2 (0.44). Genes NADH4L and NADH2 showed high interspecific genetic distance values of 0.224 ± 0.001 and 0.213 ± 0.002, respectively. Heat map analysis showed variation in codon usage among different species of the Dasyatidae family. The phylogenetic tree showed a sister relationship between the Dasyatinae and the Neotrygoninae subfamilies. Neotrygon indica formed as a sister species to the clade consisting of N. varidens and N. orientalis. Based on the present results, Neotrygon indica could have diverged from the common ancestor of the two latter in the Plio-Pleistocene. The present study showed distinct characteristics of N. indica from its congeners through comparative mitogenomics.
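AT and GC skewness in mitogenome studies are usually defined as AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C). The sketch below is a minimal Python illustration of those two formulas; the function name is hypothetical, and it is an assumption that the study used exactly these definitions. The skew values reported in the abstract may refer to a different sequence set than the protein-coding percentages quoted alongside them, so the example does not attempt to reproduce them.

```python
def nucleotide_skews(a: float, t: float, g: float, c: float) -> tuple[float, float]:
    """Return (AT skew, GC skew) from base counts or percentages.

    Hypothetical helper for illustration, using the conventional definitions
    AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C).
    """
    if a + t == 0 or g + c == 0:
        raise ValueError("A+T and G+C must both be non-zero")
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


# Toy composition: A-rich relative to T, C-rich relative to G.
at_skew, gc_skew = nucleotide_skews(a=30, t=10, g=10, c=30)
```

A negative AT skew indicates more T than A on the analysed strand, and a negative GC skew indicates more C than G, which matches the sign of both values given in the abstract.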
8
Musto H. In Memoriam of Giorgio Bernardi and Noboru Sueoka: A Personal View. J Mol Evol 2022; 90:325-327. [PMID: 35838772 DOI: 10.1007/s00239-022-10066-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Accepted: 07/08/2022] [Indexed: 11/24/2022]
Affiliation(s)
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.
9
Wang X, Li LL, Xiao Y, Chen XY, Chen JH, Hu XS. A complete sequence of mitochondrial genome of Neolamarckia cadamba and its use for systematic analysis. Sci Rep 2021; 11:21452. [PMID: 34728739 PMCID: PMC8564537 DOI: 10.1038/s41598-021-01040-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Accepted: 10/22/2021] [Indexed: 11/09/2022] Open
Abstract
Neolamarckia cadamba is an important tropical and subtropical tree for the timber industry in southern China and is also a medicinal plant because of the secondary product cadambine. N. cadamba belongs to the Rubiaceae family, and its taxonomic relationships with other species have not been fully evaluated based on genome sequences. Here, we report the complete sequence of the mitochondrial genome of N. cadamba, which is 414,980 bp in length and was successfully assembled in two genome circles (109,836 bp and 305,144 bp). The mtDNA harbors 83 genes in total, including 40 protein-coding genes (PCGs), 31 transfer RNA genes, 6 ribosomal RNA genes, and 6 other genes. The base composition of the whole genome is estimated as 27.26% for base A, 22.63% for C, 22.53% for G, and 27.56% for T, with an A + T content of 54.82% (54.45% in the small circle and 54.79% in the large circle). Repetitive sequences account for ~ 0.14% of the whole genome. A maximum likelihood (ML) tree based on DNA sequences of 24 PCGs supports that N. cadamba belongs to the order Gentianales. An ML tree based on the rps3 gene of 60 species in the family Rubiaceae shows that N. cadamba is more closely related to Cephalanthus occidentalis and Hymenodictyon parvifolium and belongs to the Cinchonoideae subfamily. The result indicates that N. cadamba is genetically distant from the species and genera of Rubiaceae in systematic position. As the first sequence of the mitochondrial genome of N. cadamba, it will provide a useful resource to investigate genetic variation and develop molecular markers for genetic breeding in the future.
Affiliation(s)
- Xi Wang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong, 510642, China.,Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong, 510642, China
- Ling-Ling Li
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong, 510642, China.,Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong, 510642, China
- Yu Xiao
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong, 510642, China.,Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong, 510642, China
- Xiao-Yang Chen
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong, 510642, China.,Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong, 510642, China
- Jie-Hu Chen
- Science Corporation of Gene (SCGene), Guangzhou, 510000, China
- Xin-Sheng Hu
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangdong, 510642, China. .,Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, South China Agricultural University, Guangdong, 510642, China.
10
Mazumder TH, Alqahtani AM, Alqahtani T, Emran TB, A. Aldahish A, Uddin A. Analysis of Codon Usage of Speech Gene FoxP2 among Animals. BIOLOGY 2021; 10:1078. [PMID: 34827071 PMCID: PMC8614651 DOI: 10.3390/biology10111078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 10/12/2021] [Accepted: 10/16/2021] [Indexed: 12/03/2022]
Abstract
The protein-coding gene FoxP2 (forkhead box protein P2) plays a major role in communication and evolutionary changes. The present study carried out a comprehensive codon usage bias (CUB) analysis of the FoxP2 gene among a diverse group of animals including fishes, birds, reptiles, and mammals. We observed that in fishes, codons ending with C or G were most frequently used in the FoxP2 gene, while in birds, reptiles, and mammals, codons ending with T or A were most frequently used. A higher effective number of codons (ENC) value was observed for the FoxP2 gene, indicating a lower CUB. Parity rule 2 bias plots suggested that apart from mutation pressure, other factors such as natural selection might have influenced the CUB. The frequency distribution of the ratio of observed to expected ENC revealed that mutation pressure plays a key role in the patterns of codon usage of FoxP2. In addition, correspondence analysis revealed that nucleobase composition under mutation bias affects the codon usage of the FoxP2 gene. However, neutrality plots revealed the major role of natural selection over mutation pressure in the CUB of FoxP2. The codon usage patterns for FoxP2 among the selected genomes also suggested that nature has favored nearly all the synonymous codons for encoding the corresponding amino acid. The uniform usage of 12 synonymous codons for FoxP2 was observed among the species of birds. The amino acid usage frequency for FoxP2 revealed that the amino acids Leucine, Glutamine, and Serine were predominant over other amino acids among all the species of fishes, birds, reptiles, and mammals.
Affiliation(s)
- Ali M. Alqahtani
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia; (A.M.A.); (T.A.); (A.A.A.)
- Taha Alqahtani
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia; (A.M.A.); (T.A.); (A.A.A.)
- Talha Bin Emran
- Department of Pharmacy, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh;
- Afaf A. Aldahish
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia; (A.M.A.); (T.A.); (A.A.A.)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial College, Hailakandi 788150, Assam, India
11
Simón D, Cristina J, Musto H. Nucleotide Composition and Codon Usage Across Viruses and Their Respective Hosts. Front Microbiol 2021; 12:646300. [PMID: 34262534 PMCID: PMC8274242 DOI: 10.3389/fmicb.2021.646300] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 12/26/2020] [Accepted: 06/04/2021] [Indexed: 11/13/2022] Open
Abstract
The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈ 13% and ≈ 75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplicity, it has several implications. Indeed, it is the main factor that determines, among other features, dinucleotide frequencies, repeated short DNA sequences, and codon and amino acid usage. Which forces drive this strong variation is still a matter of controversy. For rather obvious reasons, most of the studies concerning this huge variation and its consequences, have been done in free-living organisms. However, no recent comprehensive study of all known viruses has been done (that is, concerning all available sequences). Viruses, by far the most abundant biological entities on Earth, are the causative agents of many diseases. An overview of these entities is important also because their genetic material is not always double-stranded DNA: indeed, certain viruses have as genetic material single-stranded DNA, double-stranded RNA, single-stranded RNA, and/or retro-transcribing. Therefore, one may wonder if what we have learned about the evolution of GC content and its implications in prokaryotes and eukaryotes also applies to viruses. In this contribution, we attempt to describe compositional properties of ∼ 10,000 viral species: base composition (globally and according to Baltimore classification), correlations among non-coding regions and the three codon positions, and the relationship of the nucleotide frequencies and codon usage of viruses with the same feature of their hosts. This allowed us to determine how the base composition of phages strongly correlate with the value of their respective hosts, while eukaryotic viruses do not (with fungi and protists as exceptions). 
Finally, we discuss some of these results concerning codon usage: reinforcing previous results, we found that phages and hosts exhibit moderate to high correlations, while for eukaryotes and their viruses the correlations are weak or do not exist.
Affiliation(s)
- Diego Simón
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.,Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la Republica, Montevideo, Uruguay.,Laboratorio de Evolución Experimental de Virus, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Juan Cristina
- Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la Republica, Montevideo, Uruguay
- Héctor Musto
- Laboratorio de Genómica Evolutiva, Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
12
de Oliveira JL, Morales AC, Hurst LD, Urrutia AO, Thompson CRL, Wolf JB. Inferring Adaptive Codon Preference to Understand Sources of Selection Shaping Codon Usage Bias. Mol Biol Evol 2021; 38:3247-3266. [PMID: 33871580 PMCID: PMC8321536 DOI: 10.1093/molbev/msab099] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/09/2023] Open
Abstract
Alternative synonymous codons are often used at unequal frequencies. Classically, studies of such codon usage bias (CUB) attempted to separate the impact of neutral from selective forces by assuming that deviations from a predicted neutral equilibrium capture selection. However, GC-biased gene conversion (gBGC) can also cause deviation from a neutral null. Alternatively, selection has been inferred from CUB in highly expressed genes, but the accuracy of this approach has not been extensively tested, and gBGC can interfere with such extrapolations (e.g., if expression and gene conversion rates covary). It is therefore critical to examine deviations from a mutational null in a species with no gBGC. To achieve this goal, we implement such an analysis in the highly AT rich genome of Dictyostelium discoideum, where we find no evidence of gBGC. We infer neutral CUB under mutational equilibrium to quantify "adaptive codon preference," a nontautologous genome wide quantitative measure of the relative selection strength driving CUB. We observe signatures of purifying selection consistent with selection favoring adaptive codon preference. Preferred codons are not GC rich, underscoring the independence from gBGC. Expression-associated "preference" largely matches adaptive codon preference but does not wholly capture the influence of selection shaping patterns across all genes, suggesting selective constraints associated specifically with high expression. We observe patterns consistent with effects on mRNA translation and stability shaping adaptive codon preference. Thus, our approach to quantifying adaptive codon preference provides a framework for inferring the sources of selection that shape CUB across different contexts within the genome.
Affiliation(s)
- Janaina Lima de Oliveira
- Instituto de Biologia, Universidade Federal da Bahia, Salvador, Bahia, 40170-115, Brazil; Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
| | - Atahualpa Castillo Morales
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
| | - Laurence D Hurst
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
| | - Araxi O Urrutia
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK; Instituto de Ecologia, UNAM, Ciudad de Mexico 04510, Mexico
| | - Christopher R L Thompson
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
| | - Jason B Wolf
- Milner Centre for Evolution and Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
| |
|
13
|
Maldonado LL, Bertelli AM, Kamenetzky L. Molecular features similarities between SARS-CoV-2, SARS, MERS and key human genes could favour the viral infections and trigger collateral effects. Sci Rep 2021; 11:4108. [PMID: 33602998 PMCID: PMC7893037 DOI: 10.1038/s41598-021-83595-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/17/2020] [Accepted: 01/26/2021] [Indexed: 01/31/2023] Open
Abstract
In December 2019, a rising number of pneumonia cases caused by a novel β-coronavirus (SARS-CoV-2) occurred in Wuhan, China; the virus has since spread rapidly worldwide, causing thousands of deaths. The WHO declared the SARS-CoV-2 outbreak a public health emergency of international concern, and since then many scientists have dedicated themselves to its study. It has been observed that many human viruses have codon usage biases that match those of highly expressed proteins in the tissues they infect, and that they depend on the host cell machinery for replication and co-evolution. In this work, we analysed 91 molecular features and codon usage patterns for 339 viral genes and 463 human genes, comprising 677,873 codon positions. We selected the highly expressed genes from human lung tissue to perform computational studies that allow their molecular features to be compared with those of SARS, SARS-CoV-2 and MERS genes. The integrated analysis of all the features revealed that certain viral genes and overexpressed human genes have similar codon usage patterns. The main pattern was an A/T bias that, together with other features, could favour viral infection, enhanced by a host-dependent specialization of the translation machinery for only some of the overexpressed genes. The envelope protein E, the membrane glycoprotein M and ORF7 could benefit further. This could be the key to facilitated translation and viral replication, leading to different comorbidities depending on the genetic variability of the population due to the host translation machinery. This is the first codon usage approach that reveals which human genes could potentially be deregulated owing to codon usage similarities between host and viral genes once the virus is already inside the human cells of the lung tissue.
Our work led to the identification of additional highly expressed human genes that are not the usual suspects but might play a role in viral infection, and lays the basis for further research in the field of human genetics associated with new viral infections. Identifying the genes that could be deregulated under a viral infection is important for predicting collateral effects and determining which individuals would be more susceptible based on their genetic features and associated comorbidities.
Affiliation(s)
- Lucas L Maldonado
- IMPaM, CONICET, Facultad de Medicina, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina.
| | | | - Laura Kamenetzky
- IMPaM, CONICET, Facultad de Medicina, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- iB3 | Instituto de Biociencias, Biotecnología y Biología traslacional, Departamento de Fisiologia y Biologia Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
| |
|
14
|
Schwersensky M, Rooman M, Pucci F. Large-scale in silico mutagenesis experiments reveal optimization of genetic code and codon usage for protein mutational robustness. BMC Biol 2020; 18:146. [PMID: 33081759 PMCID: PMC7576759 DOI: 10.1186/s12915-020-00870-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/14/2020] [Accepted: 09/16/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. RESULTS At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. CONCLUSION Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
Affiliation(s)
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, Brussels, 1050, Belgium.
- Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, Brussels, 1050, Belgium.
| |
|
15
|
Barbhuiya PA, Uddin A, Chakraborty S. Codon usage pattern and evolutionary forces of mitochondrial ND genes among orders of class Amphibia. J Cell Physiol 2020; 236:2850-2868. [PMID: 32960450 DOI: 10.1002/jcp.30050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/18/2020] [Revised: 08/07/2020] [Accepted: 08/31/2020] [Indexed: 12/18/2022]
Abstract
In this study, we used a bioinformatics approach to analyze the nucleotide composition and pattern of synonymous codon usage in mitochondrial ND genes in three amphibian groups, that is, the orders Anura, Caudata, and Gymnophiona, to identify the commonalities and differences in codon usage, as no such work has been reported yet. The high value of the effective number of codons revealed that the codon usage bias (CUB) was low in mitochondrial ND genes among the orders. Nucleotide composition analysis suggested that, for each gene, the compositional features differed among Anura, Caudata, and Gymnophiona and that the GC content was lower than the AT content. Furthermore, a highly significant difference (p < .05) in GC content was found in each gene among the orders. The heat map showed contrasting patterns of codon usage among different ND genes. The regression of GC12 on GC3 suggested a narrow range of GC3 distribution, and some points were located on the diagonal, indicating that both mutation pressure and natural selection might influence the CUB. Moreover, the slope of the regression line was less than 0.5 in all ND genes among orders, indicating that natural selection might have played the dominant role, whereas mutation pressure played a minor role, in shaping the CUB of ND genes across orders.
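The neutrality analysis mentioned in this abstract (regressing GC12 on GC3) can be sketched roughly as follows. This is an illustration of the general technique only, not the authors' pipeline: the function names `gc_by_position` and `neutrality_slope` are hypothetical, inputs are assumed to be in-frame coding sequences, and GC12 is taken as the mean of GC1 and GC2. A slope near 1 is conventionally read as mutation pressure dominating, and a slope near 0 as selection dominating.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) fractions for an in-frame coding sequence.

    Assumes the sequence holds at least one complete codon.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base from this offset
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def neutrality_slope(sequences):
    """Least-squares slope of GC12 (y) regressed on GC3 (x).

    Assumes at least two sequences with differing GC3 values.
    """
    xs, ys = [], []
    for seq in sequences:
        gc1, gc2, gc3 = gc_by_position(seq)
        xs.append(gc3)
        ys.append((gc1 + gc2) / 2)  # GC12 as the mean of GC1 and GC2
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under this reading, a slope below 0.5 such as the one reported above would mean that GC12 tracks GC3 only weakly across genes.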
Affiliation(s)
| | - Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Hailakandi, Assam, India
| | | |
|
16
|
Closing Human Reference Genome Gaps: Identifying and Characterizing Gap-Closing Sequences. G3-GENES GENOMES GENETICS 2020; 10:2801-2809. [PMID: 32532800 PMCID: PMC7407462 DOI: 10.1534/g3.120.401280] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.
|
17
|
Maldonado LL, Stegmayer G, Milone DH, Oliveira G, Rosenzvit M, Kamenetzky L. Whole genome analysis of codon usage in Echinococcus. Mol Biochem Parasitol 2018; 225:54-66. [DOI: 10.1016/j.molbiopara.2018.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/16/2018] [Revised: 07/20/2018] [Accepted: 08/01/2018] [Indexed: 01/15/2023]
|
18
|
Codon usage and amino acid usage influence genes expression level. Genetica 2017; 146:53-63. [DOI: 10.1007/s10709-017-9996-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/08/2017] [Accepted: 10/09/2017] [Indexed: 11/30/2022]
|
19
|
|
20
|
Pathak J, Kannaujiya VK, Singh SP, Sinha RP. Codon usage analysis of photolyase encoding genes of cyanobacteria inhabiting diverse habitats. 3 Biotech 2017; 7:192. [PMID: 28664377 DOI: 10.1007/s13205-017-0826-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/06/2017] [Accepted: 05/31/2017] [Indexed: 12/17/2022] Open
Abstract
Nucleotide and amino acid compositions were studied to determine the genomic and structural relationship of photolyase gene in freshwater, marine and hot spring cyanobacteria. Among three habitats, photolyase encoding genes from hot spring cyanobacteria were found to have highest GC content. The genomic GC content was found to influence the codon usage and amino acid variability in photolyases. The third position of codon was found to have more effect on amino acid variability in photolyases than the first and second positions of codon. The variation of amino acids Ala, Asp, Glu, Gly, His, Leu, Pro, Gln, Arg and Val in photolyases of three different habitats was found to be controlled by first position of codon (G1C1). However, second position (G2C2) of codon regulates variation of Ala, Cys, Gly, Pro, Arg, Ser, Thr and Tyr contents in photolyases. Third position (G3C3) of codon controls incorporation of amino acids such as Ala, Phe, Gly, Leu, Gln, Pro, Arg, Ser, Thr and Tyr in photolyases from three habitats. Photolyase encoding genes of hot spring cyanobacteria have 85% codons with G or C at third position, whereas marine and freshwater cyanobacteria showed 82 and 60% codons, respectively, with G or C at third position. Principal component analysis (PCA) showed that GC content has a profound effect in separating the genes along the first major axis according to their RSCU (relative synonymous codon usage) values, and neutrality analysis indicated that mutational pressure has resulted in codon bias in photolyase genes of cyanobacteria.
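The RSCU (relative synonymous codon usage) values mentioned in this abstract follow a standard definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. A minimal sketch, assuming the standard genetic code and an in-frame coding sequence; the function name `rscu` is hypothetical and this is not the authors' code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for amino acids that occur in the sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage within its family; values above 1 indicate over-representation.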
Affiliation(s)
- Jainendra Pathak
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
| | - Vinod K Kannaujiya
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
| | - Shailendra P Singh
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
| | - Rajeshwar P Sinha
- Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India.
| |
|
21
|
Szitenberg A, Cha S, Opperman CH, Bird DM, Blaxter ML, Lunt DH. Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements. Genome Biol Evol 2016; 8:2964-2978. [PMID: 27566762 PMCID: PMC5635653 DOI: 10.1093/gbe/evw208] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Accepted: 08/20/2016] [Indexed: 12/11/2022] Open
Abstract
Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host's genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states and RNAi pathways, and evaluates whether these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influences genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors is largely inconsequential to the large differences in TE content observed between genomes, and only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes.
Affiliation(s)
- Amir Szitenberg
- Evolutionary Biology Group, School of Environmental Sciences, University of Hull, England, United Kingdom; The Dead Sea and Arava Science Center, Israel
| | - Soyeon Cha
- Department of Plant Pathology, North Carolina State University
| | | | - David M Bird
- Department of Plant Pathology, North Carolina State University
| | - Mark L Blaxter
- School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Scotland
| | - David H Lunt
- Evolutionary Biology Group, School of Environmental Sciences, University of Hull, England, United Kingdom
| |
|
22
|
What We Know and What We Should Know About Codon Usage. J Mol Evol 2016; 82:245-6. [PMID: 27154234 DOI: 10.1007/s00239-016-9742-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/21/2016] [Accepted: 04/27/2016] [Indexed: 10/21/2022]
|
23
|
Gerdol M, De Moro G, Venier P, Pallavicini A. Analysis of synonymous codon usage patterns in sixty-four different bivalve species. PeerJ 2015; 3:e1520. [PMID: 26713259 PMCID: PMC4690358 DOI: 10.7717/peerj.1520] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/06/2015] [Accepted: 11/28/2015] [Indexed: 12/21/2022] Open
Abstract
Synonymous codon usage bias (CUB) is defined as the non-random usage of codons encoding the same amino acid across different genomes. This phenomenon is common to all organisms, and the real weight of the many factors involved in its shaping still remains to be fully determined. So far, relatively little attention has been paid to the analysis of CUB in bivalve mollusks due to the limited genomic data available. Taking advantage of the massive sequence data generated from next generation sequencing projects, we explored codon preferences in 64 different species pertaining to the six major evolutionary lineages in Bivalvia. We detected remarkable differences across species, which are only partially dependent on phylogeny. While the intensity of CUB is mild in most organisms, a heterogeneous group of species (including Arcida and Mytilida, among others) display higher bias and a strong preference for AT-ending codons. We show that the relative strength and direction of mutational bias, selection for translational efficiency and for translational accuracy contribute to the establishment of synonymous codon usage in bivalves. Although many aspects underlying bivalve CUB still remain obscure, we provide for the first time an overview of this phenomenon in this large, commercially and environmentally important, class of marine invertebrates.
Affiliation(s)
- Marco Gerdol
- Department of Life Sciences, University of Trieste, Trieste, Italy
| | - Gianluca De Moro
- Department of Life Sciences, University of Trieste, Trieste, Italy
| | - Paola Venier
- Department of Biology, University of Padova, Padova, Italy
| | | |
|
24
|
Flickinger R. AT-rich repetitive DNA sequences, transcription frequency and germ layer determination. Mech Dev 2015; 138 Pt 3:227-32. [PMID: 26506258 DOI: 10.1016/j.mod.2015.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/27/2015] [Revised: 10/19/2015] [Accepted: 10/21/2015] [Indexed: 01/30/2023]
Abstract
Non-coding sequences of frog embryo endoderm poly (A+) nuclear RNA are AU-enriched, as compared to those of ectoderm and mesoderm. Endoderm blastomeres contain much less H1 histone than is present in ectoderm and mesoderm. H1 histone preferentially binds AT-rich DNA sequences to repress their transcription. The AT-enrichment of non-coding DNA sequences transcribed into poly (A+) nuclear RNA, as well as the low amount of H1 histone, may contribute to the higher transcription frequency of mRNA of endoderm, as compared to that of ectoderm and mesoderm. A greater accumulation of H1 histone in presumptive mesoderm and ectoderm may prevent transcription of endoderm specifying genes in mesoderm and ectoderm. Experimental upregulation of various transcription factors (TFs) can redirect germ layer fate. Most of these TFs bind AT-rich consensus sequences in DNA, suggesting that H1 histone and TFs active during germ layer determination are binding similar sequences.
Affiliation(s)
- Reed Flickinger
- Emeritus Department, Biological Sciences State University of New York at Buffalo, Buffalo, N.Y. 14260, USA.
| |
|
25
|
Abstract
Amino acids typically are encoded by multiple synonymous codons that are not used with the same frequency. Codon usage bias has drawn considerable attention, and several explanations have been offered, including variation in GC-content between species. Focusing on a simple parameter, the combined GC proportion of all the synonymous codons for a particular amino acid (termed GCsyn), we try to deepen our understanding of the relationship between GC-content and amino acid/codon usage in more detail. We analyzed 65 widely distributed representative species and found a close association between GCsyn, GC-content, and amino acid usage. The overall usages of the four amino acids with the greatest GCsyn and the five amino acids with the lowest GCsyn both vary with the regional GC-content, whereas the usage of the remaining 11 amino acids with intermediate GCsyn is less variable. More interestingly, we discovered that codon usage frequencies are nearly constant in regions with similar GC-content. We further quantified the effects of regional GC-content variation (low to high) on amino acid usage and found that GC-content determines the usage variation of amino acids, especially those with extremely high GCsyn, which accounts for 76.7% of the changed GC-content for those regions. Our results suggest that GCsyn correlates with GC-content and has an impact on codon/amino acid usage. These findings suggest a novel approach to understanding the role of codon and amino acid usage in shaping genomic architecture and evolutionary patterns of organisms.
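One plausible reading of GCsyn, as defined in this abstract, is the fraction of G and C bases pooled over all synonymous codons of an amino acid. The sketch below computes it under that assumption and under the standard genetic code; the function name `gcsyn` is hypothetical, and the original paper may weight codons differently.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gcsyn():
    """GC fraction over all bases of each amino acid's synonymous codons."""
    gc, total = {}, {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons encode no amino acid
        gc[aa] = gc.get(aa, 0) + sum(b in "GC" for b in codon)
        total[aa] = total.get(aa, 0) + len(codon)
    return {aa: gc[aa] / total[aa] for aa in gc}


values = gcsyn()
ranked = sorted(values, key=values.get, reverse=True)  # highest GCsyn first
```

Under this reading, the four highest-ranked amino acids are Ala, Gly, Pro and Arg, and the five lowest are Ile, Lys, Asn, Phe and Tyr, which matches the 4 / 11 / 5 grouping the abstract describes.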
|
26
|
Rao Y, Wang Z, Chai X, Nie Q, Zhang X. Hydrophobicity and aromaticity are primary factors shaping variation in amino acid usage of chicken proteome. PLoS One 2014; 9:e110381. [PMID: 25329059 PMCID: PMC4199684 DOI: 10.1371/journal.pone.0110381] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/02/2013] [Accepted: 09/22/2014] [Indexed: 11/18/2022] Open
Abstract
Amino acids are utilized with different frequencies both among species and among genes within the same genome. To date, no study on the amino acid usage pattern of chicken has been performed. In the present study, we carried out a systematic examination of the amino acid usage in the chicken proteome. Our data indicated that the relative amino acid usage is positively correlated with the tRNA gene copy number. GC contents, including GC1, GC2, GC3, GC content of CDS and GC content of the introns, were correlated with the usage of most amino acids, especially GC-rich and GC-poor amino acids; however, multiple linear regression analyses indicated that only approximately 10–40% of the variation in amino acid usage can be explained by GC content for GC-rich and GC-poor amino acids. For the other amino acids, of intermediate GC content, only approximately 10% of the variation can be explained. Correspondence analyses demonstrated that the main factors responsible for the variation of amino acid usage in chicken are hydrophobicity, aromaticity and genomic GC content. Gene expression level also influenced the amino acid usage significantly. We argue that the amino acid usage of the chicken proteome likely reflects a balance or near balance between the action of selection, mutation, and genetic drift.
Affiliation(s)
- Yousheng Rao
- Department of Biological Technology, Nanchang Normal University, Nanchang, Jiangxi, China
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, Guangdong, China
| | - Zhangfeng Wang
- Department of Biological Technology, Nanchang Normal University, Nanchang, Jiangxi, China
| | - Xuewen Chai
- Department of Biological Technology, Nanchang Normal University, Nanchang, Jiangxi, China
| | - Qinghua Nie
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, Guangdong, China
| | - Xiquan Zhang
- Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, South China Agricultural University, Guangzhou, Guangdong, China
| |
|
27
|
Zaghloul L, Drillon G, Boulos RE, Argoul F, Thermes C, Arneodo A, Audit B. Large replication skew domains delimit GC-poor gene deserts in human. Comput Biol Chem 2014; 53 Pt A:153-65. [PMID: 25224847 DOI: 10.1016/j.compbiolchem.2014.08.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 07/11/2014] [Indexed: 01/25/2023]
Abstract
Besides their large-scale organization in isochores, mammalian genomes display megabase-sized regions, spanning both genes and intergenes, where the strand nucleotide composition asymmetry decreases linearly, possibly due to replication activity. These so-called skew-N domains cover about a third of the human genome and are bordered by two skew upward jumps that were hypothesized to compose a subset of "master" replication origins active in the germline. Skew-N domains were shown to exhibit a particular gene organization. Genes with CpG-rich promoters likely expressed in the germline are overrepresented near the master replication origins, with large genes being co-oriented with replication fork progression, which suggests some coordination of replication and transcription. In this study, we describe another skew structure that covers ∼13% of the human genome and that is bordered by putative master replication origins similar to the ones flanking skew-N domains. These skew-split-N domains have a shape reminiscent of an N, but split in half, leaving in the center a region of null skew whose length increases with domain size. These central regions (median size ∼860 kb) have a homogeneous composition, i.e. both a null and constant skew and a constant and low GC content. They correspond to heterochromatin gene deserts found in low-GC isochores with an average gene density of 0.81 promoters/Mb as compared to 7.73 promoters/Mb genome wide. The analysis of epigenetic marks and replication timing data confirms that, in these late replicating heterochromatic regions, the initiation of replication is likely to be random. This contrasts with the transcriptionally active euchromatin state found around the bordering well positioned master replication origins. Altogether skew-N domains and skew-split-N domains cover about 50% of the human genome.
Affiliation(s)
- Lamia Zaghloul
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
| | - Guénola Drillon
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
| | - Rasha E Boulos
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
| | - Françoise Argoul
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
| | - Claude Thermes
- Centre de Génétique Moléculaire, CNRS UPR 3404, Gif-sur-Yvette, France
| | - Alain Arneodo
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France
| | - Benjamin Audit
- Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France.
| |
|
28
|
GC constituents and relative codon expressed amino acid composition in cyanobacterial phycobiliproteins. Gene 2014; 546:162-71. [PMID: 24933001 DOI: 10.1016/j.gene.2014.06.024] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/15/2013] [Revised: 04/17/2014] [Accepted: 06/12/2014] [Indexed: 02/01/2023]
Abstract
The genomic as well as structural relationship of phycobiliproteins (PBPs) in different cyanobacterial species are determined by nucleotides as well as amino acid composition. The genomic GC constituents influence the amino acid variability and codon usage of particular subunit of PBPs. We have analyzed 11 cyanobacterial species to explore the variation of amino acids and causal relationship between GC constituents and codon usage. The study at the first, second and third levels of GC content showed relatively more amino acid variability on the levels of G3+C3 position in comparison to the first and second positions. The amino acid encoded GC rich level including G rich and C rich or both correlate the codon variability and amino acid availability. The fluctuation in amino acids such as Arg, Ala, His, Asp, Gly, Leu and Glu in α and β subunits was observed at G1C1 position; however, fluctuation in other amino acids such as Ser, Thr, Cys and Trp was observed at G2C2 position. The coding selection pressure of amino acids such as Ala, Thr, Tyr, Asp, Gly, Ile, Leu, Asn, and Ser in α and β subunits of PBPs was more elaborated at G3C3 position. In this study, we observed that each subunit of PBPs is codon specific for particular amino acid. These results suggest that genomic constraint linked with GC constituents selects the codon for particular amino acids and furthermore, the codon level study may be a novel approach to explore many problems associated with genomics and proteomics of cyanobacteria.
|
29
|
Glémin S, Clément Y, David J, Ressayre A. GC content evolution in coding regions of angiosperm genomes: a unifying hypothesis. Trends Genet 2014; 30:263-70. [PMID: 24916172 DOI: 10.1016/j.tig.2014.05.002] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 03/18/2014] [Revised: 05/09/2014] [Accepted: 05/13/2014] [Indexed: 01/06/2023]
Abstract
In angiosperms (as in other species), GC content varies along and between genes, within a genome, and between genomes of different species, but the reason for this distribution is still an open question. Grass genomes are particularly intriguing because they exhibit a strong bimodal distribution of genic GC content and a sharp 5'-3' decreasing GC content gradient along most genes. Here, we propose a unifying model to explain the main patterns of GC content variation at the gene and genome scale. We argue that GC content patterns could be mainly determined by the interactions between gene structure, recombination patterns, and GC-biased gene conversion. Recent studies on fine-scale recombination maps in angiosperms support this hypothesis and previous results also fit this model. We propose that our model could be used as a null hypothesis to search for additional forces that affect GC content in angiosperms.
Affiliation(s)
- Sylvain Glémin
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, UMR 5554 CNRS, Université Montpellier 2, F-34095 Montpellier, France.
| | - Yves Clément
- Institut des Sciences de l'Evolution de Montpellier, Unité Mixte de Recherche 5554, Centre National de la Recherche Scientifique, UMR 5554 CNRS, Université Montpellier 2, F-34095 Montpellier, France; Montpellier SupAgro, Unité Mixte de Recherche 1334 Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
| | - Jacques David
- Montpellier SupAgro, Unité Mixte de Recherche 1334 Amélioration Génétique et Adaptation des Plantes Méditerranéennes et Tropicales, F-34398 Montpellier, France
| | - Adrienne Ressayre
- INRA, UMR de Génétique Végétale, INRA/CNRS/Univ Paris-Sud/AgroParistech, Ferme du Moulon, F-91190 Gif sur Yvette, France
|
30
|
Jo YH, Patnaik BB, Kang SW, Chae SH, Oh S, Kim DH, Noh MY, Seo GW, Jeong HC, Noh JY, Jeong JE, Hwang HJ, Ko K, Han YS, Lee YS. Analysis of the genome of a Korean isolate of the Pieris rapae granulovirus enabled by its separation from total host genomic DNA by pulse-field electrophoresis. PLoS One 2013; 8:e84183. [PMID: 24391907 PMCID: PMC3877225 DOI: 10.1371/journal.pone.0084183] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/08/2013] [Accepted: 11/12/2013] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Most traditional genome sequencing projects involving viruses include the culture and purification of the virus particles. However, purification of virions may yield insufficient material for traditional sequencing. The electrophoretic method described here provides a strategy whereby the genomic DNA of the Korean isolate of Pieris rapae granulovirus (PiraGV-K) could be recovered in sufficient amounts for sequencing by purifying it directly from total host DNA by pulse-field gel electrophoresis (PFGE). METHODOLOGY/PRINCIPAL FINDINGS The total genomic DNA of infected P. rapae was embedded in agarose plugs, treated with restriction nuclease and methylase, and then PFGE was used to separate PiraGV-K DNA from the DNA of P. rapae, followed by mapping of fosmid clones of the purified viral DNA. The double-stranded circular genome of PiraGV-K was found to encode 120 open reading frames (ORFs), which covered 92% of the sequence. BLAST and ORF arrangement showed the presence of 78 homologs to other genes in the database. The mean overall amino acid identity of PiraGV-K ORFs was highest with the Chinese isolate of PiraGV (~99%), followed up with Choristoneura occidentalis ORFs at 58%. PiraGV-K ORFs were grouped, according to function, into 10 genes involved in transcription, 11 involved in replication, 25 structural protein genes, and 15 auxiliary genes. Genes for Chitinase (ORF 10) and cathepsin (ORF 11), involved in the liquefaction of the host, were found in the genome. CONCLUSIONS/SIGNIFICANCE The recovery of PiraGV-K DNA genome by pulse-field electrophoretic separation from host genomic DNA had several advantages, compared with its isolation from particles harvested as virions or inclusions from the P. rapae host. We have sequenced and analyzed the 108,658 bp PiraGV-K genome purified by the electrophoretic method. The method appears to be generally applicable to the analysis of genomes of large viruses.
Affiliation(s)
- Yong Hun Jo
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Bharat Bhusan Patnaik
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Se Won Kang
- Department of Life Science and Biotechnology, College of Natural Sciences, Soonchunhyang University, Asan, South Korea
| | | | - Seunghan Oh
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Dong Hyun Kim
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Mi Young Noh
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Gi Won Seo
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Heon Cheon Jeong
- Hampyeong County Insect Institute, Hampyeong County Agricultural Technology Center, Hampyeong, South Korea
| | - Ju Young Noh
- Hampyeong County Insect Institute, Hampyeong County Agricultural Technology Center, Hampyeong, South Korea
| | - Ji Eun Jeong
- Department of Life Science and Biotechnology, College of Natural Sciences, Soonchunhyang University, Asan, South Korea
| | - Hee Ju Hwang
- Department of Life Science and Biotechnology, College of Natural Sciences, Soonchunhyang University, Asan, South Korea
| | - Kisung Ko
- Department of Medicine, Medical Research Institute, College of Medicine, Chung-Ang University, Seoul, South Korea
| | - Yeon Soo Han
- Division of Plant Biotechnology, College of Agriculture and Life Sciences, Chonnam National University, Gwangju, South Korea
| | - Yong Seok Lee
- Department of Life Science and Biotechnology, College of Natural Sciences, Soonchunhyang University, Asan, South Korea
|
31
|
Mutational bias plays an important role in shaping longevity-related amino acid content in mammalian mtDNA-encoded proteins. J Mol Evol 2012; 74:332-41. [PMID: 22752047 DOI: 10.1007/s00239-012-9510-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/11/2012] [Accepted: 06/12/2012] [Indexed: 10/28/2022]
Abstract
During the course of evolution, amino acid shifts might have resulted in mitochondrial proteomes better endowed to resist oxidative stress. However, owing to the problem of distinguishing between functional constraints/adaptations in protein sequences and mutation-driven biases in the composition of these sequences, the adaptive value of such amino acid shifts remains under discussion. We have analyzed the coding sequences of mtDNA from 173 mammalian species, dissecting the effect of nucleotide composition on amino acid usages. We found remarkable cysteine avoidance in mtDNA-encoded proteins. However, no effect of longevity on cysteine content could be detected. On the other hand, nucleotide compositional shifts fully accounted for threonine usages. In spite of a strong effect of mutational bias on methionine abundances, our results suggest a role of selection in determining the composition of methionine. Whether this selective effect is linked or not to protection against oxidative stress is still a subject of debate.
|
32
|
Berná L, Chaurasia A, Angelini C, Federico C, Saccone S, D'Onofrio G. The footprint of metabolism in the organization of mammalian genomes. BMC Genomics 2012; 13:174. [PMID: 22568857 PMCID: PMC3384468 DOI: 10.1186/1471-2164-13-174] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 07/26/2011] [Accepted: 05/08/2012] [Indexed: 01/02/2023] Open
Abstract
Background At present five evolutionary hypotheses have been proposed to explain the great variability of the genomic GC content among and within genomes: the mutational bias, the biased gene conversion, the DNA breakpoints distribution, the thermal stability and the metabolic rate. Several studies carried out on bacteria and teleostean fish pointed towards the critical role played by the environment on the metabolic rate in shaping the base composition of genomes. In mammals the debate is still open, and evidences have been produced in favor of each evolutionary hypothesis. Human genes were assigned to three large functional categories (as well as to the corresponding functional classes) according to the KOG database: (i) information storage and processing, (ii) cellular processes and signaling, and (iii) metabolism. The classification was extended to the organisms so far analyzed performing a reciprocal Blastp and selecting the best reciprocal hit. The base composition was calculated for each sequence of the whole CDS dataset. Results The GC3 level of the above functional categories was increasing from (i) to (iii). This specific compositional pattern was found, as footprint, in all mammalian genomes, but not in frog and lizard ones. Comparative analysis of human versus both frog and lizard functional categories showed that genes involved in the metabolic processes underwent the highest GC3 increment. Analyzing the KOG functional classes of genes, again a well defined intra-genomic pattern was found in all mammals. Not only genes of metabolic pathways, but also genes involved in chromatin structure and dynamics, transcription, signal transduction mechanisms and cytoskeleton, showed an average GC3 level higher than that of the whole genome. In the case of the human genome, the genes of the aforementioned functional categories showed a high probability to be associated with the chromosomal bands. 
Conclusions In the light of different evolutionary hypotheses proposed so far, and contributing with different potential to the genome compositional heterogeneity of mammalian genomes, the one based on the metabolic rate seems to play not a minor role. Keeping in mind similar results reported in bacteria and in teleosts, the specific compositional patterns observed in mammals highlight metabolic rate as unifying factor that fits over a wide range of living organisms.
Affiliation(s)
- Luisa Berná
- Genome Evolution and Organization - Department Animal Physiology and Evolution, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
|
35
|
Whittle CA, Sun Y, Johannesson H. Evolution of synonymous codon usage in Neurospora tetrasperma and Neurospora discreta. Genome Biol Evol 2011; 3:332-43. [PMID: 21402862 PMCID: PMC3089379 DOI: 10.1093/gbe/evr018] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 01/19/2023] Open
Abstract
Neurospora comprises a primary model system for the study of fungal genetics and biology. In spite of this, little is known about genome evolution in Neurospora. For example, the evolution of synonymous codon usage is largely unknown in this genus. In the present investigation, we conducted a comprehensive analysis of synonymous codon usage and its relationship to gene expression and gene length (GL) in Neurospora tetrasperma and Neurospora discreta. For our analysis, we examined codon usage among 2,079 genes per organism and assessed gene expression using large-scale expressed sequenced tag (EST) data sets (279,323 and 453,559 ESTs for N. tetrasperma and N. discreta, respectively). Data on relative synonymous codon usage revealed 24 codons (and two putative codons) that are more frequently used in genes with high than with low expression and thus were defined as optimal codons. Although codon-usage bias was highly correlated with gene expression, it was independent of selectively neutral base composition (introns); thus demonstrating that translational selection drives synonymous codon usage in these genomes. We also report that GL (coding sequences [CDS]) was inversely associated with optimal codon usage at each gene expression level, with highly expressed short genes having the greatest frequency of optimal codons. Optimal codon frequency was moderately higher in N. tetrasperma than in N. discreta, which might be due to variation in selective pressures and/or mating systems.
Affiliation(s)
- C A Whittle
- Department of Evolutionary Biology, Uppsala University, 752 36 Uppsala, Sweden
|
36
|
Wu X, Wu S, Li D, Zhang J, Hou L, Ma J, Liu W, Ren D, Zhu Y, He F. Computational identification of rare codons of Escherichia coli based on codon pairs preference. BMC Bioinformatics 2010; 11:61. [PMID: 20109184 PMCID: PMC2828438 DOI: 10.1186/1471-2105-11-61] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 06/16/2009] [Accepted: 01/28/2010] [Indexed: 12/04/2022] Open
Abstract
Background Codon bias is believed to play an important role in the control of gene expression. In Escherichia coli, some rare codons, which can limit the expression level of exogenous protein, have been defined by gene engineering operations. Previous studies have confirmed the existence of codon pair's preference in many genomes, but the underlying cause of this bias has not been well established. Here we focus on the patterns of rarely-used synonymous codons. A novel method was introduced to identify the rare codons merely by codon pair bias in Escherichia coli. Results In Escherichia coli, we defined the "rare codon pairs" by calculating the frequency of occurrence of all codon pairs in coding sequences. Rare codons which are disliked in genes could make great contributions to forming rare codon pairs. Meanwhile our investigation showed that many of these rare codon pairs contain termination codons and the recognized sites of restriction enzymes. Furthermore, a new index (Frare) was developed. Through comparison with the classical indices we found a significant negative correlation between Frare and the indices which depend on reference datasets. Conclusions Our approach suggests that we can identify rare codons by studying the context in which a codon lies. Also, the frequency of rare codons (Frare) could be a useful index of codon bias regardless of the lack of expression abundance information.
Affiliation(s)
- Xianming Wu
- School of Biological Science and Technology, Shenyang Agricultural University, Shenyang, PR China
|
37
|
Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet 2009; 10:285-311. [PMID: 19630562 DOI: 10.1146/annurev-genom-082908-150001] [Citation(s) in RCA: 494] [Impact Index Per Article: 30.9] [Indexed: 11/09/2022]
Abstract
Recombination is typically thought of as a symmetrical process resulting in large-scale reciprocal genetic exchanges between homologous chromosomes. Recombination events, however, are also accompanied by short-scale, unidirectional exchanges known as gene conversion in the neighborhood of the initiating double-strand break. A large body of evidence suggests that gene conversion is GC-biased in many eukaryotes, including mammals and human. AT/GC heterozygotes produce more GC- than AT-gametes, thus conferring a population advantage to GC-alleles in high-recombining regions. This apparently unimportant feature of our molecular machinery has major evolutionary consequences. Structurally, GC-biased gene conversion explains the spatial distribution of GC-content in mammalian genomes (the so-called isochore structure). Functionally, GC-biased gene conversion promotes the segregation and fixation of deleterious AT→GC mutations, thus increasing our genomic mutation load. Here we review the recent evidence for a GC-biased gene conversion process in mammals, and its consequences for genomic landscapes, molecular evolution, and human functional genomics.
Affiliation(s)
- Laurent Duret
- Université de Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, F-69622, Villeurbanne, France.
|
38
|
Schmidt T, Frishman D. Assignment of isochores for all completely sequenced vertebrate genomes using a consensus. Genome Biol 2008; 9:R104. [PMID: 18590563 PMCID: PMC2481423 DOI: 10.1186/gb-2008-9-6-r104] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 03/13/2008] [Revised: 05/22/2008] [Accepted: 06/30/2008] [Indexed: 11/16/2022] Open
Abstract
A new consensus isochore assignment method and a database of isochore maps for all completely sequenced vertebrate genomes are presented. We show that although the currently available isochore mapping methods agree on the isochore classification of about two-thirds of the human DNA, they produce significantly different results with regard to the location of isochore boundaries and isochore length distribution. We present a new consensus isochore assignment method based on majority voting and provide IsoBase, a comprehensive on-line database of isochore maps for all completely sequenced vertebrate genomes.
Affiliation(s)
- Thorsten Schmidt
- Department of Genome-Oriented Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, D-85350 Freising, Germany
|
39
|
Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet 2008; 4:e1000071. [PMID: 18464896 PMCID: PMC2346554 DOI: 10.1371/journal.pgen.1000071] [Citation(s) in RCA: 258] [Impact Index Per Article: 15.2] [Received: 11/05/2007] [Accepted: 04/11/2008] [Indexed: 01/19/2023] Open
Abstract
Unraveling the evolutionary forces responsible for variations of neutral substitution patterns among taxa or along genomes is a major issue for detecting selection within sequences. Mammalian genomes show large-scale regional variations of GC-content (the isochores), but the substitution processes at the origin of this structure are poorly understood. We analyzed the pattern of neutral substitutions in 1 Gb of primate non-coding regions. We show that the GC-content toward which sequences are evolving is strongly negatively correlated to the distance to telomeres and positively correlated to the rate of crossovers (R² = 47%). This demonstrates that recombination has a major impact on substitution patterns in human, driving the evolution of GC-content. The evolution of GC-content correlates much more strongly with male than with female crossover rate, which rules out selectionist models for the evolution of isochores. This effect of recombination is most probably a consequence of the neutral process of biased gene conversion (BGC) occurring within recombination hotspots. We show that the predictions of this model fit very well with the observed substitution patterns in the human genome. This model notably explains the positive correlation between substitution rate and recombination rate. Theoretical calculations indicate that variations in population size or density in recombination hotspots can have a very strong impact on the evolution of base composition. Furthermore, recombination hotspots can create strong substitution hotspots. This molecular drive affects both coding and non-coding regions. We therefore conclude that along with mutation, selection and drift, BGC is one of the major factors driving genome evolution. Our results also shed light on variations in the rate of crossover relative to non-crossover events, along chromosomes and according to sex, and also on the conservation of hotspot density between human and chimp.
Mammalian genomes show a very strong heterogeneity of base composition along chromosomes (the so-called isochores). The functional significance of these peculiar genomic landscapes is highly debated: do isochores confer some selective advantage, or are they simply the by-product of neutral evolutionary processes? To resolve this issue, we analyzed the pattern of substitution in the human genome by comparison with chimpanzee and macaque. We show that the evolution of base composition (GC-content) is essentially determined by the rate of recombination. This effect appears to be much stronger in male than in female germline, which rules out selective explanations for the evolution of isochores. We show that this impact of recombination is most probably a consequence of the process of biased gene conversion (BGC). This neutral process mimics the action of selection and can induce strong substitution hotspots within recombination hotspots, sometimes leading to the fixation of deleterious mutations. BGC appears to be one of the major factors driving genome evolution. It is therefore essential to take this process into account if we want to be able to interpret genome sequences.
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Villeurbanne, France
| | - Peter F. Arndt
- Department for Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
40
|
The unique genomic properties of sex-biased genes: insights from avian microarray data. BMC Genomics 2008; 9:148. [PMID: 18377635 PMCID: PMC2294128 DOI: 10.1186/1471-2164-9-148] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 12/03/2007] [Accepted: 03/31/2008] [Indexed: 02/07/2023] Open
Abstract
Background In order to develop a framework for the analysis of sex-biased genes, we present a characterization of microarray data comparing male and female gene expression in 18 day chicken embryos for brain, gonad, and heart tissue. Results From the 15982 significantly expressed coding regions that have been assigned to either the autosomes or the Z chromosome (12979 in brain, 13301 in gonad, and 12372 in heart), roughly 18% were significantly sex-biased in any one tissue, though only 4 gene targets were biased in all tissues. The gonad was the most sex-biased tissue, followed by the brain. Sex-biased autosomal genes tended to be expressed at lower levels and in fewer tissues than unbiased gene targets, and autosomal somatic sex-biased genes had more expression noise than similar unbiased genes. Sex-biased genes linked to the Z-chromosome showed reduced expression in females, but not in males, when compared to unbiased Z-linked genes, and sex-biased Z-linked genes were also expressed in fewer tissues than unbiased Z coding regions. Third position GC content, and codon usage bias showed some sex-biased effects, primarily for autosomal genes expressed in the gonad. Finally, there were several over-represented Gene Ontology terms in the sex-biased gene sets. Conclusion On the whole, this analysis suggests that sex-biased genes have unique genomic and organismal properties that delineate them from genes that are expressed equally in males and females.
|
41
|
Chen R, Yan H, Zhao KN, Martinac B, Liu GB. Comprehensive analysis of prokaryotic mechanosensation genes: their characteristics in codon usage. DNA Seq 2007; 18:269-78. [PMID: 17541832 DOI: 10.1080/10425170601136564] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
Abstract
In the present study, we examined GC nucleotide composition, relative synonymous codon usage (RSCU), effective number of codons (ENC), codon adaptation index (CAI) and gene length for 308 prokaryotic mechanosensitive ion channel (MSC) genes from six evolutionary groups: Euryarchaeota, Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Firmicutes, and Gammaproteobacteria. Results showed that: (1) a wide variation of overrepresentation of nucleotides exists in the MSC genes; (2) codon usage bias varies considerably among the MSC genes; (3) both nucleotide constraint and gene length play an important role in shaping codon usage of the bacterial MSC genes; and (4) synonymous codon usage of prokaryotic MSC genes is phylogenetically conserved. Knowledge of codon usage in prokaryotic MSC genes may benefit from the study of the MSC genes in eukaryotes in which few MSC genes have been identified and functionally analysed.
Affiliation(s)
- Rong Chen
- School of Medicine, Xi'an Jiaotong University, Xi'an, People's Republic of China
|
42
|
On the origin of synonymous codon usage divergence between thermophilic and mesophilic prokaryotes. FEBS Lett 2007; 581:5825-30. [DOI: 10.1016/j.febslet.2007.11.054] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/19/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 01/24/2023]
|
43
|
Different functional classes of genes are characterized by different compositional properties. FEBS Lett 2007; 581:5819-24. [DOI: 10.1016/j.febslet.2007.11.052] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 08/13/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 11/19/2022]
|
44
|
Vicario S, Moriyama EN, Powell JR. Codon usage in twelve species of Drosophila. BMC Evol Biol 2007; 7:226. [PMID: 18005411 PMCID: PMC2213667 DOI: 10.1186/1471-2148-7-226] [Citation(s) in RCA: 163] [Impact Index Per Article: 9.1] [Received: 05/08/2007] [Accepted: 11/15/2007] [Indexed: 11/25/2022] Open
Abstract
Background Codon usage bias (CUB), the uneven use of synonymous codons, is a ubiquitous observation in virtually all organisms examined. The pattern of codon usage is generally similar among closely related species, but differs significantly among distantly related organisms, e.g., bacteria, yeast, and Drosophila. Several explanations for CUB have been offered and some have been supported by observations and experiments, although a thorough understanding of the evolutionary forces (random drift, mutation bias, and selection) and their relative importance remains to be determined. The recently available complete genome DNA sequences of twelve phylogenetically defined species of Drosophila offer a hitherto unprecedented opportunity to examine these problems. We report here the patterns of codon usage in the twelve species and offer insights on possible evolutionary forces involved. Results (1) Codon usage is quite stable across 11/12 of the species: G- and especially C-ending codons are used most frequently, thus defining the preferred codons. (2) The only amino acid that changes in preferred codon is Serine with six species of the melanogaster group favoring TCC while the other species, particularly subgenus Drosophila species, favor AGC. (3) D. willistoni is an exception to these generalizations in having a shifted codon usage for seven amino acids toward A/T in the wobble position. (4) Amino acids differ in their contribution to overall CUB, Leu having the greatest and Asp the least. (5) Among two-fold degenerate amino acids, A/G ending amino acids have more selection on codon usage than T/C ending amino acids. (6) Among the different chromosome arms or elements, genes on the non-recombining element F (dot chromosome) have the least CUB, while genes on the element A (X chromosome) have the most. (7) Introns indicate that mutation bias in all species is approximately 2:1, AT:GC, the opposite of codon usage bias. 
(8) There is also evidence for some overall regional bias in base composition that may influence codon usage. Conclusion Overall, these results suggest that natural selection has acted on codon usage in the genus Drosophila, at least often enough to leave a footprint of selection in modern genomes. However, there is evidence in the data that random forces (drift and mutation) have also left patterns in the data, especially in genes under weak selection for codon usage for example genes in regions of low recombination. The documentation of codon usage patterns in each of these twelve genomes also aids in ongoing annotation efforts.
Affiliation(s)
- Saverio Vicario
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8105, USA.
|
45
|
Sabbía V, Piovani R, Naya H, Rodríguez-Maseda H, Romero H, Musto H. Trends of amino acid usage in the proteins from the human genome. J Biomol Struct Dyn 2007; 25:55-9. [PMID: 17676938 DOI: 10.1080/07391102.2007.10507155] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 10/28/2022]
Abstract
Correspondence analysis of amino acid usage was applied to 14,815 complete proteins from the human genome. We found that three major factors influence the variability of amino acidic composition of these proteins, explaining, respectively 20.4%, 14.7%, and 9.9% of the total variability. The first trend is strongly correlated with the GC content of first and second codon positions and is also significantly correlated with the GC level of the corresponding flanking regions and introns. Therefore, the main force shaping amino acid usage among human proteins are the compositional constraints determined by the isochore in which each gene is embedded. The second trend correlates with the hydropathy of each protein and with the frequency of beta-strands. Finally, the third trend is strongly associated with the usage of Cys and the frequency of alpha-helices.
Affiliation(s)
- Víctor Sabbía
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Iguá 4225, Montevideo 11400, Uruguay
|
46
|
Abstract
The vertebrate genome is a mosaic of GC-poor and GC-rich isochores, megabase-sized DNA regions of fairly homogeneous base composition that differ in relative amount, gene density, gene expression, replication timing, and recombination frequency. At the emergence of warm-blooded vertebrates, the gene-rich, moderately GC-rich isochores of the cold-blooded ancestors underwent a GC increase. This increase was similar in mammals and birds and was maintained during the evolution of mammalian and avian orders. Neither the GC increase nor its conservation can be accounted for by the random fixation of neutral or nearly neutral single-nucleotide changes (i.e., the vast majority of nucleotide substitutions) or by a biased gene conversion process occurring at random genome locations. Both phenomena can be explained, however, by the neoselectionist theory of genome evolution that is presented here. This theory fully accepts Ohta's nearly neutral view of point mutations but proposes in addition (i) that the AT-biased mutational input present in vertebrates pushes some DNA regions below a certain GC threshold; (ii) that these lower GC levels cause regional changes in chromatin structure that lead to deleterious effects on replication and transcription; and (iii) that the carriers of these changes undergo negative (purifying) selection, the final result being a compositional conservation of the original isochore pattern in the surviving population. Negative selection may also largely explain the GC increase accompanying the emergence of warm-blooded vertebrates. In conclusion, the neoselectionist theory not only provides a solution to the neutralist/selectionist debate but also introduces an epigenomic component in genome evolution.
Affiliation(s)
- Giorgio Bernardi
- Molecular Evolution Laboratory, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy.
|
47
|
Melodelima C, Gautier C, Piau D. A markovian approach for the prediction of mouse isochores. J Math Biol 2007; 55:353-64. [PMID: 17486342 DOI: 10.1007/s00285-007-0087-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/27/2006] [Revised: 03/01/2007] [Indexed: 10/23/2022]
Abstract
Hidden Markov models (HMMs) are effective tools to detect series of statistically homogeneous structures, but they are not well suited to analyse complex structures. For example, the duration of stay in a state of an HMM must follow a geometric law. Numerous other methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties, and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions by using convolutions of geometric distributions. Thus, we have introduced macro-states to model the distributions of the lengths of the regions. Our study shows that simple HMMs could be used to model the isochore organisation of the mouse genome. This potential use of markovian models to help in data exploration has been underestimated until now.
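The point about state durations can be made concrete with a minimal Python sketch, which is not the authors' implementation. It assumes a macro-state built from `n_sub` identical sub-states visited in series, each with the same self-transition probability `p_stay`; the function names and parameter values are hypothetical. A single state gives a geometric duration whose most likely length is 1, whereas the convolution of several geometric distributions gives a bell-shaped duration with a mode well above 1.

```python
def geometric_pmf(p_stay, max_len):
    """P(duration = d) for d = 0..max_len in one HMM state (index 0 is unused)."""
    return [0.0] + [p_stay ** (d - 1) * (1 - p_stay) for d in range(1, max_len + 1)]


def convolve(a, b, max_len):
    """Distribution of the sum of two independent durations, truncated at max_len."""
    out = [0.0] * (max_len + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j > max_len:
                break
            out[i + j] += ai * bj
    return out


def macro_state_pmf(p_stay, n_sub, max_len):
    """Duration distribution of n_sub identical sub-states traversed in series."""
    single = geometric_pmf(p_stay, max_len)
    pmf = single
    for _ in range(n_sub - 1):
        pmf = convolve(pmf, single, max_len)
    return pmf


def mode(pmf):
    """Most probable duration."""
    return max(range(len(pmf)), key=pmf.__getitem__)


single_state = geometric_pmf(0.9, 200)        # monotonically decreasing
macro_state = macro_state_pmf(0.9, 5, 200)    # bell-shaped
print(mode(single_state), mode(macro_state))
```

Under these assumptions the expected duration is the same as that of one state with a five-fold longer mean stay, but the shape differs: short regions become unlikely, which is the property needed to model regions with a characteristic length.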
Affiliation(s)
- Christelle Melodelima
- UMR 5558 CNRS Biométrie et Biologie Evolutive, Université Claude Bernard Lyon 1, 43 boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|
48
|
Guo X, Bao J, Fan L. Evidence of selectively driven codon usage in rice: implications for GC content evolution of Gramineae genes. FEBS Lett 2007; 581:1015-21. [PMID: 17306258 DOI: 10.1016/j.febslet.2007.01.088] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 12/10/2006] [Revised: 01/27/2007] [Accepted: 01/31/2007] [Indexed: 10/23/2022]
Abstract
Two gene classes characterized by high and low GC content have been found in rice and other cereals, but not dicot genomes. We used paralogs with high and low GC contents in rice and found: (a) a greater increase in GC content at exonic fourfold-redundant sites than at flanking introns; (b) with reference to their orthologs in Arabidopsis, most substitution sites between the two kinds of paralogs are found at 2- and 4-degenerate sites with a T→C mode, while A→C and A→G play major roles at 0-degenerate sites; and (c) high-GC genes have greater bias and codon usage is skewed toward codons that are preferred in highly expressed genes. We believe this is strong evidence for selectively driven codon usage in rice. Another cereal, maize, also showed the same trend as in rice. This represents a potential evolutionary process for the origin of genes with a high GC content in rice and other cereals.
Affiliation(s)
- Xingyi Guo
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310029, China
|
49
|
Tripathy S, Tyler BM. The repertoire of transfer RNA genes is tuned to codon usage bias in the genomes of Phytophthora sojae and Phytophthora ramorum. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2006; 19:1322-8. [PMID: 17153916 DOI: 10.1094/mpmi-19-1322] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 05/12/2023]
Abstract
In all, 238 and 155 transfer (t)RNA genes were predicted from the genomes of Phytophthora sojae and P. ramorum, respectively. After omitting pseudogenes and undetermined types of tRNA genes, there remained 208 P. sojae tRNA genes and 140 P. ramorum tRNA genes. There were 45 types of tRNA genes, with distinct anticodons, in each species. Fourteen common anticodon types of tRNAs are missing altogether from the genome in the two species; however, these appear to be compensated by wobbling of other tRNA anticodons in a manner which is tied to the codon bias in Phytophthora genes. The most abundant tRNA class was arginine in both P. sojae and P. ramorum. A codon usage table was generated for these two organisms from a total of 9,803,525 codons in P. sojae and 7,496,598 codons in P. ramorum. The most abundant codon type detected from the codon usage tables was GAG (encoding glutamic acid), whereas the most numerous tRNA gene had a methionine anticodon (CAT). The correlation between the frequencies of tRNA genes and the codon frequencies in protein-coding genes was very low (0.12 in P. sojae and 0.19 in P. ramorum); however, the correlation between amino acid tRNA gene frequency and the corresponding amino acid codon frequency in P. sojae and P. ramorum was substantially higher (0.53 in P. sojae and 0.77 in P. ramorum). The codon usage frequencies of P. sojae and P. ramorum were very strongly correlated (0.99), as were tRNA gene frequencies (0.77). Approximately 60% of orthologous tRNA gene pairs in P. sojae and P. ramorum are located in regions that have conserved synteny in the two species.
Affiliation(s)
- Sucheta Tripathy
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA.
|
50
|
Cutter AD, Wasmuth JD, Blaxter ML. The evolution of biased codon and amino acid usage in nematode genomes. Mol Biol Evol 2006; 23:2303-15. [PMID: 16936139 DOI: 10.1093/molbev/msl097] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Indexed: 12/13/2022]
Abstract
Despite the degeneracy of the genetic code, whereby different codons encode the same amino acid, alternative codons and amino acids are utilized nonrandomly within and between genomes. Such biases in codon and amino acid usage have been demonstrated extensively in prokaryote genomes and likely reflect a balance between the action of mutation, selection, and genetic drift. Here, we quantify the effects of selection and mutation drift as causes of codon and amino acid-usage bias in a large collection of nematode partial genomes from 37 species spanning approximately 700 Myr of evolution, as inferred from expressed sequence tag (EST) measures of gene expression and from base composition variation. Average G + C content at silent sites among these taxa ranges from 10% to 63%, and EST counts range more than 100-fold, underlying marked differences between the identities of major codons and optimal codons for a given species as well as influencing patterns of amino acid abundance among taxa. Few species in our sample demonstrate a dominant role of selection in shaping intragenomic codon-usage biases, and these are principally free living rather than parasitic nematodes. This suggests that deviations in effective population size among species, with small effective sizes among parasites, are partly responsible for species differences in the extent to which selection shapes patterns of codon usage. Nevertheless, a consensus set of optimal codons emerges that is common to most taxa, indicating that, with some notable exceptions, selection for translational efficiency and accuracy favors similar sets of codons regardless of the major codon-usage trends defined by base compositional properties of individual nematode genomes.
Affiliation(s)
- Asher D Cutter
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom.
|