1
Lado S, Thannesberger J, Spettel K, Arpović J, Ferreira BI, Lavitrano M, Steininger C. Unveiling Inter- and Intra-Patient Sequence Variability with a Multi-Sample Coronavirus Target Enrichment Approach. Viruses 2024; 16:786. [PMID: 38793667 PMCID: PMC11125942 DOI: 10.3390/v16050786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/08/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024]
Abstract
Amid the global challenges posed by the COVID-19 pandemic, unraveling the genomic intricacies of SARS-CoV-2 became crucial. This study explores viral evolution using an innovative high-throughput next-generation sequencing (NGS) approach. By taking advantage of nasal swab and mouthwash samples from patients who tested positive for COVID-19 across different geographical regions during sequential infection waves, our study applied a targeted enrichment protocol and pooling strategy to increase detection sensitivity. The approach was extremely efficient, yielding a large number of reads and mutations distributed across 10 distinct viral gene regions. Notably, the genes Envelope, Nucleocapsid, and Open Reading Frame 8 had the highest number of unique mutations per 1000 nucleotides, with both spike and Nucleocapsid genes showing evidence for positive selection. Focusing on the spike protein gene, crucial in virus replication and immunogenicity, our findings show a dynamic SARS-CoV-2 evolution, emphasizing the virus-host interplay. Moreover, the pooling strategy facilitated subtle sequence variability detection. Our findings painted a dynamic portrait of SARS-CoV-2 evolution, emphasizing the intricate interplay between the virus and its host populations and accentuating the importance of continuous genomic surveillance to understand viral dynamics. As SARS-CoV-2 continues to evolve, this approach proves to be a powerful, versatile, fast, and cost-efficient screening tool for unraveling emerging variants, fostering understanding of the virus's genetic landscape.
Affiliation(s)
- Sara Lado
  - Division of Infectious Diseases and Tropical Medicine, Department of Medicine 1, Medical University of Vienna, 1090 Vienna, Austria
- Jakob Thannesberger
  - Division of Infectious Diseases and Tropical Medicine, Department of Medicine 1, Medical University of Vienna, 1090 Vienna, Austria
- Kathrin Spettel
  - Division of Clinical Microbiology, Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria
  - Division of Biomedical Science, University of Applied Sciences, FH Campus Wien, 1100 Vienna, Austria
- Jurica Arpović
  - Department of Medical Biology, School of Medicine, University of Mostar, Bijeli Brijeg b.b., 88000 Mostar, Bosnia and Herzegovina
- Bibiana I. Ferreira
  - Faculty of Medicine and Biomedical Sciences, University of Algarve, Campus de Gambelas, Edf. 2, 8005-139 Faro, Portugal
  - Algarve Biomedical Center Research Institute, Campus de Gambelas, Edf. 2, lab 3.67, 8005-139 Faro, Portugal
- Christoph Steininger
  - Division of Infectious Diseases and Tropical Medicine, Department of Medicine 1, Medical University of Vienna, 1090 Vienna, Austria
  - Karl-Landsteiner Institute for Microbiome Research, Medical University of Vienna, 1090 Vienna, Austria
2
Wu X, Shan K, Zan F, Tang X, Qian Z, Lu J. Optimization and Deoptimization of Codons in SARS-CoV-2 and Related Implications for Vaccine Development. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2205445. [PMID: 37267926 PMCID: PMC10427376 DOI: 10.1002/advs.202205445] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/20/2022] [Revised: 04/08/2023] [Indexed: 06/04/2023]
Abstract
The spread of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has progressed into a global pandemic. To date, thousands of genetic variants have been identified among SARS-CoV-2 isolates collected from patients. Sequence analysis reveals that the codon adaptation index (CAI) values of viral sequences have decreased over time, with occasional fluctuations. Evolutionary modeling suggests that this phenomenon may result from the virus's mutation preference during transmission. Dual-luciferase assays further show that the deoptimization of codons in the viral sequence may weaken protein expression during virus evolution, indicating that codon usage may play an important role in virus fitness. Finally, given the importance of codon usage in protein expression, particularly for mRNA vaccines, several codon-optimized Omicron BA.2.12.1, BA.4/5, and XBB.1.5 spike mRNA vaccine candidates were designed and their high levels of expression were experimentally validated. This study highlights the importance of codon usage in virus evolution and provides guidelines for codon optimization in mRNA and DNA vaccine development.
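The codon adaptation index mentioned in this abstract is commonly computed as the geometric mean of per-codon relative adaptiveness weights. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `cai` and the toy `weights` table are hypothetical, and a real analysis would derive the weights from a reference set of highly expressed host genes.

```python
from math import exp, log

def cai(cds: str, weights: dict) -> float:
    """Geometric mean of relative adaptiveness weights over in-frame codons.

    `weights` maps codon -> relative adaptiveness in (0, 1]; it is an
    assumed, user-supplied table (all values must be positive).
    Codons missing from the table are skipped.
    """
    s = cds.upper().replace("U", "T")
    logs = []
    for i in range(0, len(s) - 2, 3):
        codon = s[i:i + 3]
        if codon in weights:
            logs.append(log(weights[codon]))
    return exp(sum(logs) / len(logs)) if logs else float("nan")

# Toy weights for two alanine codons (illustrative values only).
toy_weights = {"GCT": 1.0, "GCC": 0.5}
score = cai("GCTGCC", toy_weights)
```

With the toy table, the score is the geometric mean of 1.0 and 0.5. A sequence whose codons drift toward lower-weight synonyms yields a lower value, which is the direction of change the abstract reports.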
Affiliation(s)
- Xinkai Wu
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
- Ke-jia Shan
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
- Fuwen Zan
  - NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100176, China
- Xiaolu Tang
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
- Zhaohui Qian
  - NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100176, China
- Jian Lu
  - State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China
3
Orgera J, Kelley JJ, Bar O, Vaidhyanathan S, Grigoriev A. SARSNTdb database: Factors affecting SARS-CoV-2 sequence conservation. Frontiers in Virology 2022. [DOI: 10.3389/fviro.2022.1028335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
SARSNTdb offers a curated, nucleotide-centric database for users of varying levels of SARS-CoV-2 knowledge. Its user-friendly interface enables querying coding regions and coordinate intervals to find out the various functional and selective constraints that act upon the corresponding nucleotides and amino acids. Users can easily obtain information about viral genes and proteins, functional domains, repeats, secondary structure formation, intragenomic interactions, and mutation prevalence. Currently, many databases are focused on phylogeny and amino acid substitutions, mainly in the spike protein. We took a novel, more nucleotide-focused approach, as RNA does more than just code for proteins and many insights can be gleaned from its study. For example, RNA-targeted drug therapies for SARS-CoV-2 are currently being developed, and it is essential to understand the features only visible at that level. This database enables the user to identify regions that are more prone to forming secondary structures that drugs can target. SARSNTdb also provides illustrative mutation data from a subset of ~25,000 patient samples with reliable read coverage across the whole genome (from different locations and time points in the pandemic). Finally, the database allows for comparing SARS-CoV-2 and SARS-CoV domains and sequences. SARSNTdb can serve the research community as a curated repository of information that gives a jump start to analyzing a mutation's effect far beyond just determining synonymous/non-synonymous substitutions in protein sequences.
4
The Long-Term Evolutionary History of Gradual Reduction of CpG Dinucleotides in the SARS-CoV-2 Lineage. Biology 2021; 10:52. [PMID: 33445785 PMCID: PMC7828247 DOI: 10.3390/biology10010052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Revised: 12/29/2020] [Accepted: 01/09/2021] [Indexed: 12/24/2022]
Abstract
Simple Summary: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the coronavirus disease 2019 (COVID-19), a pandemic that infected over 81 million people worldwide. This has led the scientific community to characterize the genome of this virus, including its nucleotide composition. Investigation of the dinucleotide frequency revealed that the proportion of CG dinucleotides (CpG) is highly reduced in the viral genomes. Since CpG dinucleotides are the target site for the host antiviral zinc finger protein, it has been suggested that the reduction in the proportion of CpG is the viral response to escape from the host defense machinery. In the present study, we investigated the time of origin of the reduction in CpG content. Whole genome analyses based on all representative viral genomes of the group Betacoronavirus revealed that the CpG content in the lineage of SARS-CoV-2 has been progressively declining over the past 1213 years. The depletion of CpG was found to occur at neutral as well as selectively constrained positions of the viral genomes.
Abstract: Recent studies suggested that the fraction of CG dinucleotides (CpG) is severely reduced in the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The CpG deficiency was predicted to be the adaptive response of the virus to evade degradation of the viral RNA by the antiviral zinc finger protein that specifically binds to CpG nucleotides. By comparing all representative genomes belonging to the genus Betacoronavirus, this study examined the potential time of origin of CpG depletion. The results of this investigation revealed a highly significant correlation between the proportions of CpG nucleotides (CpG content) of the betacoronavirus species and their times of divergence from SARS-CoV-2. Species that are distantly related to SARS-CoV-2 had much higher CpG contents than that of SARS-CoV-2. Conversely, closely related species had low CpG contents that are similar to or slightly higher than that of SARS-CoV-2. These results suggest a systematic and continuous reduction in the CpG content in the SARS-CoV-2 lineage that might have started when the Sarbecovirus + Hibecovirus clade separated from Nobecovirus, which was estimated to be 1213 years ago. This depletion was not found to be mediated by the GC contents of the genomes. Our results also showed that the depletion of CpG occurred at neutral positions of the genome as well as those under selection. The latter is evident from the progressive reduction in the proportion of arginine amino acid (coded by CpG dinucleotides) in the SARS-CoV-2 lineage over time. The results of this study suggest that shedding CpG nucleotides from their genome is a continuing process in this viral lineage, potentially to escape from their host defense mechanisms.
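The "CpG content" compared across genomes in this abstract can be illustrated with a simple count. This is a minimal sketch that assumes CpG content means the fraction of adjacent nucleotide pairs that are CG; the abstract does not spell out the exact normalization used in the study, and the function name `cpg_content` is hypothetical.

```python
def cpg_content(seq: str) -> float:
    """Fraction of adjacent nucleotide pairs in `seq` that are CG.

    Assumption: overlapping dinucleotide windows, normalized by the
    number of windows (len - 1). RNA input (U) is treated like DNA.
    """
    s = seq.upper().replace("U", "T")
    pairs = len(s) - 1
    if pairs < 1:
        return 0.0
    cg = sum(1 for i in range(pairs) if s[i:i + 2] == "CG")
    return cg / pairs

# Example: AC, CG, GC, CG, GT -> two CG pairs out of five windows.
example = cpg_content("ACGCGT")
```

Computing this value for each genome and plotting it against divergence time from SARS-CoV-2 would give the kind of correlation the abstract describes.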
5
van Dorp L, Richard D, Tan CCS, Shaw LP, Acman M, Balloux F. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2. Nat Commun 2020; 11:5986. [PMID: 33239633 PMCID: PMC7688939 DOI: 10.1038/s41467-020-19818-2] [Citation(s) in RCA: 186] [Impact Index Per Article: 46.5] [Received: 06/18/2020] [Accepted: 10/30/2020] [Indexed: 02/07/2023]
Abstract
COVID-19 is caused by the coronavirus SARS-CoV-2, which jumped into the human population in late 2019 from a currently uncharacterised animal reservoir. Due to this recent association with humans, SARS-CoV-2 may not yet be fully adapted to its human host. This has led to speculations that SARS-CoV-2 may be evolving towards higher transmissibility. The most plausible mutations under putative natural selection are those which have emerged repeatedly and independently (homoplasies). Here, we formally test whether any homoplasies observed in SARS-CoV-2 to date are significantly associated with increased viral transmission. To do so, we develop a phylogenetic index to quantify the relative number of descendants in sister clades with and without a specific allele. We apply this index to a curated set of recurrent mutations identified within a dataset of 46,723 SARS-CoV-2 genomes isolated from patients worldwide. We do not identify a single recurrent mutation in this set convincingly associated with increased viral transmission. Instead, recurrent mutations currently in circulation appear to be evolutionary neutral and primarily induced by the human immune system via RNA editing, rather than being signatures of adaptation. At this stage we find no evidence for significantly more transmissible lineages of SARS-CoV-2 due to recurrent mutations.
Affiliation(s)
- Lucy van Dorp
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- Damien Richard
  - Cirad, UMR PVBMT, F-97410 St Pierre, Réunion, France
  - Université de la Réunion, UMR PVBMT, F-97490 St Denis, Réunion, France
- Cedric C S Tan
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- Liam P Shaw
  - Nuffield Department of Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DU, UK
- Mislav Acman
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- François Balloux
  - UCL Genetics Institute, University College London, London, WC1E 6BT, UK
6
Klimczak LJ, Randall TA, Saini N, Li JL, Gordenin DA. Similarity between mutation spectra in hypermutated genomes of rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic. PLoS One 2020; 15:e0237689. [PMID: 33006981 PMCID: PMC7531822 DOI: 10.1371/journal.pone.0237689] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 07/30/2020] [Accepted: 09/21/2020] [Indexed: 12/16/2022]
Abstract
Genomes of tens of thousands of SARS-CoV-2 isolates have been sequenced across the world, and the total number of changes (predominantly single base substitutions) in these isolates exceeds ten thousand. We compared the mutational spectrum in the new SARS-CoV-2 mutation dataset with the previously published mutation spectrum in hypermutated genomes of rubella, another positive single-stranded (ss) RNA virus. Each of the rubella virus isolates arose by accumulation of hundreds of mutations during propagation in a single subject, while the SARS-CoV-2 mutation spectrum represents a collection of events in multiple virus isolates from individuals across the world. We found a clear similarity between the spectra of single base substitutions in rubella and in SARS-CoV-2, with C to U as well as A to G and U to C being the most prominent in plus strand genomic RNA of each virus. Of those, U to C changes universally showed preference for loops versus stems in predicted RNA secondary structure. Similarly to what was previously reported for rubella virus, C to U changes showed enrichment in the uCn motif, which suggested a subclass of APOBEC cytidine deaminase being a source of these substitutions. We also found enrichment of several other trinucleotide-centered mutation motifs only in SARS-CoV-2, likely indicative of a mutation process characteristic to this virus. Altogether, the results of this analysis suggest that the mutation mechanisms that lead to hypermutation of the rubella vaccine virus in a rare pathological condition may also operate in the background of the SARS-CoV-2 viruses currently propagating in the human population.
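A spectrum of single base substitutions of the kind compared in this abstract can be tallied from a reference and an aligned isolate. The sketch below is illustrative only: it assumes two gap-free, equal-length plus-strand sequences, ignores motif context and RNA secondary structure, and uses a hypothetical function name, so it is not the authors' analysis.

```python
from collections import Counter

def substitution_spectrum(ref: str, alt: str) -> Counter:
    """Count substitution classes (e.g. 'C>U') between aligned sequences.

    Assumption: `ref` and `alt` are already aligned, have equal length,
    and contain no gaps; ambiguous bases are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    r = ref.upper().replace("T", "U")
    a = alt.upper().replace("T", "U")
    spectrum = Counter()
    for x, y in zip(r, a):
        if x != y and x in "ACGU" and y in "ACGU":
            spectrum[f"{x}>{y}"] += 1
    return spectrum

# Toy example: one C>U change and one U>C change.
toy = substitution_spectrum("ACCU", "AUCC")
```

Summing such counters over many isolates gives a dataset-wide spectrum in which the relative weight of C to U, A to G, and U to C changes can be compared between viruses.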
Affiliation(s)
- Leszek J. Klimczak
  - Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, NIH, Durham, North Carolina, United States of America
- Thomas A. Randall
  - Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, NIH, Durham, North Carolina, United States of America
- Natalie Saini
  - Mechanisms of Genome Dynamics Group, National Institute of Environmental Health Sciences, NIH, Durham, North Carolina, United States of America
- Jian-Liang Li
  - Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, NIH, Durham, North Carolina, United States of America
- Dmitry A. Gordenin
  - Mechanisms of Genome Dynamics Group, National Institute of Environmental Health Sciences, NIH, Durham, North Carolina, United States of America
7
Klimczak LJ, Randall TA, Saini N, Li JL, Gordenin DA. Similarity between mutation spectra in hypermutated genomes of rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic. bioRxiv 2020:2020.08.03.234005. [PMID: 32793907 PMCID: PMC7418721 DOI: 10.1101/2020.08.03.234005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Genomes of tens of thousands of SARS-CoV-2 isolates have been sequenced across the world, and the total number of changes (predominantly single base substitutions) in these isolates exceeds ten thousand. We compared the mutational spectrum in the new SARS-CoV-2 mutation dataset with the previously published mutation spectrum in hypermutated genomes of rubella, another positive single-stranded (ss) RNA virus. Each of the rubella isolates arose by accumulation of hundreds of mutations during propagation in a single subject, while the SARS-CoV-2 mutation spectrum represents a collection of events in multiple virus isolates from individuals across the world. We found a clear similarity between the spectra of single base substitutions in rubella and in SARS-CoV-2, with C to U as well as A to G and U to C being the most prominent in plus strand genomic RNA of each virus. Of those, U to C changes universally showed preference for loops versus stems in predicted RNA secondary structure. Similarly to what was previously reported for rubella, C to U changes showed enrichment in the uCn motif, which suggested a subclass of APOBEC cytidine deaminase being a source of these substitutions. We also found enrichment of several other trinucleotide-centered mutation motifs only in SARS-CoV-2, likely indicative of a mutation process characteristic to this virus. Altogether, the results of this analysis suggest that the mutation mechanisms that lead to hypermutation of the rubella vaccine virus in a rare pathological condition may also operate in the background of the SARS-CoV-2 viruses currently propagating in the human population.
8
Berkhout B, van Hemert F. On the biased nucleotide composition of the human coronavirus RNA genome. Virus Res 2015; 202:41-7. [PMID: 25656063 PMCID: PMC7114406 DOI: 10.1016/j.virusres.2014.11.031] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 10/08/2014] [Revised: 11/11/2014] [Accepted: 11/12/2014] [Indexed: 11/17/2022]
Abstract
- The nucleotide composition of a coronaviral RNA genome is biased (high U, low C).
- This bias is a relatively stable property along the viral genome, but less prominent in the last 1/3 of the genome.
- This bias is even more pronounced in the single-stranded, unpaired RNA domains.
- The bias dictates the atypical codon usage of the coronaviruses.
- The RNA genome of the zoonotic viruses MERS and SARS is extremely biased.
We investigated the nucleotide composition of the RNA genome of the six human coronaviruses. Some general coronavirus characteristics were apparent (e.g. high U, low C count), but we also detected species-specific signatures. Most strikingly, the high U and low C proportions are quite variable and act like communicating vessels, C goes down when U goes up and vice versa. U ranges among virus isolates from 30.7% to 40.3%, and C makes the opposite movement from 20.0% to 12.9%, respectively. The nucleotide biases are more pronounced in the unpaired regions of the structured RNA genome, which may suggest a certain biological function for these distinctive sequence signatures. Coronaviruses have an atypical codon usage that has been linked to mutational events operating on the viral RNA genome on an evolutionary time scale. We suggest that the atypical nucleotide bias may serve a distinct biological function and that it is the direct cause of the characteristic codon usage in these viruses. The relevance for evolution of the novel human pathogens MERS and SARS is discussed.
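The U and C percentages quoted in this abstract are plain nucleotide composition figures. A minimal sketch of how such percentages could be computed for one genome follows; the function name is hypothetical, and the example sequence is a toy string, not a coronavirus genome.

```python
from collections import Counter

def nucleotide_composition(seq: str) -> dict:
    """Percentage of A, C, G and U in an RNA (or DNA) sequence.

    T is read as U; characters other than A, C, G, U are ignored.
    """
    s = seq.upper().replace("T", "U")
    counts = Counter(b for b in s if b in "ACGU")
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in "ACGU"}
    return {b: 100.0 * counts[b] / total for b in "ACGU"}

# Toy example: 25% A, 25% C, 0% G, 50% U.
toy = nucleotide_composition("AUUC")
```

Running this per isolate and plotting the U percentage against the C percentage would show the "communicating vessels" pattern the abstract describes, if it holds in the data.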
Affiliation(s)
- Ben Berkhout
  - Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, The Netherlands
- Formijn van Hemert
  - Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, The Netherlands
9
Liu L, Li D, Bai F. A relative Lempel-Ziv complexity: Application to comparing biological sequences. Chem Phys Lett 2012; 530:107-112. [PMID: 32226089 PMCID: PMC7094452 DOI: 10.1016/j.cplett.2012.01.061] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/19/2011] [Accepted: 01/24/2012] [Indexed: 11/17/2022]
Abstract
One of the main tasks in biological sequence analysis is sequence comparison. Numerous efficient methods have been developed for this purpose, and traditional sequence comparison is based on sequence alignment. In this report, we propose a novel alignment-free method based on the relative Lempel-Ziv complexity to compare biological sequences. Vertebrate transferrin sequences and spike protein sequences are prepared and tested to evaluate the validity of the method. We use this method to build phylogenetic trees for the two groups of sequences. The results demonstrate that our method is powerful and efficient.
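The Lempel-Ziv complexity underlying this method counts how many new "components" are needed to build a sequence from left to right. The sketch below implements the classic exhaustive-history parsing and a simple concatenation-based conditional count; the latter is an assumption chosen for illustration and is not claimed to be the paper's definition of relative complexity. Both function names are hypothetical.

```python
def lz_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive parsing.

    Each component is the shortest substring starting at position i that
    does not already occur in the text preceding its own last character.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate can be copied from earlier text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def conditional_complexity(s: str, q: str) -> int:
    """Extra components needed for `s` when `q` is already known.

    Assumption: measured as c(q + s) - c(q); an illustrative stand-in
    for a relative complexity, not the published formula.
    """
    return lz_complexity(q + s) - lz_complexity(q)

# "ACGTACGT" parses as A | C | G | T | ACGT, i.e. five components.
toy = lz_complexity("ACGTACGT")
```

A sequence that can be largely copied from another needs few extra components, so small conditional counts indicate similar sequences; a distance matrix built from such counts can then feed a standard tree-building method.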
Affiliation(s)
- Liwei Liu
  - College of Science, Dalian Jiaotong University, Dalian 116028, PR China
- Dongbo Li
  - Department of Otolaryngology, Affiliated Xinhua Hospital of Dalian University, Dalian 116021, PR China
- Fenglan Bai
  - College of Science, Dalian Jiaotong University, Dalian 116028, PR China
10
Dai Q, Liu X, Li L, Yao Y, Han B, Zhu L. Using Gaussian model to improve biological sequence comparison. J Comput Chem 2010; 31:351-61. [PMID: 19479732 PMCID: PMC7166749 DOI: 10.1002/jcc.21322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/11/2009] [Accepted: 04/14/2009] [Indexed: 11/08/2022]
Abstract
One of the major tasks in biological sequence analysis is to compare biological sequences, which could serve as evidence of structural and functional conservation, as well as of evolutionary relations among the sequences. Numerous efficient methods have been developed for sequence comparison, but challenges remain. In this article, we proposed a novel method to compare biological sequences based on Gaussian model. Instead of comparing the frequencies of k-words in biological sequences directly, we considered the k-word frequency distribution under Gaussian model which gives the different expression levels of k-words. The proposed method was tested by similarity search, evaluation on functionally related genes, and phylogenetic analysis. The performance of our method was further compared with alignment-based and alignment-free methods. The results demonstrate that Gaussian model provides more information about k-word frequencies and improves the efficiency of sequence comparison.
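For context, the direct k-word frequency comparison that this abstract contrasts itself with can be sketched in a few lines. This shows only that baseline: the Gaussian model itself is not described in enough detail here to reproduce, and the function names and the choice of Euclidean distance are assumptions.

```python
from collections import Counter
from math import sqrt

def kword_freqs(seq: str, k: int = 3) -> dict:
    """Relative frequency of each overlapping k-word in `seq`."""
    s = seq.upper()
    total = len(s) - k + 1
    if total <= 0:
        return {}
    counts = Counter(s[i:i + k] for i in range(total))
    return {word: c / total for word, c in counts.items()}

def euclidean_distance(f1: dict, f2: dict) -> float:
    """Euclidean distance between two k-word frequency profiles."""
    words = set(f1) | set(f2)
    return sqrt(sum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2 for w in words))

# Two toy sequences with disjoint 2-word content.
d = euclidean_distance(kword_freqs("AAAA", 2), kword_freqs("CCCC", 2))
```

Identical profiles give a distance of zero, and profiles with no shared words give the largest values, which is the signal an alignment-free comparison relies on.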
Affiliation(s)
- Qi Dai
  - Institute for Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
  - School of Science, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Lihua Li
  - Institute for Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Yuhua Yao
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Bin Han
  - Institute for Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Lei Zhu
  - Institute for Biomedical Engineering and Instrumentation, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
11
Woo PC, Wong BH, Huang Y, Lau SK, Yuen KY. Cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape codon usage bias in coronaviruses. Virology 2007; 369:431-42. [PMID: 17881030 PMCID: PMC7103290 DOI: 10.1016/j.virol.2007.08.010] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 07/12/2007] [Revised: 08/02/2007] [Accepted: 08/07/2007] [Indexed: 12/01/2022]
Abstract
Using the complete genome sequences of 19 coronavirus genomes, we analyzed the codon usage bias, dinucleotide relative abundance and cytosine deamination in coronavirus genomes. Of the eight codons that contain CpG, six were markedly suppressed. The mean NNU/NNC ratio of the six amino acids using either NNC or NNU as codon is 3.262, suggesting cytosine deamination. Among the 16 dinucleotides, CpG was most markedly suppressed (mean relative abundance 0.509). No correlation was observed between CpG abundance and mean NNU/NNC ratio. Among the 19 coronaviruses, CoV-HKU1 showed the most extreme codon usage bias and extremely high NNU/NNC ratio of 8.835. Cytosine deamination and selection of CpG suppressed clones by the immune system are the two major independent biochemical and biological selective forces that shape codon usage bias in coronavirus genomes. The underlying mechanism for the extreme codon usage bias, cytosine deamination and G + C content in CoV-HKU1 warrants further studies.
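The two statistics reported in this abstract, dinucleotide relative abundance and the mean NNU/NNC ratio, can be sketched as follows. The relative abundance uses the commonly cited observed-over-expected form f(XY) / (f(X) f(Y)); the NNU/NNC function averages per-amino-acid ratios over the six amino acids encoded only by an NNU/NNC pair (Phe, Tyr, His, Asn, Asp, Cys). Whether the study computed or averaged them in exactly this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter

def relative_abundance(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for one dinucleotide."""
    s = seq.upper().replace("U", "T")
    n = len(s)
    if n < 2:
        return 0.0
    mono = Counter(s)
    di = Counter(s[i:i + 2] for i in range(n - 1))
    f_x, f_y = mono[dinuc[0]] / n, mono[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        return 0.0
    return (di[dinuc] / (n - 1)) / (f_x * f_y)

# First two bases of the codon pairs for Phe, Tyr, His, Asn, Asp, Cys.
TWO_CODON_PREFIXES = ["TT", "TA", "CA", "AA", "GA", "TG"]

def mean_nnu_nnc_ratio(cds: str) -> float:
    """Mean of count(NNU) / count(NNC) over the six amino acids above.

    Assumption: `cds` is in frame; amino acids with no NNC codon in the
    sequence are left out of the mean to avoid division by zero.
    """
    s = cds.upper().replace("U", "T")
    codons = Counter(s[i:i + 3] for i in range(0, len(s) - 2, 3))
    ratios = [codons[p + "T"] / codons[p + "C"]
              for p in TWO_CODON_PREFIXES if codons[p + "C"]]
    return sum(ratios) / len(ratios) if ratios else float("nan")

toy_ratio = mean_nnu_nnc_ratio("TTTTTTTTC")  # two UUU codons, one UUC codon
```

A relative abundance well below 1 for CG indicates suppression, and a mean NNU/NNC ratio above 1 is the pattern the abstract interprets as a sign of cytosine deamination.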
Affiliation(s)
- Patrick C.Y. Woo
  - State Key Laboratory of Emerging Infectious Diseases, Hong Kong
  - Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong
  - Department of Microbiology, The University of Hong Kong, Hong Kong
- Yi Huang
  - Department of Microbiology, The University of Hong Kong, Hong Kong
- Susanna K.P. Lau
  - State Key Laboratory of Emerging Infectious Diseases, Hong Kong
  - Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong
  - Department of Microbiology, The University of Hong Kong, Hong Kong
- Kwok-Yung Yuen
  - State Key Laboratory of Emerging Infectious Diseases, Hong Kong
  - Research Centre of Infection and Immunology, The University of Hong Kong, Hong Kong
  - Department of Microbiology, The University of Hong Kong, Hong Kong
  - Corresponding author. State Key Laboratory of Emerging Infectious Diseases, Department of Microbiology, The University of Hong Kong, Room 423, University Pathology Building, Queen Mary Hospital Compound, Pokfulam, Hong Kong. Fax: +852 2855 1241.
12
Abstract
We consider the construction of a characteristic distribution of an L-tuple in a DNA sequence. A mathematical characteristic of this distribution is selected as an invariant to characterize the L-tuple. With this invariant, we can perform sequence comparison. The graphs of characteristic distributions of the dinucleotide GC for the coding sequences of the first exon of the beta-globin gene of eleven different species, and the construction of a phylogenetic tree of twenty-four coronavirus genomes, illustrate the utility of the approach.
Affiliation(s)
- Ying-Zhao Liu
  - Department of Applied Mathematics, Dalian University of Technology, Dalian, Liaoning 116024, P. R. China
13
GraphDNA: a Java program for graphical display of DNA composition analyses. BMC Bioinformatics 2007; 8:21. [PMID: 17244370 PMCID: PMC1783863 DOI: 10.1186/1471-2105-8-21] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 06/02/2006] [Accepted: 01/23/2007] [Indexed: 11/10/2022]
Abstract
Background: Under conditions of no strand bias the number of Gs is equal to that of Cs for each DNA strand; similarly, the total number of Ts is equal to that of As. However, within each strand there are considerable local deviations from the A = T and G = C equality. These asymmetries in nucleotide composition have been extensively analyzed in prokaryotic and eukaryotic genomes and related to chromosome organization, transcription orientation and other processes in certain organisms. To carry out analysis of intra-strand nucleotide distribution several graphical methods have been developed.
Results: GraphDNA is a new Java application that provides a simple, user-friendly interface for the visualization of DNA nucleotide composition. The program accepts GenBank, EMBL and FASTA files as an input, and it displays multiple DNA nucleotide composition graphs (skews and walks) in a single window to allow direct comparisons between the sequences. We illustrate the use of DNA skews for characterization of poxvirus and coronavirus genomes.
Conclusion: GraphDNA is a platform-independent, Open Source tool for the analysis of nucleotide trends in DNA sequences. Multiple sequence formats can be read and multiple sequences may be plotted in a single results window.
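The "skews" mentioned here measure local departures from G = C (or A = T) along one strand. The following minimal sketch computes a windowed GC skew, (G - C) / (G + C); it is written in Python rather than Java, is not GraphDNA's code, and the function name, window size and step are illustrative assumptions.

```python
def gc_skew(seq: str, window: int = 100, step: int = 100) -> list:
    """GC skew (G - C) / (G + C) for successive windows along `seq`.

    Windows with no G or C get a skew of 0.0. A trailing partial
    window is ignored.
    """
    s = seq.upper()
    skews = []
    for start in range(0, len(s) - window + 1, step):
        w = s[start:start + window]
        g, c = w.count("G"), w.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

# Toy example: a G-only window followed by a C-only window.
toy = gc_skew("GGGGCCCC", window=4, step=4)
```

Plotting the returned values against window position gives a skew graph; summing them cumulatively gives a walk-style plot of the kind the abstract refers to.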
14
Zheng WX, Chen LL, Ou HY, Gao F, Zhang CT. Coronavirus phylogeny based on a geometric approach. Mol Phylogenet Evol 2005; 36:224-32. [PMID: 15890535 PMCID: PMC7111192 DOI: 10.1016/j.ympev.2005.03.030] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 05/24/2004] [Revised: 01/12/2005] [Accepted: 03/28/2005] [Indexed: 11/29/2022]
Abstract
A novel coronavirus has been identified as the cause of the outbreak of severe acute respiratory syndrome (SARS). Previous phylogenetic analyses based on sequence alignments show that SARS-CoVs form a new group distantly related to the other three groups of previously characterized coronaviruses. In this paper, a geometric approach based on the Z-curve representation of the whole genome sequence is proposed to analyze the phylogenetic relationships of coronaviruses. The evolutionary distances are obtained through measuring the differences among the three-dimensional Z-curves. The Z-curve is approximately described by its geometric center and the associated three eigenvectors, which indicate the center position and the trend of the Z-curve, respectively. Although some information is lost due to the approximate description of the Z-curve, the phylogenetic tree constructed based on these parameters is consistent with those of previous analyses. The present method has the merits of simplicity and intuitiveness, but it is still in its premature stage. Because the phylogenetic relationships are inferred from the whole genome, instead of some individual genes, the present method represents a new direction of phylogeny study in the post-genome era.
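The Z-curve maps a sequence to a path in three dimensions using cumulative base counts: x contrasts purines with pyrimidines, y contrasts amino with keto bases, and z contrasts weak with strong hydrogen bonding. The sketch below builds that curve and its geometric center; the eigenvector step and the distance measure used in the paper are not reproduced, and the function names are hypothetical.

```python
def z_curve(seq: str) -> list:
    """Z-curve points (x, y, z) after each base of `seq`.

    x = (A + G) - (C + T), y = (A + C) - (G + T), z = (A + T) - (G + C),
    where A, C, G, T are cumulative counts. Other characters are skipped.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper().replace("U", "T"):
        if base not in counts:
            continue
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points

def geometric_center(points: list) -> tuple:
    """Mean of the Z-curve points along each axis."""
    n = len(points)
    if n == 0:
        return (0.0, 0.0, 0.0)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

toy_center = geometric_center(z_curve("AC"))  # points (1, 1, 1) and (0, 2, 0)
```

Comparing the centers (and, in the paper's approach, the principal directions) of whole-genome curves gives a compact geometric summary from which distances between genomes can be derived.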
Affiliation(s)
- Wen-Xin Zheng: Department of Physics, Tianjin University, Tianjin 300072, China
15
|
Inberg A, Linial M. Evolutional insights on uncharacterized SARS coronavirus genes. FEBS Lett 2005; 577:159-64. [PMID: 15527778 PMCID: PMC7125658 DOI: 10.1016/j.febslet.2004.09.076] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/09/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 10/29/2022]
Abstract
The complete genomes of the severe acute respiratory syndrome coronavirus (SARS-CoV) and many of its variants have been determined by several laboratories. The genome contains fourteen predicted open reading frames (ORFs). However, a function has been clearly assigned to only six of these ORFs, in viral replication, transcription, and structural constituents. The others are herein referred to as uncharacterized ORFs (UC-ORFs). Here, we try to provide relational insight into those UC-ORFs, suggesting that a number of them are remotely related to structural proteins of coronaviruses and other viruses infecting mammalian hosts. Surprisingly, several of the UC-ORFs exhibit considerable similarity to other SARS-CoV ORFs. These observations may provide clues to the evolution and genome dynamics of SARS-CoV.
Affiliation(s)
- Alex Inberg: Dept of Biological Chemistry, Life Science Institute, The Hebrew University, Jerusalem, 91904 Israel
- Michal Linial: Dept of Biological Chemistry, Life Science Institute, The Hebrew University, Jerusalem, 91904 Israel
16
|
Pyrc K, Jebbink MF, Berkhout B, van der Hoek L. Genome structure and transcriptional regulation of human coronavirus NL63. Virol J 2004; 1:7. [PMID: 15548333 PMCID: PMC538260 DOI: 10.1186/1743-422x-1-7] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Received: 10/29/2004] [Accepted: 11/17/2004] [Indexed: 11/23/2022] Open
Abstract
Background: Two human coronaviruses have been known since the 1960s: HCoV-229E and HCoV-OC43. SARS-CoV was discovered in the early spring of 2003, followed by the identification of HCoV-NL63, the fourth member of the Coronaviridae family that infects humans. In this study, we describe the genome structure and the transcription strategy of HCoV-NL63 by experimental analysis of the viral subgenomic mRNAs. Results: The genome of HCoV-NL63 has the following gene order: 1a-1b-S-ORF3-E-M-N. The GC content of the HCoV-NL63 genome is extremely low (34%) compared to other coronaviruses, and we therefore performed additional analysis of the nucleotide composition. Overall, the RNA genome is very low in C and high in U, and this is also reflected in the codon usage. Inspection of the nucleotide composition along the genome indicates that the C-count increases significantly in the last one-third of the genome at the expense of U and G. We document the production of subgenomic (sg) mRNAs coding for the S, ORF3, E, M, and N proteins. We did not detect any additional sg mRNA. Furthermore, we sequenced the 5' end of all sg mRNAs, confirming the presence of an identical leader sequence in each sg mRNA. Northern blot analysis indicated that the expression level differs significantly among the sg mRNAs, with the sg mRNA encoding nucleocapsid (N) being the most abundant. Conclusions: The presented data give insight into viral evolution and mutational patterns in the coronaviral genome. Furthermore, our data show that HCoV-NL63 employs the discontinuous replication strategy with generation of subgenomic mRNAs during (-) strand synthesis. Because HCoV-NL63 has low pathogenicity and grows easily in cell culture, this virus can be a powerful tool to study SARS coronavirus pathogenesis.
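The composition analysis described in the abstract above (overall GC content, and per-base fractions compared across thirds of the genome) can be illustrated with a small Python sketch. It is not the authors' analysis pipeline. The function names and the toy sequence are hypothetical, and T is treated as U so that both DNA and RNA notation are accepted.

```python
def composition(seq):
    """Fraction of A, C, G, and U in seq (T is counted as U)."""
    seq = seq.upper().replace("T", "U")
    counts = {b: seq.count(b) for b in "ACGU"}
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGU"}


def gc_content(seq):
    """Combined fraction of G and C among the unambiguous bases."""
    comp = composition(seq)
    return comp["G"] + comp["C"]


def composition_by_thirds(seq):
    """Composition of the first, middle, and last third of the sequence."""
    n = len(seq)
    bounds = [0, n // 3, 2 * n // 3, n]
    return [composition(seq[bounds[i]:bounds[i + 1]]) for i in range(3)]


# Hypothetical toy sequence whose last third is enriched in C.
toy = "AUUUGUUUAU" + "UUAUUGUAUU" + "CCUACCGCAC"
print(round(gc_content(toy), 2))
for third in composition_by_thirds(toy):
    print(round(third["C"], 2), round(third["U"], 2))
```

Applied to a real genome, a rise in the C fraction of the final third relative to the first two would correspond to the trend the abstract reports for HCoV-NL63.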
Affiliation(s)
- Krzysztof Pyrc: Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
- Maarten F Jebbink: Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
- Ben Berkhout: Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands
- Lia van der Hoek: Department of Human Retrovirology, University of Amsterdam, Meibergdreef 15, 1105 AZ, Amsterdam, The Netherlands