1
Oliver JL, Bernaola-Galván P, Carpena P, Perfectti F, Gómez-Martín C, Castiglione S, Raia P, Verdú M, Moya A. Strong evidence for the evolution of decreasing compositional heterogeneity in SARS-CoV-2 genomes during the pandemic. Sci Rep 2025; 15:12246. [PMID: 40210974 PMCID: PMC11985940 DOI: 10.1038/s41598-025-95893-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Accepted: 03/25/2025] [Indexed: 04/12/2025] Open
Abstract
The rapid evolution of SARS-CoV-2 during the pandemic was characterized by the fixation of a plethora of mutations, many of which enable the virus to evade host resistance, likely altering the virus's genome compositional structure (i.e., the arrangement of compositional domains of varying lengths and nucleotide frequencies within the genome). To explore this hypothesis, we summarize the evolutionary effects of these mutations by computing the Sequence Compositional Complexity (SCC) in random stratified datasets of fully sequenced genomes. Phylogenetic ridge regression of SCC against time reveals a striking downward evolutionary trend, suggesting the ongoing adaptation of the virus's genome structure to the human host. Other genomic features, such as strand asymmetry, the effective number of K-mers, and the depletion of CpG dinucleotides, each linked to the virus's adaptation to its human host, also exhibit decreasing phylogenetic trends throughout the pandemic, along with strong phylogenetic correlations to SCC. We hypothesize that viral CpG depletion (through C➔U changes), promoted by directional mutational pressures exerted on the genome by the host antiviral defense systems, may play a key role in the decrease of SARS-CoV-2 genome compositional heterogeneity, with specific adaptation to the human host occurring as a form of genetic mimicry. Overall, our findings suggest a decelerating evolution of reduced compositional complexity in SCC, whereas the number of K-mers and the depletion of CpG dinucleotides are still increasing. These results indicate a genome-wide evolutionary trend toward a more symmetric and homogeneous genome compositional structure in SARS-CoV-2, which is partly still ongoing.
Affiliation(s)
- José L Oliver
  - Department of Genetics, Faculty of Sciences, University of Granada, 18071, Granada, Spain
  - Laboratory of Bioinformatics, Institute of Biotechnology, Center of Biomedical Research, 18100, Granada, Spain
- Pedro Bernaola-Galván
  - Department of Applied Physics II and Institute Carlos I for Theoretical and Computational Physics, University of Málaga, Málaga, 29071, Spain
- Pedro Carpena
  - Department of Applied Physics II and Institute Carlos I for Theoretical and Computational Physics, University of Málaga, Málaga, 29071, Spain
- Francisco Perfectti
  - Department of Genetics, Faculty of Sciences, University of Granada, 18071, Granada, Spain
  - Research Unit Modeling Nature, Universidad de Granada, Granada, 18071, Spain
- Cristina Gómez-Martín
  - Department of Genetics, Faculty of Sciences, University of Granada, 18071, Granada, Spain
  - Laboratory of Bioinformatics, Institute of Biotechnology, Center of Biomedical Research, 18100, Granada, Spain
  - Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, Netherlands
- Silvia Castiglione
  - Dipartimento di Scienze della Terra, dell'Ambiente e delle Risorse, Università di Napoli Federico II, Napoli, 80126, Italy
- Pasquale Raia
  - Dipartimento di Scienze della Terra, dell'Ambiente e delle Risorse, Università di Napoli Federico II, Napoli, 80126, Italy
- Miguel Verdú
  - Centro de Investigaciones sobre Desertificación, Consejo Superior de Investigaciones Científicas (CSIC), University of València and Generalitat Valenciana, 46113, Valencia, Spain
- Andrés Moya
  - Institute of Integrative Systems Biology (I2sysbio), University of València and Consejo Superior de Investigaciones Científicas (CSIC), 46980, Valencia, Spain
  - Foundation for the Promotion of Sanitary and Biomedical Research of Valencian Community (FISABIO), 46020, Valencia, Spain
  - CIBER in Epidemiology and Public Health, Madrid, 28029, Spain
2
Guharay S. A data-driven approach to study temporal characteristics of COVID-19 infection and death Time Series for twelve countries across six continents. BMC Med Res Methodol 2025; 25:1. [PMID: 39754044 PMCID: PMC11697903 DOI: 10.1186/s12874-024-02423-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 11/26/2024] [Indexed: 01/07/2025] Open
Abstract
BACKGROUND In this work, we implement a data-driven approach that aggregates several analytical methods to study the characteristics of COVID-19 daily infection and death time series and to identify correlations and characteristic trends that can be related to the time evolution of this disease. The datasets cover twelve distinct countries across six continents, from January 22, 2020 to March 1, 2022. This time span is partitioned into three windows: (1) pre-vaccine, (2) post-vaccine and pre-omicron (BA.1 variant), and (3) post-vaccine including post-omicron variant. This study enables deriving insights into intriguing questions related to the science of system dynamics pertaining to COVID-19 evolution.
METHODS We implement a set of distinct analytical methods for: (a) statistical studies to estimate the skewness and kurtosis of the data distributions; (b) analyzing the stationarity properties of these time series using the Augmented Dickey-Fuller (ADF) tests; (c) examining co-integration properties for the non-stationary time series using the Phillips-Ouliaris (PO) tests; (d) calculating the Hurst exponent using the rescaled-range (R/S) analysis, along with the Detrended Fluctuation Analysis (DFA), for self-affinity studies of the evolving dynamical datasets.
RESULTS We observe a significant asymmetry of the distributions, shown by the skewness, and the presence of heavy tails, indicated by the kurtosis. The daily infection and death data are, by and large, nonstationary, while their corresponding log return values are stationary. The self-affinity studies through the Hurst exponents and DFA exhibit intriguing local changes over time. These changes can be attributed to the underlying dynamics of state transitions, especially from a random state to either mean-reversion or long-range memory/persistence states.
CONCLUSIONS We conduct systematic studies covering widely diverse time series datasets of daily infections and deaths during the evolution of the COVID-19 pandemic. We demonstrate the merit of a multiple-analytics framework by systematically laying down a methodological structure for the analyses and quantitatively examining the evolution of daily COVID-19 infection and death cases. This methodology builds a capability for tracking dynamically evolving states pertaining to critical problems.
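Two of the steps named in this abstract, log returns and the Hurst exponent from rescaled-range (R/S) analysis, can be illustrated with a short standard-library sketch. This is an assumed textbook formulation, not the author's code: the function names, the doubling window sizes, and the `min_window` parameter are illustrative choices. Log returns are undefined for days with zero counts, so real data would need preprocessing first.

```python
import math
import random


def log_returns(series):
    """Log returns r_t = ln(x_t / x_{t-1}); all values must be > 0."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def rescaled_range(chunk):
    """R/S of one window: range of cumulative deviations over the std."""
    mean = sum(chunk) / len(chunk)
    cum = lo = hi = sq = 0.0
    for x in chunk:
        d = x - mean
        cum += d
        lo, hi = min(lo, cum), max(hi, cum)
        sq += d * d
    std = math.sqrt(sq / len(chunk))
    return (hi - lo) / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    n = min_window
    while n <= len(series) // 2:
        # Non-overlapping windows of length n.
        vals = [rescaled_range(series[i:i + n])
                for i in range(0, len(series) - n + 1, n)]
        vals = [v for v in vals if v is not None]
        if vals:
            xs.append(math.log(n))
            ys.append(math.log(sum(vals) / len(vals)))
        n *= 2  # window sizes double: 8, 16, 32, ...
    if len(xs) < 2:
        raise ValueError("series too short for R/S analysis")
    # Ordinary least-squares slope.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
    print(round(hurst_rs(noise), 2))
```

Values of H near 0.5 suggest an uncorrelated (random) series, values below 0.5 suggest mean reversion, and values above 0.5 suggest persistence. The plain R/S estimator is known to be biased upward on short series, which is one reason to cross-check it with DFA as the abstract describes.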
Affiliation(s)
- Sabyasachi Guharay
  - Systems Engineering & Operations Research, George Mason University, Fairfax, VA, 22030, USA
3
Correia J, de Lima M, Silva R, Anselmo D, Vasconcelos M, Viswanathan G. Multifractal analysis of coronavirus sequences. Chaos, Solitons & Fractals 2023; 174:113843. [DOI: 10.1016/j.chaos.2023.113843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2025]
4
Xie XH, Huang YJ, Han GS, Yu ZG, Ma YL. Microbial characterization based on multifractal analysis of metagenomes. Front Cell Infect Microbiol 2023; 13:1117421. [PMID: 36779183 PMCID: PMC9910082 DOI: 10.3389/fcimb.2023.1117421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 01/09/2023] [Indexed: 01/28/2023] Open
Abstract
Introduction The species diversity of microbiomes is a cutting-edge concept in metagenomic research. In this study, we propose a multifractal analysis for metagenomic research.
Method and Results First, we visualized the chaos game representation (CGR) of simulated and real metagenomes and found that the resulting plots exhibit self-similarity. We then defined and calculated the multifractal dimension for the CGR plots of simulated and real metagenomes, respectively. By analyzing the Pearson correlation coefficients between the multifractal dimension and the traditional species diversity indices, we find that the correlation coefficients between the multifractal dimension and the species richness index and Shannon diversity index reached their maximum values when q = 0, 1, and the correlation coefficient between the multifractal dimension and the Simpson diversity index reached its maximum value when q = 5. Finally, we apply our method to real metagenomes of the gut microbiota of 100 infants sampled as newborns and at 4 and 12 months of age. The results show that the multifractal dimensions of an infant's gut microbiome can distinguish age differences.
Conclusion and Discussion There is self-similarity among the CGRs of WGS of metagenomes, and the multifractal spectrum is an important characteristic of metagenomes. The traditional diversity indicators can be unified under the framework of multifractal analysis. These results coincide with similar results in macrobial ecology. The multifractal spectrum of infants' gut microbiomes is related to the development of the infants.
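The two ingredients named in this abstract, the chaos game representation and a multifractal (generalized) dimension, can be sketched as follows. This is an illustrative standard-library example under stated assumptions, not the authors' pipeline: the corner assignment follows the common A/C/G/T convention, the function names are hypothetical, and `renyi_dimension` is a crude single-scale estimate of D_q, whereas a real analysis would fit a slope over several box sizes.

```python
import math
from collections import Counter

# Assumed corner layout (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre, move halfway to each base's corner."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        pts.append((x, y))
    return pts


def box_probabilities(points, k):
    """Fraction of points in each occupied box of a 2**k by 2**k grid."""
    side = 2 ** k
    counts = Counter(
        (min(int(x * side), side - 1), min(int(y * side), side - 1))
        for x, y in points
    )
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def renyi_dimension(points, q, k):
    """Single-scale estimate of the generalized dimension D_q (box size 2**-k)."""
    p = box_probabilities(points, k)
    log_eps = math.log(2.0 ** -k)
    if abs(q - 1.0) < 1e-12:
        # q = 1 is the information (Shannon) dimension, taken as a limit.
        return sum(pi * math.log(pi) for pi in p) / log_eps
    return math.log(sum(pi ** q for pi in p)) / ((q - 1.0) * log_eps)


if __name__ == "__main__":
    pts = cgr_points("ACGTTGCAACGT")
    print([round(renyi_dimension(pts, q, 1), 3) for q in (0, 1, 2)])
```

At q = 0 the estimate depends only on how many boxes are occupied (analogous to richness), at q = 1 it is a Shannon-type quantity, and larger q weights the most populated boxes more heavily, which is the sense in which diversity indices can be read within one multifractal framework.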
Affiliation(s)
- Xian-hua Xie
  - Key Laboratory of Jiangxi Province for Numerical Simulation and Emulation Techniques, Gannan Normal University, Ganzhou, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
  - *Correspondence: Xian-hua Xie,
- Yu-jie Huang
  - Key Laboratory of Jiangxi Province for Numerical Simulation and Emulation Techniques, Gannan Normal University, Ganzhou, China
- Guo-sheng Han
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Zu-guo Yu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Yuan-lin Ma
  - School of Economics, Zhengzhou University of Aeronautics, Zhengzhou, China
5
Ahmad SU, Hafeez Kiani B, Abrar M, Jan Z, Zafar I, Ali Y, Alanazi AM, Malik A, Rather MA, Ahmad A, Khan AA. A comprehensive genomic study, mutation screening, phylogenetic and statistical analysis of SARS-CoV-2 and its variant omicron among different countries. J Infect Public Health 2022; 15:878-891. [PMID: 35839568 PMCID: PMC9262654 DOI: 10.1016/j.jiph.2022.07.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 02/21/2022] [Revised: 06/16/2022] [Accepted: 07/03/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND With the rapid growth of genomic sequence data for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its variants Delta (B.1.617.2) and Omicron (B.1.1.529), it is vital to successfully identify mutations within the genome.
OBJECTIVE The main objective of the study is to analyze full-length genome mutations in 157 isolates of SARS-CoV-2 and its Delta and Omicron variants. This study also examines possible effects at the structural level to understand the role of mutations, provides new insights into the evolution of COVID-19, and evaluates differences in viral genome sequences among nations. We have also tried to offer a mutation snapshot of these differences that could help in vaccine formulation. This study utilizes a unique and efficient method of targeting the stable genes for the drug discovery approach.
METHODS Complete genome sequence information for SARS-CoV-2, Delta, and Omicron from online resources was used for structural domain identification, data mining, and screening, employing different bioinformatics tools. BioEdit software was used to perform genomic alignments across countries, and a phylogenetic tree was constructed with 500 bootstrap replicates. Heterozygosity ratios were determined in silico. A minimum spanning network (MSN) of selected populations was determined by Bruvo's distance role-based framework.
RESULTS Among the complete genome sequences of all 157 strains of SARS-CoV-2 and its variants from different countries, Corona nucleoca and DUF5515 were observed to be the most conserved domains. All genomes showed changes in comparison to the Wuhan-Hu-1 strain, mainly in the TRS region (CUAAAC or ACGAAC). We discovered 596 mutations in all genes, with the highest number (321) found in ORF1ab (QHD43415.1), and TRS site mutations found only in ORF7a (1) and ORF10 (2). The Omicron variant has 30 mutations in the Spike protein and a higher alpha-helix content (23.46%) than the Delta variant (22.03%). T478 was also found to be a prevalent polymorphism in the Delta and Omicron variants, as were genomic gaps ranging from 45 to 65 aa. All 157 sequences contained variations and conformed to Nei's genetic distance. We found a heterozygosity (Hs) of 0.01, a mean expected Hs of 0.32, a genetic diversity index (GDI) of 0.01943989, and a within-population GD of 0.01266951. The Hedrick value was 0.52324978, the GD coefficient was 0.52324978, and the average Hs was 0.01371452. Among the countries, Brazil has the highest standard error (SE) rate (1.398), whereas Japan has the highest ratio of Nei's gene diversity (0.01).
CONCLUSIONS The study's findings will assist in comprehending the shape and kind of complete genome, their streaming genomic sequences, and mutations in various additions of SARS-CoV-2, as well as its different variant strains like Omicron. These results will provide a scientific basis to design the vaccines and understand the genomic study of these viruses.
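The heterozygosity and gene diversity figures reported in this abstract belong to a family of statistics built on Nei's gene diversity, H = 1 − Σ pᵢ². The sketch below shows that textbook formula applied per alignment column. It is an assumed illustration with hypothetical function names and does not reproduce the study's software, its Bruvo's distance network, or its reported values.

```python
from collections import Counter


def nei_gene_diversity(alleles, unbiased=True):
    """Nei's gene diversity H = 1 - sum(p_i^2) for one locus or site.

    With unbiased=True the small-sample correction n / (n - 1) is applied.
    """
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(alleles).values()]
    h = 1.0 - sum(p * p for p in freqs)
    return h * n / (n - 1) if unbiased else h


def mean_site_diversity(aligned, unbiased=True):
    """Average per-site diversity over the columns of an alignment."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    per_site = [
        nei_gene_diversity([s[i] for s in aligned], unbiased)
        for i in range(length)
    ]
    return sum(per_site) / length


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACTT"]  # made-up sequences
    print(round(mean_site_diversity(toy_alignment), 4))
```

Values near 0 mean that almost all sequences share the same state at a site, which is consistent with the very low diversity values quoted for highly similar viral genomes.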
Affiliation(s)
- Syed Umair Ahmad
  - Department of Bioinformatics, Hazara University, Mansehra, Pakistan
- Bushra Hafeez Kiani
  - Department of Biological Sciences, Faculty of Basic and Applied Sciences, International Islamic University Islamabad, 44000, Pakistan
- Muhammad Abrar
  - Department of Anesthesia, DHQ Teaching Hospital, Sahiwal Medical College, Sahiwal, Pakistan
- Zainab Jan
  - Department of Bioinformatics, Hazara University, Mansehra, Pakistan
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University, Pakistan
- Yasir Ali
  - National Centre for Bioinformatics, Quaid-i-Azam University, Islamabad, Pakistan
- Amer M. Alanazi
  - Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia
- Abdul Malik
  - Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia
- Mohd Ashraf Rather
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Asrar Ahmad
  - Center for Sickle Cell Disease, College of Medicine, Howard University, Washington DC, USA
- Azmat Ali Khan (corresponding author)
  - Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh 11451, Saudi Arabia