1. Sangeet S, Sinha A, Nair MB, Mahata A, Sarkar R, Roy S. EVOLVE: A Web Platform for AI-Based Protein Mutation Prediction and Evolutionary Phase Exploration. J Chem Inf Model 2025; 65:4293-4310. [PMID: 40309917 DOI: 10.1021/acs.jcim.5c00026] [Indexed: 05/02/2025]
Abstract
While predicting structure-function relationships from sequence data is fundamental in biophysical chemistry, identifying prospective single-point and collective mutation sites in proteins can help us stay ahead in understanding their potential effects on protein structure and function. To address the challenges of analyzing large sequence spaces, we present EVOLVE, a web tool that enables researchers to explore prospective mutation sites and their collective behavior. EVOLVE integrates statistical mechanics-guided machine learning algorithms to predict probable mutation sites, with statistical mechanics used to calculate mutational entropy and accurately identify mutational hotspots. Validation against multiple viral protein sequences confirms its ability to predict mutations and their functional consequences. By applying the statistical-mechanics concept of phase transitions, EVOLVE also quantifies mutational entropy fluctuations, offering a quantitative basis for identifying Variants of Concern (VOC) or Variants under Monitoring (VUM) as defined in World Health Organization (WHO) guidelines. EVOLVE streamlines data upload and analysis through a user-friendly interface and comprehensive tutorials. EVOLVE is freely available at https://evolve-iiserkol.com.
Affiliation(s)
- Satyam Sangeet
  - Department of Chemical Sciences, Indian Institute of Science Education and Research, Kolkata, West Bengal 741246, India
  - School of Physics, University of Sydney, Sydney, New South Wales 2006, Australia
- Anushree Sinha
  - Department of Chemical Sciences, Indian Institute of Science Education and Research, Kolkata, West Bengal 741246, India
- Madhav B Nair
  - Department of Chemical Sciences, Indian Institute of Science Education and Research, Kolkata, West Bengal 741246, India
- Arpita Mahata
  - Department of Chemical Sciences, Indian Institute of Science Education and Research, Kolkata, West Bengal 741246, India
- Raju Sarkar
  - Department of Chemical Sciences, Indian Institute of Science Education and Research, Kolkata, West Bengal 741246, India
- Susmita Roy
  - Department of Chemical Sciences, Indian Institute of Science Education and Research, Kolkata, West Bengal 741246, India
2. Nawaz MS, Nawaz MZ, Gong Y, Fournier-Viger P, Diallo AB. In silico framework for genome analysis. Future Gener Comput Syst 2025; 164:107585. [DOI: 10.1016/j.future.2024.107585] [Indexed: 01/05/2025]
3. Santoni D. An entropy-based study on the mutational landscape of SARS-CoV-2 in USA: Comparing different variants and revealing co-mutational behavior of proteins. Gene 2024; 922:148556. [PMID: 38754568 DOI: 10.1016/j.gene.2024.148556] [Received: 02/21/2024] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 05/18/2024]
Abstract
The COVID-19 emergency pushed the international scientific community to use every available resource to combat the spread of the virus, understand its biology, and predict its possible evolution in terms of new variants. Since the first SARS-CoV-2 nucleotide and amino acid sequences were made available, information theory has been used to study how the viral information content changes over time and to trace the evolution of the virus's mutational landscape. In this work, we analyzed SARS-CoV-2 sequences collected mainly in the USA between March 2020 and December 2022 and computed mutation profiles of viral proteins over time through an entropy-based approach using Shannon entropy and the Hellinger distance. This representation gives an at-a-glance view of the mutational landscape of viral proteins over time and can provide new insights into the evolution of the virus from different points of view. Non-structural proteins typically showed flat mutation profiles, characterized by very low average mutation entropy, while accessory and structural proteins mostly showed non-uniform, high mutation profiles, often coupled with the predominance of variants. Interestingly, the NSP2 protein, whose function is still debated, falls in the same branch as NSP14 and NSP10 in the phylogenetic tree of mutations constructed from correlations of mutation profiles, suggesting co-evolution of these proteins and a possible functional link among them. To the best of our knowledge, this is the first study based on a massive amount of data (n = 107,939,973) that analyzes the mutational landscape of SARS-CoV-2 over time from an entropy point of view and depicts a temporal mutation profile for each protein of the virus.
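Shannon entropy and the Hellinger distance, the two measures named above, can be illustrated with a minimal Python sketch. It assumes equal-length aligned protein sequences for each time window, computes per-position Shannon entropy in bits, takes the mean over positions as a stand-in for "average mutation entropy", and compares two entropy profiles (rescaled to sum to 1) with the Hellinger distance. The function names, the averaging, the rescaling step, and the toy sequences are illustrative assumptions, not the paper's published definitions.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residue distribution at one alignment position."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())

def entropy_profile(alignment: list[str]) -> list[float]:
    """Per-position entropy of equal-length aligned sequences (illustrative helper)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]

def as_distribution(values: list[float]) -> list[float]:
    """Rescale non-negative values to sum to 1 (assumed step before comparing profiles)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("profile has zero total entropy; cannot normalize")
    return [v / total for v in values]

def hellinger(p: list[float], q: list[float]) -> float:
    """Hellinger distance between two discrete distributions on the same support (0 to 1)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

# Toy aligned fragments for two hypothetical time windows.
window_a = ["MKV", "MKV", "MRV", "MKV"]
window_b = ["MKV", "MRV", "MRI", "MKI"]

profile_a = entropy_profile(window_a)
profile_b = entropy_profile(window_b)
print("average entropy A:", sum(profile_a) / len(profile_a))
print("average entropy B:", sum(profile_b) / len(profile_b))
print("Hellinger(A, B):", hellinger(as_distribution(profile_a), as_distribution(profile_b)))
```

A flat profile (all zeros) corresponds to a fully conserved fragment, which is why the rescaling step refuses a zero total rather than inventing a distribution.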
Affiliation(s)
- Daniele Santoni
  - Institute for System Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Via dei Taurini 19, Rome 00185, Italy
4. Formentin M, Chignola R, Favretti M. Optimal entropic properties of SARS-CoV-2 RNA sequences. R Soc Open Sci 2024; 11:231369. [PMID: 38298394 PMCID: PMC10827432 DOI: 10.1098/rsos.231369] [Received: 09/21/2023] [Accepted: 01/02/2024] [Indexed: 02/02/2024]
Abstract
The scientific community's response to the COVID-19 pandemic has generated a huge dataset (approx. 10⁶ entries) of genome sequences collected worldwide over a relatively short time window. These unprecedented conditions, together with the unambiguous identification of the reference viral genome sequence, allow for an original statistical study of mutations in the viral genome. In this paper, we compute the Shannon entropy of every sequence in the dataset, as well as the relative entropy and the mutual information between the reference sequence and the mutated ones. These functions, originally developed in information theory, measure the information content of a sequence and allow us to study the random character of the mutation mechanism in terms of its entropy and information gain or loss. We show that this approach allows us to recast known features of the SARS-CoV-2 mutation mechanism, such as the C→T bias, and also to discover new optimal entropic properties of the mutation process, in the sense that the viral mutation mechanism closely tracks theoretically computable lower bounds for the entropy decrease and the information transfer.
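The three quantities named above can be sketched in Python for nucleotide sequences. This is one plausible reading, not the authors' exact formulation: it assumes the Shannon entropy is computed from each sequence's base composition, the relative entropy is the Kullback-Leibler divergence of a mutated sequence's composition from the reference composition, and the mutual information is estimated from the joint frequencies of aligned (reference, mutated) base pairs. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def composition(seq: str) -> dict[str, float]:
    """Relative frequency of each base, ignoring non-ACGT symbols such as N or gaps."""
    counts = Counter(b for b in seq if b in BASES)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("sequence contains no A/C/G/T symbols")
    return {b: counts[b] / n for b in BASES}

def shannon_entropy(seq: str) -> float:
    """Entropy (bits) of the base composition; at most 2 bits for four bases."""
    return -sum(p * math.log2(p) for p in composition(seq).values() if p > 0)

def relative_entropy(seq: str, ref: str) -> float:
    """Kullback-Leibler divergence D(P_seq || P_ref) in bits."""
    p, q = composition(seq), composition(ref)
    total = 0.0
    for b in BASES:
        if p[b] > 0:
            if q[b] == 0:
                return math.inf  # base absent from the reference: divergence is unbounded
            total += p[b] * math.log2(p[b] / q[b])
    return total

def mutual_information(ref: str, seq: str) -> float:
    """Mutual information (bits) between aligned reference and mutated bases."""
    if len(ref) != len(seq):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ref, seq) if a in BASES and b in BASES]
    n = len(pairs)
    if n == 0:
        raise ValueError("no aligned A/C/G/T pairs")
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))) rewritten with raw counts.
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

reference = "ACGTACGT"
mutated = "ATGTATGT"  # toy example with two C->T substitutions
print(shannon_entropy(reference), shannon_entropy(mutated))  # 2.0 1.5
print(relative_entropy(mutated, reference))  # 0.5
print(mutual_information(reference, mutated))
```

In this toy case the C→T changes make the composition less uniform, so the entropy drops below the 2-bit maximum, and the mutual information equals the mutated sequence's entropy (1.5 bits) because every mutated base is fully determined by the reference base at that position.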
Affiliation(s)
- Marco Formentin
  - Department of Mathematics Tullio Levi-Civita, University of Padova, via Trieste 63 35131 Padova, Italy
- Roberto Chignola
  - Department of Biotechnology, University of Verona, Strada le Grazie 15-CV1, 37134 Verona, Italy
- Marco Favretti
  - Department of Mathematics Tullio Levi-Civita, University of Padova, via Trieste 63 35131 Padova, Italy
5. Ashraf J, Bukhari SARS, Kanji A, Iqbal T, Yameen M, Nisar MI, Khan W, Hasan Z. Substitution spectra of SARS-CoV-2 genome from Pakistan reveals insights into the evolution of variants across the pandemic. Sci Rep 2023; 13:20955. [PMID: 38017265 PMCID: PMC10684861 DOI: 10.1038/s41598-023-48272-5] [Received: 07/09/2023] [Accepted: 11/24/2023] [Indexed: 11/30/2023]
Abstract
Changing morbidity and mortality due to COVID-19 across the pandemic have been linked with factors such as the emergence of SARS-CoV-2 variants and vaccination. Mutations in the Spike glycoprotein enhanced viral transmission and virulence. We investigated whether SARS-CoV-2 mutation rates and entropy were associated with COVID-19 in Pakistan before and after the introduction of vaccinations. We analyzed 1,705 SARS-CoV-2 genomes using the Augur phylogenetic pipeline. Substitution rates and entropy across the genome and in the Spike glycoprotein were compared between 2020, 2021, and 2022 (periods A, B, and C). Mortality was greatest in B, whilst cases were highest during C. In period A, G clades predominated, and the substitution rate was 5.25 × 10⁻⁴ per site per year. In B, Delta variants dominated, and the substitution rate increased to 9.74 × 10⁻⁴. In C, Omicron variants led to a substitution rate of 5.02 × 10⁻⁴. Genome-wide entropy was highest during B, particularly at Spike E484K and K417N. During C, genome-wide mutations increased whilst entropy was reduced. Enhanced SARS-CoV-2 genome substitution rates were associated with a period when more virulent SARS-CoV-2 variants were prevalent. Reduced substitution rates and stabilization of genome entropy were subsequently evident when vaccinations were introduced. Whole-genome entropy analysis can help predict virus evolution to guide public health interventions.
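To put the per-site rates above in perspective, a rate in substitutions per site per year can be multiplied by genome length to give an approximate number of substitutions per genome per year. The sketch below assumes the rate applies uniformly over the 29,903-nt SARS-CoV-2 reference genome (Wuhan-Hu-1, GenBank NC_045512.2); the Augur analysis may have masked or excluded some sites, so these figures are rough illustrations, not values reported in the paper.

```python
# Length of the SARS-CoV-2 reference genome Wuhan-Hu-1 (NC_045512.2), in nucleotides.
REFERENCE_LENGTH_NT = 29_903

# Substitution rates quoted in the abstract (substitutions per site per year).
RATES = {"A (2020)": 5.25e-4, "B (2021)": 9.74e-4, "C (2022)": 5.02e-4}

def substitutions_per_genome_per_year(rate_per_site_per_year: float,
                                      genome_length: int = REFERENCE_LENGTH_NT) -> float:
    """Scale a per-site clock rate to the whole genome, assuming a uniform rate across sites."""
    return rate_per_site_per_year * genome_length

for period, rate in RATES.items():
    print(f"Period {period}: ~{substitutions_per_genome_per_year(rate):.1f} substitutions per genome per year")
```

On this reading, period B's rate corresponds to roughly 29 substitutions per genome per year, versus about 15 to 16 in periods A and C.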
Affiliation(s)
- Javaria Ashraf
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, P.O. Box 3500, Karachi, 74800, Pakistan
- Sayed Ali Raza Shah Bukhari
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, P.O. Box 3500, Karachi, 74800, Pakistan
- Akbar Kanji
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, P.O. Box 3500, Karachi, 74800, Pakistan
- Tulaib Iqbal
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, P.O. Box 3500, Karachi, 74800, Pakistan
- Maliha Yameen
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, P.O. Box 3500, Karachi, 74800, Pakistan
- Muhammad Imran Nisar
  - Department of Pediatrics and Child Health, Aga Khan University, Karachi, Pakistan
  - Department of Pediatrics and Child Health, CITRIC Center for Bioinformatics and Computational Biology, Aga Khan University, Karachi, Pakistan
- Waqasuddin Khan
  - Department of Pediatrics and Child Health, Aga Khan University, Karachi, Pakistan
  - Department of Pediatrics and Child Health, CITRIC Center for Bioinformatics and Computational Biology, Aga Khan University, Karachi, Pakistan
- Zahra Hasan
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Stadium Road, P.O. Box 3500, Karachi, 74800, Pakistan
6. Jaewjaroenwattana J, Phoolcharoen W, Pasomsub E, Teengam P, Chailapakul O. Electrochemical paper-based antigen sensing platform using plant-derived monoclonal antibody for detecting SARS-CoV-2. Talanta 2023; 251:123783. [PMID: 35977451 PMCID: PMC9357285 DOI: 10.1016/j.talanta.2022.123783] [Received: 06/09/2022] [Revised: 07/20/2022] [Accepted: 07/24/2022] [Indexed: 11/09/2022]
Abstract
Current diagnostic platforms for detecting SARS-CoV-2 infections have mostly relied on adapting existing technology. In this work, a simple, low-cost electrochemical sensing platform for detecting SARS-CoV-2 antigen was established. The proposed sensor combines an innovative disposable paper-based immunosensor with a cost-effective plant-based anti-SARS-CoV-2 monoclonal antibody, CR3022, expressed in Nicotiana benthamiana. Cellulose nanocrystals were used to modify a screen-printed graphene electrode, providing abundant COOH functional groups on the electrode surface and thus a high capacity for antibody immobilization. The receptor-binding domain (RBD) of the SARS-CoV-2 spike protein was quantified by differential pulse voltammetry, monitoring the change in current of a [Fe(CN)₆]³⁻/⁴⁻ redox solution. The [Fe(CN)₆]³⁻/⁴⁻ current in the absence and presence of the target RBD could be clearly distinguished, giving a linear relationship with RBD concentration over the range 0.1 pg/mL to 500 ng/mL and a limit of detection of 2.0 fg/mL. The proposed platform was successfully applied to detect RBD in nasopharyngeal swab samples with satisfactory results. Furthermore, the paper-based immunosensor was extended to quantify the RBD level in spiked saliva samples, demonstrating the broad applicability of this system. This electrochemical paper-based immunosensor has the potential to be employed as a point-of-care test for COVID-19 diagnosis.
Affiliation(s)
- Jutamas Jaewjaroenwattana
  - Electrochemistry and Optical Spectroscopy Center of Excellence, Department of Chemistry, Faculty of Science, Chulalongkorn University, Pathumwan, Bangkok, 10330, Thailand
- Waranyoo Phoolcharoen
  - Department of Pharmacognosy and Pharmaceutical Botany, Faculty of Pharmaceutical Sciences, Center of Excellence in Plant-produced Pharmaceuticals, Chulalongkorn University, Pathumwan, Bangkok, 10330, Thailand
- Ekawat Pasomsub
  - Division of Virology, Department of Pathology, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Prinjaporn Teengam
  - Electrochemistry and Optical Spectroscopy Center of Excellence, Department of Chemistry, Faculty of Science, Chulalongkorn University, Pathumwan, Bangkok, 10330, Thailand
- Orawon Chailapakul
  - Electrochemistry and Optical Spectroscopy Center of Excellence, Department of Chemistry, Faculty of Science, Chulalongkorn University, Pathumwan, Bangkok, 10330, Thailand
7. Muñoz-Chimeno M, Rodriguez-Paredes V, García-Lugo MA, Avellon A. Hepatitis E genotype 3 genome: A comprehensive analysis of entropy, motif conservation, relevant mutations, and clade-associated polymorphisms. Front Microbiol 2022; 13:1011662. [PMID: 36274715 PMCID: PMC9582770 DOI: 10.3389/fmicb.2022.1011662] [Received: 08/04/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022]
Abstract
Hepatitis E virus genotype 3 (HEV-3) is an emergent zoonosis in the EU/EEA. HEV-3 clades/subtypes have been described. Its genome contains ORF1, which encodes nonstructural proteins for virus replication; ORF2, which encodes the capsid protein; and ORF3, which encodes a multifunctional protein involved in virion pathogenesis. For HEV-3, this study aims to: (1) calculate genome entropy (excluding the hypervariable region); (2) analyze the described motifs/mutations; and (3) characterize clade/subtype genome polymorphisms. A total of 705 sequences from the GenBank database were used. Compared with HEV-1, the highest entropies were found in the zoonotic genotypes (HEV-3 and HEV-4) in the X domain, RdRp, ORF2, and ORF3. Entropy differed significantly between proteins, with the protease and ORF3 the most variable and the Y domain the most conserved. Methyltransferase and Y domain motifs were completely conserved. By contrast, the essential protease residue H581 and the catalytic dyad exhibited amino acid changes in 1.8% and 0.4% of sequences, respectively. Several X domain amino acids were associated with clades. We found sequences with mutations in all helicase motifs except motif IV. Helicase mutations related to increased virulence and/or fulminant hepatitis were frequent, with residue 1,110 being a typical HEV-3e and HEV-3f-A2 polymorphism. RdRp motifs III, V, and VII also had high mutation rates. Motif III included residues that are polymorphisms of HEV-3e (F1449) and HEV-3m (D1451). RdRp ribavirin resistance mutations were frequent, mainly 1479I (67.4%; 100% in HEV-3efglmk) and 1634R/K (10.0%; almost 100% in HEV-3e). With respect to ORF2, 19 of 27 neutralization epitopes had mutations. The S80 residue in ORF3 was mutated in 3.5% of cases. Amino acids in the ORF3-PSAP motif had high substitution rates, with substitutions more frequent in the first PSAP (44.8%) than in the second (1.5%).
This is the first comprehensive analysis of the HEV-3 genome, aimed at improving our knowledge of the genome and establishing the basis for future genotype-to-phenotype analysis, given that viral features associated with severity have not been explored in depth. Our results demonstrate important genetic differences among the studied genomes that sometimes affect significant viral structures and constitute clade/subtype polymorphisms that may affect the clinical course or treatment efficacy.
Affiliation(s)
- Milagros Muñoz-Chimeno
  - Hepatitis Unit, National Center of Microbiology, Carlos III Institute of Health, Madrid, Spain
  - Alcalá de Henares University, Madrid, Spain
- Ana Avellon
  - Hepatitis Unit, National Center of Microbiology, Carlos III Institute of Health, Madrid, Spain
  - CIBERESP Epidemiology and Public Health, Madrid, Spain
  - Corresponding author