1
Ose NJ, Campitelli P, Modi T, Kazan IC, Kumar S, Ozkan SB. Some mechanistic underpinnings of molecular adaptations of SARS-COV-2 spike protein by integrating candidate adaptive polymorphisms with protein dynamics. eLife 2024; 12:RP92063. [PMID: 38713502 PMCID: PMC11076047 DOI: 10.7554/elife.92063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
We integrate evolutionary predictions based on the neutral theory of molecular evolution with protein dynamics to generate mechanistic insight into the molecular adaptations of the SARS-COV-2 spike (S) protein. With this approach, we first identified candidate adaptive polymorphisms (CAPs) of the SARS-CoV-2 S protein and assessed the impact of these CAPs through dynamics analysis. Not only have we found that CAPs frequently overlap with well-known functional sites, but also, using several different dynamics-based metrics, we reveal the critical allosteric interplay between SARS-CoV-2 CAPs and the S protein binding sites with the human ACE2 (hACE2) protein. CAPs interact far differently with the hACE2 binding site residues in the open conformation of the S protein compared to the closed form. In particular, the CAP sites control the dynamics of binding residues in the open state, suggesting an allosteric control of hACE2 binding. We also explored the characteristic mutations of different SARS-CoV-2 strains to find dynamic hallmarks and potential effects of future mutations. Our analyses reveal that Delta strain-specific variants have non-additive (i.e., epistatic) interactions with CAP sites, whereas the less pathogenic Omicron strains have mostly additive mutations. Finally, our dynamics-based analysis suggests that the novel mutations observed in the Omicron strain epistatically interact with the CAP sites to help escape antibody binding.
Affiliation(s)
- Nicholas James Ose
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, United States
- Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, United States
- Tushar Modi
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, United States
- I Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, United States
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, United States
- Department of Biology, Temple University, Philadelphia, United States
- Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Sefika Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, United States
2
Mohebbi F, Zelikovsky A, Mangul S, Chowell G, Skums P. Early detection of emerging viral variants through analysis of community structure of coordinated substitution networks. Nat Commun 2024; 15:2838. [PMID: 38565543 PMCID: PMC10987511 DOI: 10.1038/s41467-024-47304-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/20/2024] [Indexed: 04/04/2024]
Abstract
The emergence of viral variants with altered phenotypes is a public health challenge underscoring the need for advanced evolutionary forecasting methods. Given extensive epistatic interactions within viral genomes and known viral evolutionary history, efficient genomic surveillance necessitates early detection of emerging viral haplotypes rather than commonly targeted single mutations. Haplotype inference, however, is a significantly more challenging problem precluding the use of traditional approaches. Here, using SARS-CoV-2 evolutionary dynamics as a case study, we show that emerging haplotypes with altered transmissibility can be linked to dense communities in coordinated substitution networks, which become discernible significantly earlier than the haplotypes become prevalent. From these insights, we develop a computational framework for inference of viral variants and validate it by successful early detection of known SARS-CoV-2 strains. Our methodology offers greater scalability than phylogenetic lineage tracing and can be applied to any rapidly evolving pathogen with adequate genomic surveillance data.
Affiliation(s)
- Fatemeh Mohebbi
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Serghei Mangul
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA, USA
- Gerardo Chowell
- School of Public Health, Georgia State University, Atlanta, GA, USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- School of Computing, College of Engineering, University of Connecticut, Storrs, CT, USA
3
Lv JX, Liu X, Pei YY, Song ZG, Chen X, Hu SJ, She JL, Liu Y, Chen YM, Zhang YZ. Evolutionary trajectory of diverse SARS-CoV-2 variants at the beginning of COVID-19 outbreak. Virus Evol 2024; 10:veae020. [PMID: 38562953 PMCID: PMC10984623 DOI: 10.1093/ve/veae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 01/24/2024] [Accepted: 02/29/2024] [Indexed: 04/04/2024]
Abstract
Despite extensive scientific efforts directed toward the evolutionary trajectory of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in humans at the beginning of the COVID-19 epidemic, it remains unclear how the virus jumped into humans and how it has evolved in them so far. Herein, we recruited almost all adult coronavirus disease 2019 (COVID-19) cases that appeared locally or were imported from abroad during the first 8 months of the outbreak in Shanghai. From these patients, we recovered SARS-CoV-2 genomes occupying important positions in the virus phylogeny. Phylogenetic and mutational landscape analyses of the viral genomes recovered here, together with those collected in and outside of China, revealed that all known SARS-CoV-2 variants exhibited evolutionary continuity despite the co-circulation of multiple lineages during the early period of the epidemic. Various mutations have driven the rapid diversification of SARS-CoV-2, and some of them favor its adaptation to and circulation in humans, which may have determined the waxing and waning of various lineages.
Affiliation(s)
- Jia-Xin Lv
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Xiang Liu
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Yuan-Yuan Pei
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Shanghai Public Health Clinical Center, No. 2901 Canglang Road, Jinshan District, Shanghai 210508, China
- Zhi-Gang Song
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Shanghai Public Health Clinical Center, No. 2901 Canglang Road, Jinshan District, Shanghai 210508, China
- Xiao Chen
- College of Marine Sciences, South China Agricultural University, No. 483 Wushan Road, Tianhe District, Guangzhou, Guangdong 510642, China
- Shu-Jian Hu
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Jia-Lei She
- Shanghai Public Health Clinical Center, No. 2901 Canglang Road, Jinshan District, Shanghai 210508, China
- Yi Liu
- Shanghai Public Health Clinical Center, No. 2901 Canglang Road, Jinshan District, Shanghai 210508, China
- Yan-Mei Chen
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
- Yong-Zhen Zhang
- State Key Laboratory of Genetic Engineering, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences and Human Phenome Institute, Fudan University, No. 2005 Songhu Road, Yangpu District, Shanghai 200438, China
4
Alvarez S, Nartey CM, Mercado N, de la Paz JA, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. Proc Natl Acad Sci U S A 2024; 121:e2308895121. [PMID: 38285950 PMCID: PMC10861889 DOI: 10.1073/pnas.2308895121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 12/19/2023] [Indexed: 01/31/2024]
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, for inferring phylogenetic relationships or potential evolutionary pathways, and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called sequence evolution with epistatic contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled and experimentally tested Escherichia coli TEM-1 variants for in vivo β-lactamase activity. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their wild-type predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes, and facilitate vaccine development.
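The SEEC abstract scores sequences with the Hamiltonian of a protein family's joint probability. A minimal sketch of that kind of score follows, assuming a Potts model of the sort used in direct coupling analysis (per-site fields h and pairwise couplings J) and the sign convention that lower H means more family-like. The parameter values are hypothetical toy numbers, not inferred from any protein family, and the code is not the authors' SEEC implementation.

```python
from itertools import combinations


def potts_hamiltonian(seq, h, J):
    """Toy Potts energy H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: list of integer states, one per site (e.g. amino-acid indices)
    h:   h[i][a], field for state a at site i
    J:   J[(i, j)][a][b], coupling between state a at site i and b at site j (i < j)
    Under this sign convention, lower H means more family-like.
    """
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy -= J[(i, j)][seq[i]][seq[j]]
    return energy


def mutation_delta(seq, site, new_state, h, J):
    """Change in H from a single substitution; positive means less family-like."""
    mutant = list(seq)
    mutant[site] = new_state
    return potts_hamiltonian(mutant, h, J) - potts_hamiltonian(seq, h, J)


# Hypothetical 3-site, 2-state parameters chosen only for illustration.
h = [[0.5, 0.0], [0.0, 0.2], [0.1, 0.1]]
J = {
    (0, 1): [[1.0, 0.0], [0.0, 1.0]],  # sites 0 and 1 favour matching states
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.3], [0.3, 0.0]],
}
wild_type = [0, 0, 1]
print(potts_hamiltonian(wild_type, h, J))     # about -1.9
print(mutation_delta(wild_type, 1, 1, h, J))  # about 1.1: breaks the 0-1 coupling
```

Evaluating the same substitution (site 1 to state 1) on the background [1, 0, 1] gives a change of about -0.9 instead of +1.1, so in this toy model the sign of a mutation's effect depends on the state at another site. That background dependence is the kind of epistasis the coupling terms encode.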
Affiliation(s)
- Sophia Alvarez
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Charisse M. Nartey
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Nicholas Mercado
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Tea Huseinbegovic
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
5
Ose NJ, Campitelli P, Modi T, Can Kazan I, Kumar S, Banu Ozkan S. Some mechanistic underpinnings of molecular adaptations of SARS-COV-2 spike protein by integrating candidate adaptive polymorphisms with protein dynamics. bioRxiv 2024:2023.09.14.557827. [PMID: 37745560 PMCID: PMC10515954 DOI: 10.1101/2023.09.14.557827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
We integrate evolutionary predictions based on the neutral theory of molecular evolution with protein dynamics to generate mechanistic insight into the molecular adaptations of the SARS-COV-2 Spike (S) protein. With this approach, we first identified Candidate Adaptive Polymorphisms (CAPs) of the SARS-CoV-2 Spike protein and assessed the impact of these CAPs through dynamics analysis. Not only have we found that CAPs frequently overlap with well-known functional sites, but also, using several different dynamics-based metrics, we reveal the critical allosteric interplay between SARS-CoV-2 CAPs and the S protein binding sites with the human ACE2 (hACE2) protein. CAPs interact far differently with the hACE2 binding site residues in the open conformation of the S protein compared to the closed form. In particular, the CAP sites control the dynamics of binding residues in the open state, suggesting an allosteric control of hACE2 binding. We also explored the characteristic mutations of different SARS-CoV-2 strains to find dynamic hallmarks and potential effects of future mutations. Our analyses reveal that Delta strain-specific variants have non-additive (i.e., epistatic) interactions with CAP sites, whereas the less pathogenic Omicron strains have mostly additive mutations. Finally, our dynamics-based analysis suggests that the novel mutations observed in the Omicron strain epistatically interact with the CAP sites to help escape antibody binding.
Affiliation(s)
- Nicholas J. Ose
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Tushar Modi
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
6
Hou M, Shi J, Gong Z, Wen H, Lan Y, Deng X, Fan Q, Li J, Jiang M, Tang X, Wu CI, Li F, Ruan Y. Intra- vs. Interhost Evolution of SARS-CoV-2 Driven by Uncorrelated Selection-The Evolution Thwarted. Mol Biol Evol 2023; 40:msad204. [PMID: 37707487 PMCID: PMC10521905 DOI: 10.1093/molbev/msad204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023]
Abstract
In viral evolution, a new mutation has to proliferate within the host (Stage I) in order to be transmitted and then compete in the host population (Stage II). We now analyze the intrahost single nucleotide variants (iSNVs) in a set of 79 SARS-CoV-2-infected patients, with most transmissions tracked. Here, every mutation has two measures: 1) iSNV frequency within each individual host in Stage I; 2) occurrence among individuals, ranging from 1 (private) and 2-78 (public) to 79 (global) occurrences in Stage II. In Stage I, a small fraction of nonsynonymous iSNVs are sufficiently advantageous to rise to a high frequency, often 100%. However, such iSNVs usually fail to become public mutations. Thus, the selective forces in the two stages of evolution are uncorrelated and, possibly, antagonistic. For that reason, successful mutants, including many variants of concern, have to avoid being eliminated in Stage I when they first emerge. As a result, they may not have the transmission advantage to outcompete the dominant strains and, hence, are rare in the host population. Few of them manage to slowly accumulate advantageous mutations to compete in Stage II. When they do, they appear suddenly, as in each of the six successive waves of SARS-CoV-2 strains. In conclusion, Stage I evolution, the gatekeeper, may contravene long-term viral evolution and should be heeded in viral studies.
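The two per-mutation measures in this abstract can be made concrete with a small bookkeeping sketch. It assumes a hypothetical input format (a dict of within-host iSNV frequencies per mutation, with invented mutation and host names) and applies the private/public/global categories stated for the 79-patient cohort; variant calling and any detection threshold are out of scope.

```python
def classify_occurrence(n_carriers, n_hosts=79):
    """Stage II label for a mutation from the number of hosts carrying it.

    Uses the categories given in the abstract for a 79-patient cohort:
    1 host -> private, 2..n_hosts-1 -> public, all n_hosts -> global.
    """
    if not 1 <= n_carriers <= n_hosts:
        raise ValueError("n_carriers must be between 1 and n_hosts")
    if n_carriers == 1:
        return "private"
    if n_carriers == n_hosts:
        return "global"
    return "public"


def summarize(isnv_calls, n_hosts=79):
    """isnv_calls: mutation -> {host_id: within-host frequency (Stage I measure)}.

    Returns mutation -> (Stage II label, highest within-host frequency).
    """
    return {
        mutation: (classify_occurrence(len(freqs), n_hosts), max(freqs.values()))
        for mutation, freqs in isnv_calls.items()
    }


# Hypothetical calls for two mutations (names and frequencies are made up).
calls = {
    "mutA": {"host01": 1.0},                   # fixed in one host, never spread
    "mutB": {"host01": 0.12, "host07": 0.35},  # low frequency, but seen in two hosts
}
print(summarize(calls))
```

Keeping both numbers side by side is what allows the comparison the abstract describes: a mutation like the hypothetical `mutA` scores high on the Stage I measure yet stays private in Stage II.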
Affiliation(s)
- Mei Hou
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Jingrong Shi
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Zanke Gong
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Haijun Wen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yun Lan
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Xizi Deng
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Qinghong Fan
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Jiaojiao Li
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Mengling Jiang
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Xiaoping Tang
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Chung-I Wu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Feng Li
- Guangzhou Eighth People's Hospital, Guangzhou Medical University, Guangzhou, China
- Yongsen Ruan
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
7
Zhang JX, Yuan Y, Hu QH, Jin DZ, Bai Y, Xin WW, Kang L, Wang JL. Identification of potential pathogenic targets and survival strategies of Vibrio vulnificus through population genomics. Front Cell Infect Microbiol 2023; 13:1254379. [PMID: 37692161 PMCID: PMC10485832 DOI: 10.3389/fcimb.2023.1254379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 07/27/2023] [Indexed: 09/12/2023]
Abstract
Vibrio vulnificus, a foodborne pathogen, has a high mortality rate. Despite its relevance to public health, the identification of virulence genes associated with the pathogenicity of currently known clinical isolates of V. vulnificus is incomplete, and its synergistic pathogenesis remains unclear. Here, we integrate whole genome sequencing (WGS), genome-wide association studies (GWAS), and genome-wide epistasis studies (GWES), along with phenotype characterization, to investigate the pathogenesis and survival strategies of V. vulnificus. GWAS and GWES identified a total of six genes (purH, gmr, yiaV, dsbD, ramA, and wbpA) associated with the pathogenicity of clinical isolates; these genes are related to nucleotide/amino acid transport and metabolism, cell membrane biogenesis, signal transduction mechanisms, and protein turnover. Of these, five were newly discovered in this study as potential specific virulence genes of V. vulnificus. Furthermore, GWES combined with phenotype experiments indicated that V. vulnificus isolates were clustered into two ecological groups (EGs) with distinct biotic and abiotic factors and ecological strategies. Our study reveals pathogenic mechanisms and their evolution in V. vulnificus, providing a solid foundation for designing new vaccines and therapeutic targets.
Affiliation(s)
- Jia-Xin Zhang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing, China
- Yuan Yuan
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing, China
- Qing-hua Hu
- Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Da-zhi Jin
- Key Laboratory of Biomarkers and In Vitro Diagnosis Translation of Zhejiang Province, School of Laboratory Medicine, Hangzhou Medical College, Hangzhou, China
- Yao Bai
- China National Center for Food Safety Risk Assessment, Beijing, China
- Wen-Wen Xin
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing, China
- Lin Kang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing, China
- Jing-Lin Wang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences (AMMS), Beijing, China
8
Alvarez S, Nartey CM, Mercado N, de la Paz A, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. bioRxiv 2023:2023.05.24.542176. [PMID: 37292895 PMCID: PMC10245989 DOI: 10.1101/2023.05.24.542176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, for inferring phylogenetic relationships or potential evolutionary pathways, and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called Sequence Evolution with Epistatic Contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled and experimentally tested E. coli TEM-1 variants for in vivo β-lactamase activity. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their WT predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes and facilitate vaccine development.
Affiliation(s)
- Sophia Alvarez
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Charisse M. Nartey
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Nicholas Mercado
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Alberto de la Paz
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Tea Huseinbegovic
- School of Natural Sciences and Mathematics, University of Texas at Dallas, Richardson, TX 75080, USA
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080, USA
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
9
Dichio V, Zeng HL, Aurell E. Statistical genetics in and out of quasi-linkage equilibrium. Rep Prog Phys 2023; 86:052601. [PMID: 36944245 DOI: 10.1088/1361-6633/acc5fa] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 03/21/2023] [Indexed: 06/18/2023]
Abstract
This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal equilibrium state in statistical mechanics, was discovered by M Kimura for a two-locus two-allele model, and was extended and generalized to the global genome scale by Neher & Shraiman (2011). What we will refer to as the Kimura-Neher-Shraiman theory describes a population evolving due to mutations, recombination, natural selection and possibly genetic drift. A QLE phase exists at a sufficiently high recombination rate (r) and/or mutation rate (µ) with respect to selection strength. We show how in QLE it is possible to infer the epistatic parameters of the fitness function from knowledge of the (dynamical) distribution of genotypes in a population. We further consider the breakdown of the QLE regime for high enough selection strength. We review recent results for the selection-mutation and selection-recombination dynamics. Finally, we identify and characterize a new phase, which we call non-random coexistence, where variability persists in the population without either fixating or disappearing.
Affiliation(s)
- Vito Dichio
- Sorbonne Université, Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, F-75013 Paris, France
- Hong-Li Zeng
- School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, People's Republic of China
- Erik Aurell
- Department of Computational Science and Technology, KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
10
Broni E, Miller WA. Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes. Biomedicines 2023; 11:512. [PMID: 36831052 PMCID: PMC9953644 DOI: 10.3390/biomedicines11020512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 02/03/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations, and the structure of proteins based on these mutations, has become necessary for early drug and vaccine design in anticipation of future viral mutations. The amino acid composition (AAC) of proteomes and of individual viral proteins provides avenues for exploitation, since AACs have previously been used to predict structure, shape and evolutionary rates. Herein, the frequency of amino acid residues found in 1637 complete proteomes belonging to 11 SARS-CoV-2 variants/lineages was analyzed. Leucine is the most abundant amino acid residue in SARS-CoV-2, with an average AAC of 9.658%, while tryptophan had the lowest abundance, at 1.11%. The AAC and ranking of lysine and glycine varied across the proteomes: for some variants, glycine had a higher frequency and AAC than lysine, and vice versa for other variants. Tryptophan was also observed to be the most intolerant to mutation in the proteomes of the variants studied. A correlogram revealed a very strong correlation of 0.999992 between the B.1.525 (Eta) and B.1.526 (Iota) variants. Furthermore, isoleucine and threonine were observed to have a very strong negative correlation of -0.912, while cysteine and isoleucine had a very strong positive correlation of 0.835 at p < 0.001. The Shapiro-Wilk normality test revealed that the AAC values for all amino acid residues except methionine showed no evidence of non-normality at p < 0.05. Thus, AACs of SARS-CoV-2 variants can be predicted using probability and z-scores. AACs may be beneficial in classifying viral strains, predicting viral disease types, members of protein families and protein interactions, and for diagnostic purposes.
They may also be used as a feature along with other crucial factors in machine-learning based algorithms to predict viral mutations. These mutation-predicting algorithms may help in developing effective therapeutics and vaccines for SARS-CoV-2.
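The AAC figures above are per-residue percentages. A minimal sketch follows, assuming AAC is computed as residue count divided by the number of standard residues (times 100), and assuming two variants are compared by correlating their 20-value AAC profiles. The sequences are toy strings, not SARS-CoV-2 proteomes, and this is not the authors' pipeline.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(protein):
    """Percentage of each standard residue in a sequence (gaps, X and '*' ignored)."""
    residues = [aa for aa in protein.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return {aa: 100.0 * counts[aa] / len(residues) for aa in AMINO_ACIDS}


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy sequences standing in for two concatenated proteomes (not real data).
variant_1 = amino_acid_composition("MKLLWGGSTA")
variant_2 = amino_acid_composition("MKLLWGASTA")
print(variant_1["L"])  # 20.0
print(round(pearson([variant_1[aa] for aa in AMINO_ACIDS],
                    [variant_2[aa] for aa in AMINO_ACIDS]), 3))
```

For whole proteomes, the per-protein sequences would be concatenated (or their residue counts summed) before computing percentages, so that longer proteins contribute proportionally more residues.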
Affiliation(s)
- Emmanuel Broni
- Department of Medicine, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
- Whelton A. Miller
- Department of Medicine, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
- Department of Molecular Pharmacology & Neuroscience, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
11
Cueno ME, Wada K, Tsuji A, Ishikawa K, Imai K. Structural patterns of SARS-CoV-2 variants of concern (alpha, beta, gamma, delta) spike protein are influenced by variant-specific amino acid mutations: A computational study with implications on viral evolution. J Theor Biol 2023; 558:111376. [PMID: 36473508 DOI: 10.1016/j.jtbi.2022.111376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 11/28/2022] [Accepted: 11/29/2022] [Indexed: 12/12/2022]
Abstract
SARS-CoV-2 (SARS2) regularly mutates, resulting in variants of concern (VOCs) that have higher virulence and transmissibility while also evading available therapeutic strategies. This highlights the importance of amino acid mutations in the SARS2 spike protein structure, since they may affect virus biology; however, their effects have never been fully elucidated. Here, network analysis was performed based on the COVID-19 genomic epidemiology network between December 2019 and July 2021. Representative SARS2 VOC spike protein models were generated and quality-checked, the protein models were superimposed, and common contacts were established by contact mapping. Throughout this study, we found that: (1) certain individual variant-specific amino acid mutations can affect the spike protein structural pattern; (2) certain individual variant-specific amino acid mutations had no effect on the spike protein structural pattern; and (3) certain combinations of variant-specific amino acid mutations are putatively epistatic and can potentially influence the VOC spike protein structural pattern. This manuscript was submitted as part of a theme issue on "Modelling COVID-19 and Preparedness for Future Pandemics".
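The "common contact" step can be pictured as building a residue contact set for each superimposed model and intersecting the sets. The sketch below is a minimal illustration that assumes one representative atom per residue (for example Cα), an 8 Å distance cutoff and a minimum sequence separation of 3; these are common conventions chosen here for illustration, not criteria stated in the abstract, and the coordinates are toy values.

```python
from itertools import combinations
from math import dist

Coord = tuple[float, float, float]


def contact_set(coords: dict[int, Coord], cutoff: float = 8.0,
                min_separation: int = 3) -> set[tuple[int, int]]:
    """Residue pairs (i, j), i < j, whose representative atoms lie within `cutoff` Å.

    Pairs closer than `min_separation` in sequence are skipped so trivial
    backbone neighbours do not dominate. Cutoff values are assumptions.
    """
    contacts = set()
    for i, j in combinations(sorted(coords), 2):
        if j - i >= min_separation and dist(coords[i], coords[j]) <= cutoff:
            contacts.add((i, j))
    return contacts


def common_contacts(models: list[dict[int, Coord]], **kwargs) -> set[tuple[int, int]]:
    """Contacts present in every model (models assumed to share residue numbering)."""
    return set.intersection(*(contact_set(m, **kwargs) for m in models))


# Toy coordinates for two hypothetical models of the same four residues.
model_wt = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 5: (5.0, 0.0, 0.0), 10: (20.0, 0.0, 0.0)}
model_var = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 5: (5.0, 0.0, 0.0), 10: (12.0, 0.0, 0.0)}
print(sorted(common_contacts([model_wt, model_var])))
```

Contacts that appear in one variant's model but drop out of the intersection are the kind of structural-pattern differences the abstract attributes to variant-specific mutations.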
12
Neverov AD, Fedonin G, Popova A, Bykova D, Bazykin G. Coordinated evolution at amino acid sites of SARS-CoV-2 spike. eLife 2023; 12:82516. [PMID: 36752391 PMCID: PMC9908078 DOI: 10.7554/elife.82516] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/07/2022] [Accepted: 01/15/2023] [Indexed: 02/05/2023] [Open access]
Abstract
SARS-CoV-2 has adapted in a stepwise manner, with multiple beneficial mutations accumulating in rapid succession at the origins of VOCs, and the reasons for this are unclear. Here, we searched for coordinated evolution of amino acid sites in the spike protein of SARS-CoV-2. Specifically, we searched for concordantly evolving site pairs (CSPs) for which changes at one site were rapidly followed by changes at the other site in the same lineage. We detected 46 sites that formed 45 CSPs. Sites in CSPs were closer to each other in the protein structure than random pairs, indicating that concordant evolution has a functional basis. Notably, site pairs carrying lineage-defining mutations of the four VOCs that circulated before May 2021 are enriched in CSPs. For the Alpha VOC, the enrichment is detected even if Alpha sequences are removed from the analysis, indicating that VOC origin could have been facilitated by positive epistasis. Additionally, we detected nine discordantly evolving pairs of sites where mutations at one site occurred unexpectedly rarely on the background of a specific allele at another site, for example on the background of wild-type D at site 614 (four pairs) or derived Y at site 501 (three pairs). Our findings hint that positive epistasis between accumulating mutations could have delayed the assembly of advantageous combinations of mutations comprising at least some of the VOCs.
Affiliation(s)
- Alexey Dmitrievich Neverov
- HSE University, Moscow, Russian Federation
- Central Research Institute for Epidemiology, Moscow, Russian Federation
- Gennady Fedonin
- Central Research Institute for Epidemiology, Moscow, Russian Federation
- Moscow Institute of Physics and Technology (National Research University), Moscow, Russian Federation
- Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russian Federation
- Anfisa Popova
- Central Research Institute for Epidemiology, Moscow, Russian Federation
- Daria Bykova
- Central Research Institute for Epidemiology, Moscow, Russian Federation
- Lomonosov Moscow State University, Moscow, Russian Federation
- Georgii Bazykin
- Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russian Federation
- Skolkovo Institute of Science and Technology, Moscow, Russian Federation
13
Rana V, Chien E, Peng J, Milenkovic O. Small-Sample Estimation of the Mutational Support and Distribution of SARS-CoV-2. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:668-682. [PMID: 35385386 PMCID: PMC10009811 DOI: 10.1109/tcbb.2022.3165395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
We consider the problem of determining the mutational support and distribution of the SARS-CoV-2 viral genome in the small-sample regime. The mutational support refers to the unknown number of sites that may eventually mutate in the SARS-CoV-2 genome while mutational distribution refers to the distribution of point mutations in the viral genome across a population. The mutational support may be used to assess the virulence of the virus and guide primer selection for real-time RT-PCR testing. Estimating the distribution of mutations in the genome of different subpopulations while accounting for the unseen may also aid in discovering new variants. To estimate the mutational support in the small-sample regime, we use GISAID sequencing data and our state-of-the-art polynomial estimation techniques based on new weighted and regularized Chebyshev approximation methods. For distribution estimation, we adapt the well-known Good-Turing estimator. Our analysis reveals several findings: First, the mutational supports exhibit significant differences in the ORF6 and ORF7a regions (older versus younger patients), ORF1b and ORF10 regions (females versus males) and in almost all ORFs (Asia/Europe/North America). Second, even though the N region of SARS-CoV-2 has a predicted 10% mutational support, mutations fall outside of the primer regions recommended by the CDC.
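For readers unfamiliar with the Good-Turing estimator mentioned above, the sketch below shows its textbook form: the probability mass of never-seen types is estimated as N1/N (types seen exactly once over total observations), and Turing's adjusted count is r* = (r + 1)N_{r+1}/N_r. This is the unsmoothed classic version, not the authors' adaptation, and the list of mutated positions is toy data.

```python
from collections import Counter


def good_turing_missing_mass(observations) -> float:
    """Estimated probability that the next observation is a type never seen before.

    Good-Turing: N1 / N, where N1 is the number of types seen exactly once
    and N is the total number of observations.
    """
    counts = Counter(observations)
    n_total = sum(counts.values())
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return n_singletons / n_total


def good_turing_adjusted_count(r: int, observations) -> float:
    """Turing's adjusted count r* = (r + 1) * N_{r+1} / N_r, without smoothing.

    N_r is the number of types observed exactly r times. Unsmoothed estimates
    are noisy for large r, where N_{r+1} is often zero.
    """
    freq_of_freqs = Counter(Counter(observations).values())
    if freq_of_freqs[r] == 0:
        raise ValueError(f"no type was observed exactly {r} times")
    return (r + 1) * freq_of_freqs[r + 1] / freq_of_freqs[r]


# Toy data: each entry is a genome position at which a point mutation was observed.
observed_sites = [241, 241, 3037, 14408, 23403, 23403, 23403]
print(good_turing_missing_mass(observed_sites))
```

In practice, smoothed variants (for example Simple Good-Turing) are usually preferred because N_r becomes sparse at higher counts.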
14
Li C, Yue L, Ju Y, Wang J, Chen M, Lu H, Liu S, Liu T, Wang J, Hu X, Tuohetaerbaike B, Wen H, Zhang W, Xu S, Jiang C, Chen F. Serum Proteomic Analysis for New Types of Long-Term Persistent COVID-19 Patients in Wuhan. Microbiol Spectr 2022; 10:e0127022. [PMID: 36314975 PMCID: PMC9784772 DOI: 10.1128/spectrum.01270-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Accepted: 10/07/2022] [Indexed: 12/24/2022] [Open access]
Abstract
The emergence of a new type of COVID-19 patient, who retested positive after hospital discharge with long-term persistent SARS-CoV-2 infection but without COVID-19 clinical symptoms (hereinafter, LTPPs), poses novel challenges to COVID-19 treatment and prevention. Why does such a contradictory phenomenon occur in LTPPs? To explore the underlying mechanism, we performed quantitative proteomic analyses of sera from 12 LTPPs (Wuhan Pulmonary Hospital), the longest carriage lasting 132 days, focusing mainly on 7 LTPPs without hypertension (LTPPs-NH). The results showed differential serum protein profiles between LTPPs/LTPPs-NH and healthy controls. Further analysis identified 174 differentially expressed proteins (DEPs) for LTPPs and 165 DEPs for LTPPs-NH, most of which were shared. GO and KEGG analyses of these DEPs revealed significant enrichment of "coagulation" and "immune response" in both LTPPs and LTPPs-NH. A unity of contradictory genotypes in these 2 aspects was then observed: some DEPs showed the same dysregulated expression trend as previously reported for patients in the acute phase of COVID-19, which might be caused by long-term stimulation from persistent SARS-CoV-2 infection in LTPPs and which further prevents complete elimination of the virus; in contrast, other DEPs showed the opposite expression trend, retaining control of COVID-19 clinical symptoms in LTPPs. Overall, the opposing effects of these DEPs worked together to maintain a balance in LTPPs, producing a contradictory steady state of long-term persistent SARS-CoV-2 infection without symptoms. Additionally, our study revealed some potential therapeutic targets for COVID-19, which warrant further study. IMPORTANCE This study reported a new type of COVID-19 patient and explored the underlying molecular mechanism by quantitative proteomic analyses. DEPs were significantly enriched in "coagulation" and "immune response".
Importantly, we identified 7 "coagulation system"- and 9 "immune response"-related DEPs whose expression levels were consistent with those previously reported for patients in the acute phase of COVID-19 and which appeared to play a role in preventing the complete elimination of SARS-CoV-2 in LTPPs. In contrast, 6 "coagulation system"- and 5 "immune response"-related DEPs showed the opposite expression trend. These 11 inconsistent serum proteins seem to play a key role in the fight against long-term persistent SARS-CoV-2 infection, retaining control of COVID-19 clinical symptoms in LTPPs. The 26 proteins can serve as potential therapeutic targets and are thus valuable for the treatment of LTPPs; further studies on them are warranted.
Affiliation(s)
- Cuidan Li
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- Liya Yue
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- Yingjiao Ju
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jie Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Mengfan Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Hao Lu
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Sitong Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Tao Liu
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jing Wang
- State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Urumqi, Xinjiang, China
- Xin Hu
- State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Urumqi, Xinjiang, China
- Bahetibieke Tuohetaerbaike
- State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Urumqi, Xinjiang, China
- Hao Wen
- State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Urumqi, Xinjiang, China
- Wenbao Zhang
- State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Urumqi, Xinjiang, China
- Sihong Xu
- Division II of In Vitro Diagnostics for Infectious Diseases, Institute for In Vitro Diagnostics Control, National Institutes for Food and Drug Control, Beijing, China
- Chunlai Jiang
- National Engineering Laboratory for AIDS Vaccine, School of Life Science, Jilin University, Changchun, China
- Fei Chen
- Beijing Institute of Genomics, Chinese Academy of Sciences, China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Urumqi, Xinjiang, China
- Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing, China
15
Barnes JE, Lund-Andersen PK, Patel JS, Ytreberg FM. The effect of mutations on binding interactions between the SARS-CoV-2 receptor binding domain and neutralizing antibodies B38 and CB6. Sci Rep 2022; 12:18819. [PMID: 36335244 PMCID: PMC9637166 DOI: 10.1038/s41598-022-23482-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Accepted: 11/01/2022] [Indexed: 11/08/2022] [Open access]
Abstract
SARS-CoV-2 is the pathogen responsible for COVID-19, which had claimed over six million lives as of July 2022. The severity of COVID-19 motivates a need to understand how the virus could evolve to escape potential treatments and to find ways to strengthen existing treatments. Here, we used the molecular modeling methods MD + FoldX and PyRosetta to study the SARS-CoV-2 spike receptor binding domain (S-RBD) bound to two neutralizing antibodies, B38 and CB6, and generated lists of antibody escape and antibody strengthening mutations. Our resulting watchlist contains potential antibody escape mutations against B38/CB6 and consists of 211/186 mutations across 35/22 S-RBD sites. Some of these mutations have been identified in previous studies as being significant in human populations (e.g., N501Y). The list of potential antibody strengthening mutations that are predicted to improve binding of B38/CB6 to S-RBD consists of 116/45 mutations across 29/13 sites. These mutations could be used to improve the therapeutic value of these antibodies.
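Escape and strengthening watchlists of this kind are typically built by binning predicted energy changes. The sketch below assumes the common convention ΔΔG_bind = ΔG_bind(mutant) − ΔG_bind(wild type), so positive values mean weaker antibody binding, and it uses hypothetical ±1 kcal/mol cutoffs plus a folding-stability filter. The cutoffs, the filter and the example values are illustrative assumptions, not the criteria or results of the study.

```python
from dataclasses import dataclass


@dataclass
class MutationEstimate:
    """Predicted energy changes for one RBD mutation (kcal/mol)."""
    name: str
    ddg_binding: float  # dG_bind(mutant) - dG_bind(wild type); > 0 means weaker binding
    ddg_folding: float  # change in RBD folding free energy; > 0 means less stable


def classify(m: MutationEstimate, bind_cutoff: float = 1.0, fold_cutoff: float = 1.0) -> str:
    """Bin a mutation using hypothetical cutoffs (not the study's criteria)."""
    if m.ddg_folding > fold_cutoff:
        return "destabilizing"  # likely to disrupt the RBD itself, so set aside
    if m.ddg_binding >= bind_cutoff:
        return "escape"  # antibody binding predicted to weaken
    if m.ddg_binding <= -bind_cutoff:
        return "strengthening"  # antibody binding predicted to improve
    return "neutral"


# Hypothetical placeholder mutations and energies, not predictions from the paper.
watchlist = [MutationEstimate("mut_A", 1.8, 0.3), MutationEstimate("mut_B", -1.2, 0.1)]
print({m.name: classify(m) for m in watchlist})
```

Filtering on folding stability first reflects the idea that a mutation which abolishes RBD stability is unlikely to persist as a viable escape variant; whether and how the authors applied such a filter is not stated in the abstract.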
Affiliation(s)
- Jonathan E Barnes
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, 83843, USA
- Peik K Lund-Andersen
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, 83843, USA
- Department of Biological Sciences, University of Idaho, Moscow, ID, 83843, USA
- Jagdish Suresh Patel
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, 83843, USA
- Department of Biological Sciences, University of Idaho, Moscow, ID, 83843, USA
- F Marty Ytreberg
- Institute for Modeling Collaboration and Innovation, University of Idaho, Moscow, ID, 83843, USA
- Department of Physics, University of Idaho, Moscow, ID, 83843, USA
16
Zeng HL, Liu Y, Dichio V, Aurell E. Temporal epistasis inference from more than 3 500 000 SARS-CoV-2 genomic sequences. Phys Rev E 2022; 106:044409. [PMID: 36397507 DOI: 10.1103/physreve.106.044409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
We use direct coupling analysis (DCA) to determine epistatic interactions between loci of variability of the SARS-CoV-2 virus, segmenting genomes by month of sampling. We use full-length, high-quality genomes from the GISAID repository up to October 2021, for a total of over 3 500 000 genomes. We find that DCA terms are more stable over time than correlations but nevertheless change over time as mutations disappear from the global population or reach fixation. Correlations are enriched for phylogenetic effects, and in particular for statistical dependencies at short genomic distances, while DCA brings out links at longer genomic distances. We discuss the validity of a DCA analysis under these conditions in terms of a transient quasi-linkage equilibrium state. We identify putative epistatically interacting mutations involving loci in spike.
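The "correlations" that DCA is contrasted with are usually connected correlations between two loci, C_ij(a, b) = f_ij(a, b) − f_i(a)f_j(b), computed from allele frequencies in aligned sequences. The sketch below computes that statistic per sampling month, mirroring the monthly segmentation described above. It deliberately stops short of DCA itself, which additionally infers Potts-model couplings (for example by inverting the full correlation matrix in mean-field DCA) to separate direct from indirect dependencies. The sequences and month labels are toy assumptions.

```python
from collections import defaultdict


def connected_correlation(sequences: list[str], i: int, j: int, a: str, b: str) -> float:
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b) over aligned sequences."""
    n = len(sequences)
    f_i = sum(seq[i] == a for seq in sequences) / n
    f_j = sum(seq[j] == b for seq in sequences) / n
    f_ij = sum(seq[i] == a and seq[j] == b for seq in sequences) / n
    return f_ij - f_i * f_j


def correlation_by_month(samples, i: int, j: int, a: str, b: str) -> dict[str, float]:
    """samples: iterable of (month, aligned_sequence) pairs, e.g. ("2021-03", "ACGT...")."""
    by_month = defaultdict(list)
    for month, seq in samples:
        by_month[month].append(seq)
    return {month: connected_correlation(seqs, i, j, a, b)
            for month, seqs in sorted(by_month.items())}


# Toy two-locus alignments: fully linked in January, independent in February.
samples = [("2021-01", "AC"), ("2021-01", "AC"), ("2021-01", "GT"), ("2021-01", "GT"),
           ("2021-02", "AC"), ("2021-02", "AT"), ("2021-02", "GC"), ("2021-02", "GT")]
print(correlation_by_month(samples, 0, 1, "A", "C"))
```

Because shared ancestry alone can make nearby loci co-vary, a high C_ij does not by itself imply epistasis, which is the motivation for the DCA step the abstract describes.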
Affiliation(s)
- Hong-Li Zeng
- School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, China
- Yue Liu
- School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, China
- Vito Dichio
- Inria Paris, Aramis Project Team, Paris 75013, France
- Institut du Cerveau, ICM, Inserm U 1127, CNRS UMR 7225, Sorbonne Université, Paris, France
- Erik Aurell
- Department of Computational Science and Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
17
Yang HC, Wang JH, Yang CT, Lin YC, Hsieh HN, Chen PW, Liao HC, Chen CH, Liao JC. Subtyping of major SARS-CoV-2 variants reveals different transmission dynamics based on 10 million genomes. PNAS Nexus 2022; 1:pgac181. [PMID: 36714842 PMCID: PMC9802201 DOI: 10.1093/pnasnexus/pgac181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 08/30/2022] [Indexed: 02/01/2023]
Abstract
SARS-CoV-2 continues to evolve, causing waves of the pandemic. Up to May 2022, 10 million genome sequences had accumulated, which are classified into five major variants of concern. With the growing number of sequenced genomes, analysis of this large dataset has become increasingly challenging. Here we developed systematic approaches based on sets of correlated single nucleotide variations (SNVs) for comprehensive subtyping and pattern recognition of transmission dynamics. The approach outperformed single-SNV and spike-centric scans. Moreover, the derived subtypes elucidate the relationship between signature SNVs and transmission dynamics. We found that different subtypes of the same variant, including Delta and Omicron, exhibited distinct temporal trajectories. For example, some Delta and Omicron subtypes did not spread rapidly, while others did. We identified sets of characteristic SNVs that appeared to enhance transmission or decrease antibody efficacy for some subtypes. We also identified a set of SNVs that appeared to suppress transmission or increase viral sensitivity to antibodies. For the Omicron variant, the dominant type in the world, we identified the subtypes with enhanced and suppressed transmission in an analysis of eight million genomes as of March 2022 and further confirmed the findings in a later analysis of ten million genomes as of May 2022. While the "enhancer" SNVs were enriched on the spike protein, the "suppressor" SNVs were mainly located elsewhere. Disruption of the SNV correlation largely destroyed the enhancer-suppressor phenomena. These results suggest the importance of fine subtyping of variants and point to potential complex interactions among SNVs.
Affiliation(s)
- Han-Ni Hsieh
- Institute of Statistical Science, Academia Sinica, Academia Rd, Nangang District Taipei 115, Taiwan
- Po-Wen Chen
- Institute of Statistical Science, Academia Sinica, Academia Rd, Nangang District Taipei 115, Taiwan
- Hsiao-Chi Liao
- Institute of Statistical Science, Academia Sinica, Academia Rd, Nangang District Taipei 115, Taiwan
- Chun-houh Chen
- Institute of Statistical Science, Academia Sinica, Academia Rd, Nangang District Taipei 115, Taiwan
18
Shchur V, Spirin V, Sirotkin D, Burovski E, De Maio N, Corbett-Detig R. VGsim: Scalable viral genealogy simulator for global pandemic. PLoS Comput Biol 2022; 18:e1010409. [PMID: 36001646 PMCID: PMC9447924 DOI: 10.1371/journal.pcbi.1010409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 09/06/2022] [Accepted: 07/18/2022] [Indexed: 11/24/2022] [Open access]
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions of the world; more than 5.5 million viral sequences were publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes, including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce VGsim, a new fast simulator that addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioned on the chain of population-level events generated during the forward run. Our software can model complex population structure, epistasis and immunity escape. We develop a fast and flexible simulation software package, VGsim, for modeling epidemiological processes and generating genealogies of large pathogen samples. The software takes into account host population structure, pathogen evolution, host immunity and some other epidemiological aspects.
The computational efficiency of the package allows simulation of genealogies of tens of millions of samples, which is important, e.g., for SARS-CoV-2 genome studies.
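To illustrate the forward phase in its simplest form, the sketch below runs Gillespie's direct method on a single well-mixed SIR population: the waiting time to the next event is exponential with the total event rate, and the event type is chosen in proportion to its rate. This is the plain, non-hierarchical algorithm on a deliberately simple model, not VGsim's hierarchical version with population structure, and it omits the backward coalescent phase entirely. The rates `beta` and `gamma` and the population sizes are hypothetical.

```python
import random


def gillespie_sir(susceptible: int, infected: int, recovered: int,
                  beta: float, gamma: float, t_max: float, rng: random.Random):
    """Simulate a well-mixed SIR epidemic with Gillespie's direct method.

    Returns a list of (time, S, I, R) tuples, one per event, starting at t = 0.
    """
    n = susceptible + infected + recovered
    t = 0.0
    trajectory = [(t, susceptible, infected, recovered)]
    while infected > 0:
        infection_rate = beta * susceptible * infected / n
        recovery_rate = gamma * infected
        total_rate = infection_rate + recovery_rate
        if total_rate == 0:
            break  # no event can happen (e.g. gamma == 0 and no susceptibles left)
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Choose which event fires, with probability proportional to its rate.
        if rng.random() * total_rate < infection_rate:
            susceptible -= 1
            infected += 1
        else:
            infected -= 1
            recovered += 1
        trajectory.append((t, susceptible, infected, recovered))
    return trajectory


trajectory = gillespie_sir(999, 1, 0, beta=0.3, gamma=0.1, t_max=200.0, rng=random.Random(42))
final_t, s, i, r = trajectory[-1]
print(f"t={final_t:.1f}  S={s}  I={i}  R={r}")
```

The event chain recorded in `trajectory` plays the role of the "population-level events" that a backward, coalescent-like pass could later condition on when building a sample genealogy.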
Affiliation(s)
- Vladimir Shchur
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Vadim Spirin
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Dmitry Sirotkin
- International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Russell Corbett-Detig
- Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California, United States of America
19
Chan FHM, Ataide R, Richards JS, Narh CA. Contrasting Epidemiology and Population Genetics of COVID-19 Infections Defined by Multilocus Genotypes in SARS-CoV-2 Genomes Sampled Globally. Viruses 2022; 14. [PMID: 35891414 DOI: 10.3390/v14071434] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/04/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 12/28/2022] [Open access]
Abstract
Since its emergence in 2019, SARS-CoV-2 has spread and evolved globally, with newly emerged variants of concern (VOCs) accounting for more than 500 million COVID-19 cases and 6 million deaths. Continuous surveillance utilizing simple genetic tools is needed to measure the viral epidemiological diversity, risk of infection, and distribution among different demographics in different geographical regions. To help address this need, we developed a proof-of-concept multilocus genotyping tool and demonstrated its utility to monitor viral populations sampled in 2020 and 2021 across six continents. We sampled globally 22,164 SARS-CoV-2 genomes from GISAID (inclusion criteria: available clinical and demographic data). They comprised two study populations, “2020 genomes” (N = 5959) sampled from December 2019 to September 2020 and “2021 genomes” (N = 16,205) sampled from 15 January to 15 March 2021. All genomes were aligned to the SARS-CoV-2 reference genome and amino acid polymorphisms were called with quality filtering. Thereafter, 74 codons (loci) in 14 genes including orf1ab polygene (N = 9), orf3a, orf8, nucleocapsid (N), matrix (M), and spike (S) met the 0.01 minimum allele frequency criteria and were selected to construct multilocus genotypes (MLGs) for the genomes. At these loci, 137 mutant/variant amino acids (alleles) were detected with eight VOC-defining variant alleles, including N KR203&204, orf1ab (I265, F3606, and L4715), orf3a H57, orf8 S84, and S G614, being predominant globally with > 35% prevalence. Their persistence and selection were associated with peaks in the viral transmission and COVID-19 incidence between 2020 and 2021. Epidemiologically, older patients (≥20 years) compared to younger patients (<20 years) had a higher risk of being infected with these variants, but this association was dependent on the continent of origin. 
In the global population, the discriminant analysis of principal components (DAPC) showed contrasting patterns of genetic clustering, with three (Africa, Asia, and North America) and two (North and South America) continental clusters observed for the 2020 and 2021 global populations, respectively. Within each continent, the MLG repertoires (range 40−199) sampled in 2020 and 2021 were genetically differentiated, with ≤4 MLGs per repertoire accounting for the majority of genomes sampled. These data suggested that the majority of SARS-CoV-2 infections in 2020 and 2021 were caused by genetically distinct variants that likely adapted to local populations. Indeed, four GISAID clade-defined VOCs, GRY (Alpha), GH (Beta), GR (Gamma), and G/GK (Delta variant), were differentiated by their MLG signatures, demonstrating the versatility of the MLG tool for variant identification. Results from this proof-of-concept multilocus genotyping demonstrate its utility for SARS-CoV-2 genomic surveillance and for monitoring the virus's spatiotemporal epidemiology and evolution, particularly in response to control interventions including COVID-19 vaccines and chemotherapies.
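The locus-selection and genotype-construction steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: each genome is a dict mapping a locus label to its called amino acid, every genome has a call at every locus, and a locus passes the 0.01 criterion when its second most common allele reaches that frequency (one common reading of a minimum allele frequency filter; the abstract does not spell out the exact rule). Locus labels and alleles are toy values.

```python
from collections import Counter


def select_loci(genomes: list[dict[str, str]], min_freq: float = 0.01) -> list[str]:
    """Keep loci whose second most common allele reaches `min_freq` (assumed rule)."""
    kept = []
    for locus in sorted(genomes[0]):
        ranked = Counter(g[locus] for g in genomes).most_common()
        if len(ranked) > 1 and ranked[1][1] / len(genomes) >= min_freq:
            kept.append(locus)
    return kept


def multilocus_genotypes(genomes: list[dict[str, str]], loci: list[str]) -> Counter:
    """Count genomes per multilocus genotype (the tuple of alleles at `loci`)."""
    return Counter(tuple(g[locus] for locus in loci) for g in genomes)


# Toy genomes with hypothetical locus labels; the orf3a locus is monomorphic here.
genomes = [
    {"S:614": "G", "N:203": "K", "orf3a:57": "Q"},
    {"S:614": "G", "N:203": "R", "orf3a:57": "Q"},
    {"S:614": "D", "N:203": "R", "orf3a:57": "Q"},
    {"S:614": "G", "N:203": "K", "orf3a:57": "Q"},
]
loci = select_loci(genomes)
print(loci, multilocus_genotypes(genomes, loci).most_common())
```

Counting genomes per genotype in this way yields the MLG "repertoire" of a sample, whose size and dominant genotypes can then be compared across continents or years.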
20
Abstract
Background: SARS-CoV-2 is a highly transmissible pathogen that causes COVID-19. The outbreak originated in Wuhan, China in December 2019. A number of nonsynonymous mutations in different SARS-CoV-2 proteins have been reported by multiple studies. However, few computational studies have examined the biological impacts of these mutations on the structure and function of the proteins. Methods: In our study, nonsynonymous mutations of the SARS-CoV-2 genome and their frequencies were identified from 30,229 sequences. Subsequently, the effects of the top 10 highest-frequency nonsynonymous mutations in different SARS-CoV-2 proteins were analyzed using bioinformatics tools, including co-mutation analysis, prediction of protein structure stability and flexibility, and prediction of protein function. Results: A total of 231 nonsynonymous mutations were identified from 30,229 SARS-CoV-2 genome sequences. The top 10 nonsynonymous mutations, affecting nine amino acid residues, were ORF1a nsp5 P108S, ORF1b nsp12 P323L and A423V, S protein N501Y and D614G, ORF3a Q57H, and N protein P151L, R203K and G204R. Many nonsynonymous mutations showed a high concurrence ratio, suggesting that these mutations may evolve together and interact functionally. Our results showed that the ORF1a nsp5 P108S, ORF3a Q57H and N protein P151L mutations may be deleterious to the function of SARS-CoV-2 proteins. In addition, ORF1a nsp5 P108S and S protein D614G may destabilize the protein structures, and S protein D614G may adopt a more open conformation than the wild type. Conclusion: The biological consequences of these nonsynonymous mutations of SARS-CoV-2 proteins should be further validated by in vivo and in vitro experimental studies in the future.
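The abstract reports a "concurrence ratio" for co-occurring mutations but does not define it. The sketch below assumes one plausible definition, the fraction of genomes carrying either of two mutations that carry both (a Jaccard index); the authors' actual formula may differ. Each genome is represented as a set of mutation labels, and the toy genomes are illustrative only.

```python
from itertools import combinations


def concurrence_ratios(mutation_sets: list[set[str]]) -> dict[tuple[str, str], float]:
    """Assumed concurrence ratio: |genomes with both| / |genomes with either|."""
    carriers: dict[str, set[int]] = {}
    for idx, mutations in enumerate(mutation_sets):
        for m in mutations:
            carriers.setdefault(m, set()).add(idx)
    ratios = {}
    for a, b in combinations(sorted(carriers), 2):
        both = len(carriers[a] & carriers[b])
        either = len(carriers[a] | carriers[b])
        ratios[(a, b)] = both / either
    return ratios


# Toy genomes; the labels follow the abstract's notation but the data are invented.
genomes = [
    {"S:D614G", "ORF1b:P323L"},
    {"S:D614G", "ORF1b:P323L"},
    {"S:D614G"},
    {"N:R203K"},
]
for pair, ratio in concurrence_ratios(genomes).items():
    print(pair, round(ratio, 3))
```

A ratio near 1 means two mutations almost always travel together, which is the pattern the abstract interprets as possible joint evolution; co-inheritance on a shared lineage can produce the same signal, so a high ratio alone does not establish a functional interaction.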
Affiliation(s)
- Boon Zhan Sia
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Wan Xin Boon
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Yoke Yee Yap
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Shalini Kumar
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Chong Han Ng
- Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
21
Alam ASMRU, Islam OK, Hasan MS, Islam MR, Mahmud S, Al‐Emran HM, Jahid IK, Crandall KA, Hossain MA. Dominant clade-featured SARS-CoV-2 co-occurring mutations reveal plausible epistasis: An in silico based hypothetical model. J Med Virol 2022; 94:1035-1049. [PMID: 34676891 PMCID: PMC8661685 DOI: 10.1002/jmv.27416] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/25/2021] [Revised: 10/15/2021] [Accepted: 10/20/2021] [Indexed: 01/18/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has evolved into eight fundamental clades, four of which (G, GH, GR, and GV) were globally prevalent in 2020. To explain plausible epistatic effects of the signature co-occurring mutations of these circulating clades on viral replication and transmission fitness, we proposed a hypothetical model using an in silico approach. Molecular docking and dynamics analyses showed the higher infectiousness of a spike mutant through more favorable binding of G614 with elastase-2. The RdRp mutation p.P323L significantly increased genome-wide mutations (p < 0.0001), allowing a more flexible interaction between the mutated RdRp and NSP8 that may accelerate replication. Superior RNA stability and structural variation at NSP3:C241T might affect protein interactions, RNA interactions, or both. Another silent mutation, 5'-UTR:C241T, might affect translational efficiency and viral packaging. These four G-clade-featured co-occurring mutations might increase viral replication. Sentinel GH-clade ORF3a:p.Q57H variants constricted the ion channel through an inter-transmembrane-domain interaction between cysteine (C81) and histidine (H57). The GR-clade N:p.RG203-204KR would stabilize RNA interaction through a more flexible and hypo-phosphorylated SR-rich region. GV-clade viruses seemingly gained the evolutionary advantage of the confounding factors; nevertheless, N:p.A220V might modulate RNA binding with no phenotypic effect. Our hypothetical model needs further retrospective and prospective studies to understand detailed molecular events and their relationship to the fitness of SARS-CoV-2.
Affiliation(s)
- Ovinu Kibria Islam
- Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- Md. Shazid Hasan
- Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- Mir Raihanul Islam
- Division of Poverty, Health, and Nutrition, International Food Policy Research Institute, Bangladesh
- Shafi Mahmud
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi, Bangladesh
- Hassan M. Al‐Emran
- Department of Biomedical Engineering, Jashore University of Science and Technology, Jashore, Bangladesh
- Iqbal Kabir Jahid
- Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- Keith A. Crandall
- Department of Biostatistics and Bioinformatics, Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- M. Anwar Hossain
- Office of the Vice Chancellor, Jashore University of Science and Technology, Jashore, Bangladesh
- Department of Microbiology, University of Dhaka, Dhaka, Bangladesh
22
Rodriguez-Rivas J, Croce G, Muscat M, Weigt M. Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes. Proc Natl Acad Sci U S A 2022; 119:e2113118119. [PMID: 35022216 DOI: 10.1073/pnas.2113118119] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Accepted: 12/13/2021] [Indexed: 12/21/2022]
Abstract
During the COVID pandemic, new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants emerge and spread, some being of major concern due to their increased infectivity or capacity to reduce vaccine efficiency. Anticipating mutations, which might give rise to new variants, would be of great interest. We construct sequence models predicting how mutable SARS-CoV-2 positions are, using a single SARS-CoV-2 sequence and databases of other coronaviruses. Predictions are tested against available mutagenesis data and the observed variability of SARS-CoV-2 proteins. Interestingly, predictions agree increasingly with observations as more SARS-CoV-2 sequences become available. Combining predictions with immunological data, we find an overrepresentation of mutations in current variants of concern. The approach may become relevant for potential outbreaks of future viral diseases.
The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major concern given their potential impact on the transmissibility and pathogenicity of the virus as well as the efficacy of therapeutic interventions. Here, we predict the mutability of all positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Using sequence data from other coronaviruses, preexisting to SARS-CoV-2, we build statistical models that not only capture amino acid conservation but also more complex patterns resulting from epistasis. We show that these models are notably superior to conservation profiles in estimating the already observable SARS-CoV-2 variability. In the receptor binding domain of the spike protein, we observe that the predicted mutability correlates well with experimental measures of protein stability and that both are reliable mutability predictors (receiver operating characteristic areas under the curve ∼0.8). Most interestingly, we observe an increasing agreement between our model and the observed variability as more data become available over time, proving the anticipatory capacity of our model. When combined with data concerning the immune response, our approach identifies positions where current variants of concern are highly overrepresented. These results could assist studies on viral evolution and future viral outbreaks and, in particular, guide the exploration and anticipation of potentially harmful future SARS-CoV-2 variants.
23
Gupta S, Mallick D, Banerjee K, Mukherjee S, Sarkar S, Lee STM, Basuchowdhuri P, Jana SS. D155Y substitution of SARS-CoV-2 ORF3a weakens binding with Caveolin-1. Comput Struct Biotechnol J 2022; 20:766-778. [PMID: 35126886 PMCID: PMC8802530 DOI: 10.1016/j.csbj.2022.01.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/30/2021] [Revised: 01/15/2022] [Accepted: 01/18/2022] [Indexed: 02/08/2023]
Abstract
The clinical manifestation of COVID-19, the recent pandemic caused by the novel SARS-CoV-2 virus, ranges from mild to severe respiratory illness. Although environmental, demographic, and co-morbidity factors affect disease severity, the contribution of mutations in each viral gene to the degree of severity needs deeper understanding in order to design better therapeutic approaches against COVID-19. The Open Reading Frame-3a (ORF3a) protein has been found to be mutated at several positions. In this work, we studied the effect of one of the most frequently occurring mutations, D155Y of the ORF3a protein, found in Indian COVID-19 patients. Using computational simulations, we demonstrated that the substitution at position 155 changed the amino acids involved in salt-bridge formation, hydrogen-bond occupancy, interactome clusters, and the stability of the protein compared with the other substitutions found in Indian patients. Protein–protein docking using HADDOCK revealed that the D155Y substitution weakened the binding affinity of ORF3a for caveolin-1 compared with the other substitutions, suggesting its importance in the overall stability of the ORF3a–caveolin-1 complex, which may modulate the virulence of SARS-CoV-2.
24
Swift CL, Isanovic M, Correa Velez KE, Norman RS. Community-level SARS-CoV-2 sequence diversity revealed by wastewater sampling. Sci Total Environ 2021; 801:149691. [PMID: 34438144 PMCID: PMC8372435 DOI: 10.1016/j.scitotenv.2021.149691] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/15/2021] [Revised: 08/11/2021] [Accepted: 08/11/2021] [Indexed: 05/20/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for the COVID-19 pandemic, can be detected in untreated wastewater. Wastewater surveillance of SARS-CoV-2 complements clinical data by offering earlier community-level detection, removing dependence on factors such as access to healthcare, capturing asymptomatic individuals, and reaching a larger population. Here, we compare 24-hour composite samples from the influents of two wastewater treatment plants (WWTPs) in South Carolina, USA: Columbia and Rock Hill. The sampling periods, July 2020 and January 2021, cover the first and second waves of elevated SARS-CoV-2 transmission and COVID-19 clinical cases in these regions. We identify four signature mutations in the surface glycoprotein (spike) gene that are associated with the following variants of interest (VOI) or variants of concern (VOC), listed in parentheses: S477N (B.1.526, Iota), T478K (B.1.617.2, Delta), D614G (present in all VOCs as of May 2021), and H655Y (P.1, Gamma). The N501Y mutation, which is associated with three variants of concern, was identified in samples from July 2020 but was not detected in January 2021 samples. Comparison with mutations in viral sequence databases such as NCBI Virus and GISAID indicated that wastewater sampling detected mutations that were present in South Carolina but not reflected in the clinical data deposited in these databases.
Affiliation(s)
- Candice L Swift
- Department of Environmental Health Sciences, University of South Carolina, USA
- Mirza Isanovic
- Department of Environmental Health Sciences, University of South Carolina, USA
- R Sean Norman
- Department of Environmental Health Sciences, University of South Carolina, USA
25
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As part of the effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions of the world; more than 5.5 million viral sequences were publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes, including natural selection, for example allowing the identification of new variants with transmissibility and immune-evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic at approximately world scale and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioned on the chain of population-level events generated during the forward run. Our software can model complex population structure, epistasis, and immunity escape. The code is freely available at https://github.com/Genomics-HSE/VGsim.
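The forward run described above rests on the Gillespie stochastic simulation algorithm. As a rough illustration only, the sketch below implements the standard, non-hierarchical direct method for a single well-mixed SIR population. It is not VGsim's code or API: the function name `gillespie_sir`, the SIR compartments, and the rate parameters `beta` and `gamma` are hypothetical choices for this example, and the backward coalescent-like phase is not shown. Each recorded tuple is one entry in a forward chain of population-level events.

```python
import random


def gillespie_sir(s, i, r, beta, gamma, t_max, seed=None):
    """Direct-method Gillespie simulation of a well-mixed SIR model.

    Hypothetical illustration, not VGsim. Returns a time-ordered list of
    (time, event, S, I, R) tuples: the forward chain of population-level events.
    """
    rng = random.Random(seed)
    n = s + i + r  # total population size stays constant
    t = 0.0
    events = []
    while i > 0:
        # Propensities (rates) of the two possible event types.
        infection_rate = beta * s * i / n
        recovery_rate = gamma * i
        total = infection_rate + recovery_rate  # > 0 because i > 0
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which event fires, with probability proportional to its rate.
        if rng.random() * total < infection_rate:
            s, i = s - 1, i + 1
            event = "infection"
        else:
            i, r = i - 1, r + 1
            event = "recovery"
        events.append((t, event, s, i, r))
    return events
```

For example, `gillespie_sir(990, 10, 0, beta=0.3, gamma=0.1, t_max=200.0, seed=1)` returns a time-ordered event list in which S + I + R stays at 1000. The hierarchical version mentioned in the abstract is not reproduced here; this flat sketch recomputes every rate at each step, which scales poorly once there are many populations and event types.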
Affiliation(s)
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Russell Corbett-Detig
- HSE University, Russian Federation
- Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California 95064
26
Song S, Li C, Kang L, Tian D, Badar N, Ma W, Zhao S, Jiang X, Wang C, Sun Y, Li W, Lei M, Li S, Qi Q, Ikram A, Salman M, Umair M, Shireen H, Batool F, Zhang B, Chen H, Yang YG, Abbasi AA, Li M, Xue Y, Bao Y. Genomic Epidemiology of SARS-CoV-2 in Pakistan. Genomics Proteomics Bioinformatics 2021; 19:727-740. [PMID: 34695600 PMCID: PMC8546014 DOI: 10.1016/j.gpb.2021.08.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/29/2021] [Revised: 07/30/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022]
Abstract
COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmission of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. We also found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions among them or coevolution. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinct spreading clusters. The largest cluster comprised 74 viruses derived from different geographic locations in Pakistan and formed a deep hierarchical structure, indicating extensive and persistent nationwide transmission of the virus that was probably attributable to a signature mutation (G8371T in ORF1ab) of this cluster. Furthermore, 28 putative international introductions were identified, several of which are consistent with epidemiological investigations. In all, this study has inferred the possible pathways of introduction and transmission of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.
Affiliation(s)
- Shuhui Song
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Cuiping Li
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Lu Kang
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Dongmei Tian
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Nazish Badar
- Department of Virology and Immunology, National Institute of Health, Islamabad 45500, Pakistan
- Wentai Ma
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Shilei Zhao
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Xuan Jiang
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Chun Wang
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Yongqiao Sun
- China National Center for Bioinformation, Beijing 100101, China
- Wenjie Li
- China National Center for Bioinformation, Beijing 100101, China
- Meng Lei
- China National Center for Bioinformation, Beijing 100101, China
- Shuangli Li
- China National Center for Bioinformation, Beijing 100101, China
- Qiuhui Qi
- China National Center for Bioinformation, Beijing 100101, China
- Aamer Ikram
- Department of Virology and Immunology, National Institute of Health, Islamabad 45500, Pakistan
- Muhammad Salman
- Department of Virology and Immunology, National Institute of Health, Islamabad 45500, Pakistan
- Massab Umair
- Department of Virology and Immunology, National Institute of Health, Islamabad 45500, Pakistan
- Huma Shireen
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Fatima Batool
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Bing Zhang
- China National Center for Bioinformation, Beijing 100101, China
- Hua Chen
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Yun-Gui Yang
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Amir Ali Abbasi
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
- Mingkun Li
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Yongbiao Xue
- China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, The Innovation Academy of Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Yiming Bao
- China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
27
Abstract
The SARS-CoV-2 virus causing the global pandemic is a coronavirus with a genome about 30 kb long. The design of vaccines and the choice of therapies depend on the structure and mutational stability of the proteins encoded in the open reading frames (ORFs) of this genome. In this study, we used Expectation Reflection to compute the genome-wide covariation of the SARS-CoV-2 genome, based on an alignment of ≈130,000 complete SARS-CoV-2 genome sequences obtained from GISAID. We used this covariation to compute the Direct Information between pairs of positions across the whole genome, investigating potentially important relationships both within each encoded protein and between encoded proteins. We then computed the covariation within each clade of the virus. The detected covariation recapitulates all clade determinants, and each clade exhibits distinct covarying pairs.
28
Gallardo CM, Wang S, Montiel-Garcia DJ, Little SJ, Smith DM, Routh AL, Torbett BE. MrHAMER yields highly accurate single molecule viral sequences enabling analysis of intra-host evolution. Nucleic Acids Res 2021; 49:e70. [PMID: 33849057 PMCID: PMC8266615 DOI: 10.1093/nar/gkab231] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/09/2021] [Revised: 03/12/2021] [Accepted: 03/31/2021] [Indexed: 12/31/2022]
Abstract
Technical challenges remain in sequencing RNA viruses because of their high intra-host diversity. This bottleneck is particularly pronounced when interrogating long-range co-evolved genetic interactions, given the read-length limitations of next-generation sequencing platforms. This has hampered the direct observation of genetic interactions that code for protein-protein interfaces relevant to both drug and vaccine development. Here we overcome these technical limitations by developing a nanopore-based long-range viral sequencing pipeline that yields accurate single-molecule sequences of circulating virions from clinical samples. We demonstrate its utility by observing the evolution of individual HIV Gag-Pol genomes in response to antiviral pressure. Our pipeline, called Multi-read Hairpin Mediated Error-correction Reaction (MrHAMER), yields thousands of viral genomes per sample at 99.9% accuracy, maintains the original proportions of sequenced virions present in a complex mixture, and allows the detection of rare viral genomes, with their associated mutations, present at <1% frequency. This method facilitates scalable investigation of genetic correlates of resistance to both antiviral therapy and immune pressure, and enables the identification of novel host-viral and viral-viral interfaces that can be modulated for therapeutic benefit.
Affiliation(s)
- Christian M Gallardo
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA; Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
- Shiyi Wang
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA; Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
- Daniel J Montiel-Garcia
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Susan J Little
- Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA, USA
- Davey M Smith
- Division of Infectious Diseases and Global Public Health, University of California, San Diego, La Jolla, CA, USA; Veterans Affairs San Diego Healthcare System, San Diego, CA, USA
- Andrew L Routh
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA; Sealy Center for Structural Biology, University of Texas Medical Branch, Galveston, TX, USA
- Bruce E Torbett
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA; Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
29
Cherian S, Potdar V, Jadhav S, Yadav P, Gupta N, Das M, Rakshit P, Singh S, Abraham P, Panda S, Team NIC. SARS-CoV-2 Spike Mutations, L452R, T478K, E484Q and P681R, in the Second Wave of COVID-19 in Maharashtra, India. Microorganisms 2021; 9:1542. [PMID: 34361977 PMCID: PMC8307577 DOI: 10.3390/microorganisms9071542] [Citation(s) in RCA: 393] [Impact Index Per Article: 131.0] [Received: 05/01/2021] [Revised: 06/12/2021] [Accepted: 07/01/2021] [Indexed: 12/19/2022]
Abstract
As the global severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic expands, genomic epidemiology and whole genome sequencing are being used to investigate its transmission and evolution. Against the backdrop of the global emergence of "variants of concern" (VOCs) in December 2020 and an upsurge of cases in a state in western India since January 2021, whole genome sequencing and analysis of spike protein mutations using sequence-based and structural approaches were undertaken to identify possible new variants and gauge the fitness of the currently circulating strains. Phylogenetic analysis revealed that the newly identified lineages B.1.617.1 and B.1.617.2 were predominantly circulating. The signature mutations carried by these strains were L452R, T478K, E484Q, D614G and P681R in the spike protein, including within the receptor-binding domain (RBD). Of these, mutations at residue positions 452, 484 and 681 have been reported in other globally circulating lineages. Structural analysis of the RBD mutations L452R, T478K and E484Q revealed that they may result in increased ACE2 binding, while P681R in the furin cleavage site could increase the rate of S1-S2 cleavage, resulting in better transmissibility. The two RBD mutations, L452R and E484Q, indicated decreased binding to select monoclonal antibodies (mAbs) and may affect their neutralization potential. Further in vitro/in vivo studies would help confirm the phenotypic changes of the mutant strains. Overall, the study revealed that the newly emerged variants were responsible for the second wave of COVID-19 in Maharashtra. Lineage B.1.617.2 has been designated VOC Delta and B.1.617.1 variant of interest Kappa, and both are being widely reported in the rest of the country as well as globally. Continuous monitoring of these and emerging variants in India is essential.
Affiliation(s)
- Sarah Cherian
- ICMR-National Institute of Virology, Pune 411001, India
- Varsha Potdar
- ICMR-National Institute of Virology, Pune 411001, India
- Santosh Jadhav
- ICMR-National Institute of Virology, Pune 411001, India
- Pragya Yadav
- ICMR-National Institute of Virology, Pune 411001, India
- Nivedita Gupta
- Indian Council of Medical Research, New Delhi 110029, India
- Mousumi Das
- ICMR-National Institute of Virology, Pune 411001, India
- Partha Rakshit
- National Centre for Disease Control, New Delhi 110054, India
- Sujeet Singh
- National Centre for Disease Control, New Delhi 110054, India
- Priya Abraham
- ICMR-National Institute of Virology, Pune 411001, India
- Samiran Panda
- Indian Council of Medical Research, New Delhi 110029, India
30
Rezaei S, Sefidbakht Y, Uskoković V. Tracking the pipeline: immunoinformatics and the COVID-19 vaccine design. Brief Bioinform 2021; 22:6313266. [PMID: 34219142 DOI: 10.1093/bib/bbab241] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/08/2021] [Revised: 04/23/2021] [Accepted: 06/04/2021] [Indexed: 12/23/2022]
Abstract
With the onset of the COVID-19 pandemic, the amount of genomic and proteomic sequence data on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) stored in various databases has grown exponentially. This large volume of data has led to the production of equally immense sets of immunological data, which require rigorous computational approaches to sort through and make sense of. Immunoinformatics has emerged in recent decades as a field capable of offering such approaches by bridging experimental and theoretical immunology with state-of-the-art computational tools. Here, we discuss how immunoinformatics can assist in the development of high-performance vaccines and in the drug discovery needed to curb the spread of SARS-CoV-2. Immunoinformatics can provide a set of computational tools to extract meaningful connections from large sets of COVID-19 patient data, which can be applied in the design of effective vaccines. With this in mind, we present a pipeline to identify the role of immunoinformatics in COVID-19 treatment and vaccine development. In this process, a number of free databases of protein sequences, structures and mutations are introduced, along with docking web servers for assessing the interaction between antibodies and segments of the SARS-CoV-2 spike protein, the antigens most commonly considered in vaccine design.
Affiliation(s)
- Shokouh Rezaei
- Protein Research Center at Shahid Beheshti University, Tehran, Iran
- Yahya Sefidbakht
- Protein Research Center at Shahid Beheshti University, Tehran, Iran
- Vuk Uskoković
- Founder of the biotech startup TardigradeNano, and formerly a Professor at the University of Illinois at Chicago, Chapman University, and the University of California, Irvine
31
Rodriguez Horta E, Weigt M. On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Comput Biol 2021; 17:e1008957. [PMID: 34029316 DOI: 10.1371/journal.pcbi.1008957] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/15/2020] [Revised: 06/04/2021] [Accepted: 04/09/2021] [Indexed: 12/04/2022]
Abstract
Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are comparable in size to the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
Many homologous protein families contain thousands of highly diverged amino-acid sequences, which fold into close-to-identical three-dimensional structures and fulfill almost identical biological tasks. Global coevolutionary models, like those inferred by Direct Coupling Analysis (DCA), assume that families can be considered as samples of some unknown statistical model, and that the parameters of these models represent evolutionary constraints acting on protein sequences. To learn these models from data, DCA and related approaches also have to assume that the distinct sequences in a protein family are close to independent, while in reality they are characterized by intricate hierarchical phylogenetic relationships. Here we propose null models for sequence alignments that maintain the patterns of amino-acid conservation and phylogeny contained in the data but destroy the coevolutionary couplings frequently used in protein structure prediction. We find that phylogeny actually induces spurious non-zero couplings. These are, however, significantly smaller than the largest couplings derived from natural sequences, and therefore have little influence on the first predicted contacts. In the range of intermediate couplings, however, they may lead to statistically significant effects. Dissecting phylogenetic from functional couplings might therefore extend the range of accurately predicted structural contacts down to smaller coupling strengths than those currently used.
32
Luo R, Delaunay‐Moisan A, Timmis K, Danchin A. SARS-CoV-2 biology and variants: anticipation of viral evolution and what needs to be done. Environ Microbiol 2021; 23:2339-2363. [PMID: 33769683 PMCID: PMC8251359 DOI: 10.1111/1462-2920.15487] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/24/2021] [Accepted: 03/24/2021] [Indexed: 12/14/2022]
Abstract
The global propagation of SARS-CoV-2 and the detection of a large number of variants, some of which have replaced the original clade to become dominant, underscore the fact that the virus is actively exploring its evolutionary space. The longer high levels of viral multiplication persist (permitted by high levels of transmission), the more the virus can adapt to the human host and find ways to succeed. The third wave of the COVID-19 pandemic is starting in different parts of the world, emphasizing that the transmission containment measures currently being imposed are not adequate. Part of the consideration in determining containment measures is the rationale that vaccination will soon stop transmission and allow a return to normality. However, vaccines themselves exert selection pressure for the evolution of vaccine-resistant variants, so coupling a policy that permits high levels of transmission and viral multiplication during vaccine roll-out with the expectation that vaccines will deal with the pandemic is unrealistic. In the absence of effective antivirals, it is not improbable that SARS-CoV-2 infection prophylaxis will involve an annual vaccination campaign against 'dominant' viral variants, similar to influenza prophylaxis. Living with COVID-19 will therefore mean living with SARS-CoV-2 variants and their evolution. It is crucial to understand how SARS-CoV-2 evolves and what constrains its evolution, in order to anticipate the variants that will emerge. Thus far, the focus has been on the receptor-binding spike protein, but the virus is complex, encoding 26 proteins that interact with a large number of host factors, so the possibilities for evolution are manifold and not predictable a priori. However, if we are to mount the best defence against COVID-19, we must mount it against the variants, and to do this we must have knowledge of the evolutionary possibilities of the virus.
In addition to the generic cellular interactions of the virus, there are extensive polymorphisms in humans (e.g. Lewis, HLA), some distributed within most or all populations and some restricted to specific ethnic populations; these variations pose additional opportunities for, and constraints on, viral evolution. We now have the wherewithal (viral genome sequencing, protein structure determination and modelling, protein interaction analysis) to functionally characterize viral variants, but access to comprehensive genome data is extremely uneven. Yet, to understand the impacts of such evolution on transmission and disease, we must link it to transmission data (viral epidemiology) and disease data (patient clinical data), and to the population granularities of these. In this editorial, we explore key facets of viral biology and the influence of relevant aspects of human polymorphisms, human behaviour, geography and climate and, based on this, derive a series of recommendations to monitor viral evolution and predict the types of variants that are likely to arise.
Affiliation(s)
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Bonham Road, Pokfulam, Hong Kong
- Agnès Delaunay‐Moisan
- Université Paris‐Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif‐sur‐Yvette 91198, France
- Kenneth Timmis
- Institute of Microbiology, Technical University of Braunschweig, Braunschweig, Germany
- Antoine Danchin
- Kodikos Labs, Institut Cochin, 24 rue du Faubourg Saint‐Jacques, Paris 75014, France
- School of Biomedical Sciences, Li Kashing Faculty of Medicine, University of Hong Kong, 21 Sassoon Road, Hong Kong
33
Miao M, Clercq ED, Li G. Genetic Diversity of SARS-CoV-2 over a One-Year Period of the COVID-19 Pandemic: A Global Perspective. Biomedicines 2021; 9:biomedicines9040412. [PMID: 33920487 PMCID: PMC8069977 DOI: 10.3390/biomedicines9040412] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/08/2021] [Revised: 03/26/2021] [Accepted: 04/07/2021] [Indexed: 02/08/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the global pandemic of coronavirus disease 2019 (COVID-19). Genome surveillance is a key method to track the spread of SARS-CoV-2 variants. Genetic diversity and evolution of SARS-CoV-2 were analyzed based on 260,673 whole-genome sequences, which were sampled from 62 countries between 24 December 2019 and 12 January 2021. Amino acid (AA) substitutions were observed in all SARS-CoV-2 proteins, and the six proteins with the highest substitution rates were ORF10, nucleocapsid, ORF3a, spike glycoprotein, RNA-dependent RNA polymerase, and ORF8. Among 25,629 amino acid substitutions at 8484 polymorphic sites across the coding region of the SARS-CoV-2 genome, the D614G (93.88%) variant in spike and the P323L (93.74%) variant in RNA-dependent RNA polymerase were the dominant variants on six continents. As of January 2021, the genomic sequences of SARS-CoV-2 could be divided into at least 12 different clades. The distributions of SARS-CoV-2 clades showed temporal and geographical dynamics on six continents. Overall, this large-scale analysis provides a detailed mapping of SARS-CoV-2 variants in different geographic areas at different time points, highlighting the importance of evaluating highly prevalent variants in the development of SARS-CoV-2 antiviral drugs and vaccines.
Affiliation(s)
- Miao Miao
- Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha 410078, China
- Erik De Clercq
- Rega Institute for Medical Research, Department of Microbiology, Immunology and Transplantation, KU Leuven, Herestraat 49, B-3000 Leuven, Belgium
- Guangdi Li
- Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha 410078, China
- Correspondence: ; Tel.: +86-731-84805414
34