1
Ding X, Liu J, Jiang T, Wu A. Transmission restriction and genomic evolution co-shape the genetic diversity patterns of influenza A virus. Virol Sin 2024:S1995-820X(24)00025-7. [PMID: 38423254 DOI: 10.1016/j.virs.2024.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/22/2024] [Indexed: 03/02/2024]
Abstract
Influenza A virus (IAV) shows an extensive host range and rapid genomic variation, leading to the continuous emergence of novel viruses with significant antigenic variation and the potential for cross-species transmission. This causes global pandemics and seasonal flu outbreaks, posing sustained threats worldwide. Thus, studying the evolutionary patterns of all IAVs and their underlying mechanisms is crucial for effective prevention and control. We developed FluTyping to identify IAV genotypes and to explore overall genetic diversity patterns and their restriction factors. FluTyping groups isolates based on genetic distance and phylogenetic relationships using entire genomes, enabling identification of each isolate's genotype. Three distinct genetic diversity patterns were observed: a one-genotype domination pattern comprising only the H1N1 and H3N2 seasonal influenza subtypes, a multi-genotype co-circulation pattern including the majority of avian influenza subtypes and swine influenza H1N2, and a hybrid-circulation pattern involving H7N9 and three H5 subtypes of influenza viruses. Furthermore, the IAVs in the multi-genotype co-circulation pattern showed region-specific dominant genotypes, implying that restriction of virus transmission is a key factor contributing to distinct genetic diversity patterns, whereas the genomic evolution underlying the different patterns appeared to be more influenced by host-specific factors. In summary, the genotypes identified by FluTyping provide a comprehensive picture of the evolutionary patterns of IAVs overall, offering important theoretical foundations for future prevention and control of these viruses.
Affiliation(s)
- Xiao Ding
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou, 215123, China; Key Laboratory of Pathogen Infection Prevention and Control (Peking Union Medical College), Ministry of Education, Beijing, 100730, China
- Jingze Liu
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou, 215123, China; Key Laboratory of Pathogen Infection Prevention and Control (Peking Union Medical College), Ministry of Education, Beijing, 100730, China
- Taijiao Jiang
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou, 215123, China; Guangzhou National Laboratory, Guangzhou, 510006, China; State Key Laboratory of Respiratory Disease, The Key Laboratory of Advanced Interdisciplinary Studies Center, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510030, China.
- Aiping Wu
- State Key Laboratory of Common Mechanism Research for Major Diseases, Suzhou Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou, 215123, China; Key Laboratory of Pathogen Infection Prevention and Control (Peking Union Medical College), Ministry of Education, Beijing, 100730, China.
2
Ma B, Gong H, Xu Q, Gao Y, Guan A, Wang H, Hua K, Luo R, Jin H. Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) enables precise and efficient phylogenetic estimation in viruses. Virus Evol 2024; 10:veae005. [PMID: 38361823 PMCID: PMC10868571 DOI: 10.1093/ve/veae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/17/2024]
Abstract
Understanding phylogenetic relationships among species is essential for many biological studies, which call for an accurate phylogenetic tree to understand major evolutionary transitions. Phylogenetic analysis presents major challenges in estimation accuracy and computational efficiency, especially in the face of the recent wave of severe emerging infectious disease outbreaks. Here, we introduce a novel, efficient framework called Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) for new sample placement for viruses. In this study, a new recoding method called Frequency Vector Recoding was implemented to approximate the phylogenetic distance, and the Phylogenetic Simulated Annealing Search algorithm was developed to match the recoded distance matrix with the phylogenetic tree. In addition, indels (insertions/deletions) were heuristically introduced into foreign sequence recognition for the first time. We compared Bd-RPC with recent placement software (PAGAN2, EPA-ng, TreeBeST) and evaluated it on Alphacoronavirus, Alphaherpesvirinae, and Betacoronavirus using Split and Robinson-Foulds distances. The comparisons showed that Bd-RPC maintained the highest precision with great efficiency, demonstrating good performance in new sample placement on all three virus groups. Finally, a user-friendly website (http://www.bd-rpc.xyz) is available for users to classify new samples instantly and to facilitate phylogenetic research in viruses, and Bd-RPC is available on GitHub (http://github.com/Bin-Ma/bd-rpc).
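The abstract names the Robinson-Foulds distance as one of its evaluation metrics. As a rough illustration of what that metric measures, the sketch below counts the splits (bipartitions of the leaf set) that appear in one tree but not the other. The nested-tuple tree representation and all function names are assumptions made for this example; this is not code from Bd-RPC.

```python
from itertools import chain

def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree.

    The tree is treated as unrooted, so each split is stored as an
    unordered pair {side, complement}.
    """
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        # Skip trivial splits: a single leaf versus the rest, or the whole tree.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(frozenset([side, all_leaves - side]))
        for child in node:
            visit(child)

    visit(tree)
    return splits

def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

# Hypothetical five-taxon trees used only for illustration.
TREE_1 = (("A", "B"), "C", ("D", "E"))
TREE_2 = (("A", "C"), "B", ("D", "E"))
print(robinson_foulds(TREE_1, TREE_2))
```

A distance of zero means the two topologies agree on every internal edge; recomputing leaf sets at each node keeps the sketch short but would be too slow for large trees.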
Affiliation(s)
- Bin Ma
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Huimin Gong
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Qianshuai Xu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Yuan Gao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Aohan Guan
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Haoyu Wang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Kexin Hua
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Rui Luo
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Hui Jin
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
3
Ortiz-Velez AN, Sukumaran J, Rouzbehani R, Kelley ST. AutoPhy: Automated phylogenetic identification of novel protein subfamilies. PLoS One 2024; 19:e0291801. [PMID: 38206953 PMCID: PMC10783759 DOI: 10.1371/journal.pone.0291801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 09/06/2023] [Indexed: 01/13/2024]
Abstract
Phylogenetic analysis of protein sequences provides a powerful means of identifying novel protein functions and subfamilies, and for identifying and resolving annotation errors. However, automation of functional clustering based on phylogenetic trees has been challenging and most of it is done manually. Clustering phylogenetic trees usually requires the delineation of tree-based thresholds (e.g., distances), leading to an ad hoc problem. We propose a new phylogenetic clustering approach that identifies clusters without using ad hoc distances or other pre-defined values. Our workflow combines uniform manifold approximation and projection (UMAP) with Gaussian mixture models as a k-means like procedure to automatically group sequences into clusters. We then apply a "second pass" clade identification algorithm to resolve non-monophyletic groups. We tested our approach with several well-curated protein families (outer membrane porins, acyltransferase, and nuclear receptors) and showed our automated methods recapitulated known subfamilies. We also applied our methods to a broad range of different protein families from multiple databases, including Pfam, PANTHER, and UniProt, and to alignments of RNA viral genomes. Our results showed that AutoPhy rapidly generated monophyletic clusters (subfamilies) within phylogenetic trees evolving at very different rates both within and among phylogenies. The phylogenetic clusters generated by AutoPhy resolved misannotations and identified new protein functional groups and novel viral strains.
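The "second pass" described above exists because a clustering step can return groups that do not correspond to a single clade. A minimal sketch of how such groups might be detected is given below: a cluster is monophyletic when its members are exactly the leaves of some subtree. The rooted nested-tuple tree, the label dictionary, and the function names are assumptions for illustration only; this is not AutoPhy's algorithm, and it only detects non-monophyletic clusters rather than resolving them.

```python
def clade_leaf_sets(tree):
    """Return the leaf set of every clade (subtree) in a rooted nested-tuple tree."""
    clades = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        clades.add(leaves)
        return leaves

    visit(tree)
    return clades

def non_monophyletic_clusters(tree, labels):
    """List cluster ids whose members do not form exactly one clade."""
    clades = clade_leaf_sets(tree)
    members = {}
    for leaf, cluster in labels.items():
        members.setdefault(cluster, set()).add(leaf)
    return sorted(c for c, leaves in members.items() if frozenset(leaves) not in clades)

# Hypothetical tree and cluster labels used only for illustration.
TREE = ((("a", "b"), "c"), ("d", "e"))
LABELS = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
print(non_monophyletic_clusters(TREE, LABELS))
```

In this toy input, cluster 1 groups "c" and "d", which sit on opposite sides of the root, so it is the only cluster flagged.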
Affiliation(s)
- Adrian N Ortiz-Velez
- Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA, United States of America
- Department of Biology, San Diego State University, San Diego, CA, United States of America
- Jeet Sukumaran
- Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA, United States of America
- Department of Biology, San Diego State University, San Diego, CA, United States of America
- Ryin Rouzbehani
- Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA, United States of America
- Scott T Kelley
- Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA, United States of America
- Department of Biology, San Diego State University, San Diego, CA, United States of America
4
Mandal M, Mandal S. Spatiotemporal genome diversity of SARS-CoV-2 in wastewater: a two-year global epidemiological study. Environ Monit Assess 2023; 196:44. [PMID: 38102322 DOI: 10.1007/s10661-023-12228-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Wastewater surveillance locally and globally is important for the investigation of the molecular epidemiological features of SARS-CoV-2 in the environment. The current study investigated the genomic diversity and mutation profile of SARS-CoV-2 variants in wastewater for the period spanning the COVID-19 pandemic up to December 2022. A total of 3618 complete SARS-CoV-2 genome sequences from wastewater samples submitted to the GISAID database were retrieved. The SARS-CoV-2 sequences were subjected to pairwise alignment against a reference, followed by clade and lineage assignment (based on Nextstrain, GISAID and Pango), distance metric phylogenetic analysis, and detection of substitution mutations. Following GISAID, Nextstrain, and Pango nomenclatures, an overall agreement in clade and lineage determination in wastewater samples was observed. There was successive appearance, dissemination, and disappearance of SARS-CoV-2 lineages over time in wastewater. The SARS-CoV-2 genomes from wastewater were clustered into the variants of concern (VOC) as Alpha GRY (B.1.1.7 + Q.7), Delta GK (B.1.617.2 + AY.*), and Omicron GRA (BA.1*, BA.2* + B.1.1.529, BA.5*). The evolutionary rate was 9.63e-04 substitutions/site/year for SARS-CoV-2 in wastewater. B.1.1.7 was less prevalent than B.1.617.2 in 2021, with the two appearing in succession, and BA.1, BA.2, and BA.5 were detected serially in 2022, with BA.5 continuing to persist in wastewater. The N501Y, E484K/Q, K417N/T, L452R, and T478K spike substitutions remained the dominant attributes of SARS-CoV-2 VOCs. The study underlines the importance of wastewater surveillance for enumerating the spatiotemporal diversity of SARS-CoV-2 variants and mutations, which might pave the way for novel antiviral and vaccine design towards management and prevention of SARS-CoV-2 infection.
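To put the reported per-site rate on a more intuitive scale, it can be multiplied by the genome length. The abstract does not state which reference length was used; the worked figure below assumes the 29,903-nucleotide Wuhan-Hu-1 reference genome, so the result is an illustration rather than a value reported by the study.

```latex
\[
r_{\text{genome}} = r_{\text{site}} \times L
= 9.63 \times 10^{-4}\ \frac{\text{substitutions}}{\text{site}\cdot\text{year}} \times 29{,}903\ \text{sites}
\approx 28.8\ \text{substitutions per genome per year}
\]
```

Under that assumption the rate corresponds to roughly 2.4 substitutions per genome per month.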
Affiliation(s)
- Manisha Mandal
- Department of Physiology, MGM Medical College, Kishanganj, 855107, India
- Shyamapada Mandal
- Department of Zoology, University of Gour Banga, Malda, 732103, West Bengal, India.
5
Meiseles A, Motro Y, Rokach L, Moran-Gilad J. Vulnerability of pangolin SARS-CoV-2 lineage assignment to adversarial attack. Artif Intell Med 2023; 146:102722. [PMID: 38042605 DOI: 10.1016/j.artmed.2023.102722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/14/2023] [Accepted: 11/14/2023] [Indexed: 12/04/2023]
Abstract
Pangolin is the most popular tool for SARS-CoV-2 lineage assignment. During COVID-19, healthcare professionals and policymakers required accurate and timely lineage assignment of SARS-CoV-2 genomes for pandemic response. Therefore, tools such as Pangolin use a machine learning model, pangoLEARN, for fast and accurate lineage assignment. Unfortunately, machine learning models are susceptible to adversarial attacks, in which minute changes to the inputs cause substantial changes in the model prediction. We present an attack that uses the pangoLEARN architecture to find perturbations that change the lineage assignment, often with only 2-3 base pair changes. The attacks we carried out show that pangolin is vulnerable to adversarial attack, with success rates between 0.98 and 1 for sequences from non-VoC lineages when pangoLEARN is used for lineage assignment. The attacks we carried out are almost never successful against VoC lineages because pangolin uses Usher and Scorpio - the non-machine-learning alternative methods for VoC lineage assignment. A malicious agent could use the proposed attack to fake or mask outbreaks or circulating lineages. Developers of software in the field of microbial genomics should be aware of the vulnerabilities of machine learning based models and mitigate such risks.
Affiliation(s)
- Amiel Meiseles
- Dept. of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva, Israel
- Yair Motro
- Dept. of Health Policy and Management, School of Public Health, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, Israel
- Lior Rokach
- Dept. of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva, Israel
- Jacob Moran-Gilad
- Dept. of Health Policy and Management, School of Public Health, Faculty of Health Sciences, Ben Gurion University of the Negev, Beer Sheva, Israel.
6
Gomez-Romero N, Basurto-Alcantara FJ, Velazquez-Salinas L. Assessing the Potential Role of Cats (Felis catus) as Generators of Relevant SARS-CoV-2 Lineages during the Pandemic. Pathogens 2023; 12:1361. [PMID: 38003825 PMCID: PMC10675002 DOI: 10.3390/pathogens12111361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023]
Abstract
Several questions regarding the evolution of SARS-CoV-2 remain poorly elucidated. One of these questions is the possible evolutionary impact on SARS-CoV-2 of infection in domestic animals. In this study, we aimed to evaluate the potential role of cats as generators of relevant SARS-CoV-2 lineages during the pandemic. A total of 105 full-length viral genome sequences obtained from naturally infected cats during the pandemic were evaluated by distinct evolutionary algorithms. Analyses were enhanced by including a set of highly related SARS-CoV-2 sequences recovered from human populations. Our results showed the apparent high susceptibility of cats to SARS-CoV-2 infection compared with other animal species. Evolutionary analyses indicated that the phylogenomic characteristics displayed by cat populations were influenced by the dominance of specific SARS-CoV-2 genetic groups affecting human populations. However, disparate dN/dS rates at some genes between populations recovered from cats and humans suggested that infection in these two species may impose different evolutionary constraints on SARS-CoV-2. Interestingly, the branch selection analysis showed evidence of the potential role of natural selection in the emergence of five distinct cat lineages during the pandemic. Although these lineages were apparently irrelevant to public health during the pandemic, our results suggested that additional studies are needed to understand the role of other animal species in the evolution of SARS-CoV-2 during the pandemic.
Affiliation(s)
- Ninnet Gomez-Romero
- Comisión México-Estados Unidos para la Prevención de Fiebre Aftosa y Otras Enfermedades Exóticas de los Animales, Carretera Mexico-Toluca Km 15.5 Piso 4 Col. Palo Alto, Cuajimalpa de Morelos, Mexico City 05110, Mexico;
- Departamento de Microbiología e Inmunología, Facultad de Medicina Veterinaria y Zootecnia, Universidad Nacional Autónoma de México, Av. Universidad No. 3000 Col Copilco Universidad, Mexico City 14510, Mexico;
- Francisco Javier Basurto-Alcantara
- Departamento de Microbiología e Inmunología, Facultad de Medicina Veterinaria y Zootecnia, Universidad Nacional Autónoma de México, Av. Universidad No. 3000 Col Copilco Universidad, Mexico City 14510, Mexico;
- Lauro Velazquez-Salinas
- Plum Island Animal Disease Center, Agricultural Research Service, United States Department of Agriculture, Greenport, NY 11944, USA
- National Bio and Agro-Defense Facility (NBAF), Agricultural Research Service, United States Department of Agriculture, Manhattan, KS 66502, USA
7
Markin A, Wagle S, Grover S, Vincent Baker AL, Eulenstein O, Anderson TK. PARNAS: Objectively Selecting the Most Representative Taxa on a Phylogeny. Syst Biol 2023; 72:1052-1063. [PMID: 37208300 PMCID: PMC10627562 DOI: 10.1093/sysbio/syad028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 04/26/2023] [Accepted: 05/03/2023] [Indexed: 05/21/2023]
Abstract
The use of next-generation sequencing technology has enabled phylogenetic studies with hundreds of thousands of taxa. Such large-scale phylogenies have become a critical component in genomic epidemiology in pathogens such as SARS-CoV-2 and influenza A virus. However, detailed phenotypic characterization of pathogens or generating a computationally tractable dataset for detailed phylogenetic analyses requires objective subsampling of taxa. To address this need, we propose parnas, an objective and flexible algorithm to sample and select taxa that best represent observed diversity by solving a generalized k-medoids problem on a phylogenetic tree. parnas solves this problem efficiently and exactly by novel optimizations and adapting algorithms from operations research. For more nuanced selections, taxa can be weighted with metadata or genetic sequence parameters, and the pool of potential representatives can be user-constrained. Motivated by influenza A virus genomic surveillance and vaccine design, parnas can be applied to identify representative taxa that optimally cover the diversity in a phylogeny within a specified distance radius. We demonstrated that parnas is more efficient and flexible than existing approaches. To demonstrate its utility, we applied parnas to 1) quantify SARS-CoV-2 genetic diversity over time, 2) select representative influenza A virus in swine genes derived from over 5 years of genomic surveillance data, and 3) identify gaps in H3N2 human influenza A virus vaccine coverage. We suggest that our method, through the objective selection of representatives in a phylogeny, provides criteria for quantifying genetic diversity that have application in the rational design of multivalent vaccines and genomic epidemiology. PARNAS is available at https://github.com/flu-crew/parnas.
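The k-medoids objective mentioned above can be stated concretely: choose k taxa so that the summed tree distance from every taxon to its nearest chosen representative is as small as possible. The sketch below solves that objective by exhaustive search on a toy distance table. It is not the parnas implementation, which the abstract describes as using exact optimized algorithms; the helper names, the five-taxon tree, and its branch lengths are hypothetical, and brute force is only workable for very small inputs.

```python
from itertools import combinations

def symmetric(pairs):
    """Expand {(a, b): d} into a full symmetric lookup with a zero diagonal."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {a: 0.0})[b] = d
        dist.setdefault(b, {b: 0.0})[a] = d
    return dist

def k_medoids_exhaustive(dist, k):
    """Return (medoids, cost) minimising the summed distance to the closest medoid."""
    taxa = sorted(dist)
    best = None
    for medoids in combinations(taxa, k):
        cost = sum(min(dist[t][m] for m in medoids) for t in taxa)
        # Strict comparison keeps the first optimum found when costs tie.
        if best is None or cost < best[1]:
            best = (medoids, cost)
    return best

# Patristic (path-length) distances for the hypothetical tree
# ((A:1,B:2):4,C:1,D:2,E:3), used only for illustration.
PATRISTIC = {
    ("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 7, ("A", "E"): 8,
    ("B", "C"): 7, ("B", "D"): 8, ("B", "E"): 9,
    ("C", "D"): 3, ("C", "E"): 4, ("D", "E"): 5,
}

medoids, cost = k_medoids_exhaustive(symmetric(PATRISTIC), 2)
print(medoids, cost)
```

With two representatives, one is drawn from each side of the long internal branch, which is the behaviour one would expect from a diversity-covering selection.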
Affiliation(s)
- Alexey Markin
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
- Sanket Wagle
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
- Siddhant Grover
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
- Amy L Vincent Baker
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
- Tavis K Anderson
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
8
Azuero OC, Lefrancq N, Nikolay B, McKee C, Cappelle J, Hul V, Ou TP, Hoem T, Lemey P, Rahman MZ, Islam A, Gurley ES, Duong V, Salje H. The genetic diversity of Nipah virus across spatial scales. medRxiv 2023:2023.07.14.23292668. [PMID: 37502973 PMCID: PMC10370237 DOI: 10.1101/2023.07.14.23292668] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
Nipah virus (NiV), a highly lethal virus in humans, circulates silently in Pteropus bats throughout South and Southeast Asia. Difficulty in obtaining genomes from bats means we have a poor understanding of NiV diversity, including how many lineages circulate within a roost and the spread of NiV over increasing spatial scales. Here we develop phylogenetic approaches applied to the most comprehensive collection of genomes to date (N=257, 175 from bats, 73 from humans) from six countries over 22 years (1999-2020). In Bangladesh, where most human infections occur, we find evidence of increased spillover risk from one of the two co-circulating sublineages. We divide the four major NiV sublineages into 15 genetic clusters (emerged 20-44 years ago). Within any bat roost, there are an average of 2.4 co-circulating genetic clusters, rising to 5.5 clusters at areas of 1,500-2,000 km². Using Approximate Bayesian Computation fit to a spatial signature of viral diversity, we estimate that each genetic cluster occupies an average area of 1.3 million km² (95%CI: 0.6-2.3 million), with 14 clusters in an area of 100,000 km² (95%CI: 6-24). In the few sites in Bangladesh and Cambodia where genomic surveillance has been concentrated, we estimate that most of the genetic clusters have been identified, but only ~15% of overall NiV diversity has been uncovered. Our findings are consistent with entrenched co-circulation of distinct lineages, even within individual roosts, coupled with slow migration over larger spatial scales.
Affiliation(s)
- Noémie Lefrancq
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Clifton McKee
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Vibol Hul
- Virology Unit, Institut Pasteur du Cambodge, Pasteur Network, Phnom Penh 12201, Cambodia
- Tey Putita Ou
- Virology Unit, Institut Pasteur du Cambodge, Pasteur Network, Phnom Penh 12201, Cambodia
- Thavry Hoem
- Virology Unit, Institut Pasteur du Cambodge, Pasteur Network, Phnom Penh 12201, Cambodia
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, KU Leuven, BE-3000 Leuven, Belgium
- Ausraful Islam
- Infectious Diseases Division, icddr,b, Dhaka 1000, Bangladesh
- Emily S. Gurley
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA
- Veasna Duong
- Virology Unit, Institut Pasteur du Cambodge, Pasteur Network, Phnom Penh 12201, Cambodia
- Henrik Salje
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
9
Ren H, Ling Y, Cao R, Wang Z, Li Y, Huang T. Early warning of emerging infectious diseases based on multimodal data. Biosaf Health 2023; 5:S2590-0536(23)00074-5. [PMID: 37362865 PMCID: PMC10245235 DOI: 10.1016/j.bsheal.2023.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 05/18/2023] [Accepted: 05/31/2023] [Indexed: 06/28/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has dramatically increased the awareness of emerging infectious diseases. The advancement of multiomics analysis technology has resulted in the development of several databases containing virus information. Several scientists have integrated existing data on viruses to construct phylogenetic trees and predict virus mutation and transmission in different ways, providing prospective technical support for epidemic prevention and control. This review summarized the databases of known emerging infectious viruses and techniques focusing on virus variant forecasting and early warning. It focuses on the multi-dimensional information integration and database construction of emerging infectious viruses, virus mutation spectrum construction and variant forecast model, analysis of the affinity between mutation antigen and the receptor, propagation model of virus dynamic evolution, and monitoring and early warning for variants. As people have suffered from COVID-19 and repeated flu outbreaks, we focused on the research results of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses. This review comprehensively viewed the latest virus research and provided a reference for future virus prevention and control research.
Affiliation(s)
- Haotian Ren
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yunchao Ling
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Ruifang Cao
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zhen Wang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yixue Li
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024 China
- Guangzhou Laboratory, Guangzhou 510005, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200433, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
10
Fumagalli SE, Padhiar NH, Meyer D, Katneni U, Bar H, DiCuccio M, Komar AA, Kimchi-Sarfaty C. Analysis of 3.5 million SARS-CoV-2 sequences reveals unique mutational trends with consistent nucleotide and codon frequencies. Virol J 2023; 20:31. [PMID: 36812119 PMCID: PMC9936480 DOI: 10.1186/s12985-023-01982-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/13/2022] [Accepted: 02/02/2023] [Indexed: 02/19/2023]
Abstract
BACKGROUND Since the onset of the SARS-CoV-2 pandemic, bioinformatic analyses have been performed to understand the nucleotide and synonymous codon usage features and mutational patterns of the virus. However, comparatively few have attempted to perform such analyses on a considerably large cohort of viral genomes while organizing the plethora of available sequence data for a month-by-month analysis to observe changes over time. Here, we aimed to perform sequence composition and mutation analysis of SARS-CoV-2, separating sequences by gene, clade, and timepoints, and contrast the mutational profile of SARS-CoV-2 to other comparable RNA viruses. METHODS Using a cleaned, filtered, and pre-aligned dataset of over 3.5 million sequences downloaded from the GISAID database, we computed nucleotide and codon usage statistics, including calculation of relative synonymous codon usage values. We then calculated codon adaptation index (CAI) changes and a nonsynonymous/synonymous mutation ratio (dN/dS) over time for our dataset. Finally, we compiled information on the types of mutations occurring for SARS-CoV-2 and other comparable RNA viruses, and generated heatmaps showing codon and nucleotide composition at high entropy positions along the Spike sequence. RESULTS We show that nucleotide and codon usage metrics remain relatively consistent over the 32-month span, though there are significant differences between clades within each gene at various timepoints. CAI and dN/dS values vary substantially between different timepoints and different genes, with Spike gene on average showing both the highest CAI and dN/dS values. Mutational analysis showed that SARS-CoV-2 Spike has a higher proportion of nonsynonymous mutations than analogous genes in other RNA viruses, with nonsynonymous mutations outnumbering synonymous ones by up to 20:1. However, at several specific positions, synonymous mutations were overwhelmingly predominant. 
CONCLUSIONS Our multifaceted analysis covering both the composition and mutation signature of SARS-CoV-2 gives valuable insight into the nucleotide frequency and codon usage heterogeneity of SARS-CoV-2 over time, and its unique mutational profile compared to other RNA viruses.
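The methods above mention relative synonymous codon usage (RSCU). Under the standard definition, the RSCU of a codon is its observed count divided by the mean count of all codons encoding the same amino acid, so a value of 1 indicates no bias. The sketch below computes it for a single coding sequence; it assumes the standard genetic code and an in-frame input, the function name is hypothetical, and it is not the authors' pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Relative synonymous codon usage for an in-frame coding sequence.

    Codons containing ambiguous bases are ignored, stop codons are skipped,
    and amino acids absent from the sequence are left out of the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for codon in family:
            # observed count / (total / family size)
            values[codon] = counts[codon] * len(family) / total
    return values

# Hypothetical toy sequence: three GCT and one GCC codon, all encoding alanine.
print(rscu("GCTGCTGCTGCC"))
```

Within each amino acid the RSCU values sum to the number of synonymous codons, which gives a quick sanity check on any implementation.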
Affiliation(s)
- Sarah E Fumagalli
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Nigam H Padhiar
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Douglas Meyer
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Upendra Katneni
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Haim Bar
  - Department of Statistics, University of Connecticut, Storrs, CT, USA
- Anton A Komar
  - Department of Biological, Geological and Environmental Sciences, Center for Gene Regulation in Health and Disease, Cleveland State University, Cleveland, OH, USA
- Chava Kimchi-Sarfaty
  - Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
11
Huang Q, Qiu H, Bible PW, Huang Y, Zheng F, Gu J, Sun J, Hao Y, Liu Y. Early detection of SARS-CoV-2 variants through dynamic co-mutation network surveillance. Front Public Health 2023; 11:1015969. [PMID: 36755900 PMCID: PMC9901361 DOI: 10.3389/fpubh.2023.1015969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 01/02/2023] [Indexed: 01/25/2023] Open
Abstract
Background Precise public health and clinical interventions for the COVID-19 pandemic have spurred a global rush on SARS-CoV-2 variant tracking, but current approaches to variant tracking are challenged by the flood of viral genome sequences, leading to a loss of timeliness, accuracy, and reliability. Here, we devised a new co-mutation network framework to tackle these difficulties in variant surveillance. Methods To avoid simultaneous input and modeling of the whole large-scale dataset, we dynamically investigate the nucleotide covarying pattern of weekly sequences. A community detection algorithm is applied to a co-occurring genomic alteration network constructed from the mutation corpora of weekly collected data. Co-mutation communities are identified, extracted, and characterized as variant markers. They contribute to the creation and weekly updates of a community-based variant dictionary tree representing SARS-CoV-2 evolution, in which highly similar communities from different weeks are merged to represent the same variant. Emerging communities imply the presence of novel viral variants or new branches of existing variants. This process was benchmarked with worldwide GISAID data and validated using national-level data from six COVID-19 hotspot countries. Results A total of 235 co-mutation communities were identified over a 120-week investigation of worldwide sequence data, from March 2020 to mid-June 2022. The dictionary tree progressively developed from these communities perfectly recorded the time course of SARS-CoV-2 branching, coinciding with GISAID clades. The time-varying prevalence of these communities in the viral population showed a good match with the emergence and circulation of the variants they represented. These benchmark results not only exhibited the features of the methodology but also demonstrated high efficiency in detecting the pandemic variants. When applied to regional variant surveillance, our method identified feature communities of major WHO-named SARS-CoV-2 variants significantly earlier than Pangolin's monitoring. Conclusion An efficient genomic surveillance framework built from weekly co-mutation networks and a dynamic community-based variant dictionary tree enables early detection and continuous investigation of SARS-CoV-2 variants despite the flood of genomic data, aiding the response to the COVID-19 pandemic.
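The weekly co-mutation network idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: edges are weighted by the Jaccard index of co-occurrence, and connected components of the thresholded graph stand in for the community detection step, whose actual algorithm the abstract does not name. The function `co_mutation_communities`, the `min_jaccard` parameter and the example mutation labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_mutation_communities(samples, min_jaccard=0.5):
    """Group mutations that tend to occur in the same genomes.

    samples: iterable of per-genome mutation collections for one week.
    Returns a list of frozensets, one per group of two or more mutations.
    Assumption: connected components replace a real community detection
    algorithm, and Jaccard similarity is an assumed edge weight.
    """
    site_counts = Counter()
    pair_counts = Counter()
    for muts in samples:
        muts = sorted(set(muts))
        site_counts.update(muts)
        pair_counts.update(combinations(muts, 2))

    # Keep an edge only when two mutations co-occur often enough.
    adjacency = {m: set() for m in site_counts}
    for (a, b), n_ab in pair_counts.items():
        jaccard = n_ab / (site_counts[a] + site_counts[b] - n_ab)
        if jaccard >= min_jaccard:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search to collect connected components.
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > 1:
            communities.append(frozenset(component))
    return communities
```

Running such a function once per week and matching each week's groups against those already recorded is one way to picture how a variant dictionary could grow over time; a group with no close match in earlier weeks would be a candidate new variant or branch.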
Affiliation(s)
- Qiang Huang
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Huining Qiu
  - Guangdong Artificial Intelligence Machine Vision Engineering Technology Research Center, Guangzhou, China
- Paul W. Bible
  - College of Arts and Sciences, Marian University, Indianapolis, IN, United States
- Yong Huang
  - Institute of Public Health, Guangzhou Medical University & Guangzhou Center for Disease Control and Prevention, Guangzhou, China
- Fangfang Zheng
  - School of Traditional Chinese Medicine Healthcare, Guangdong Food and Drug Vocational College, Guangzhou, China
- Jing Gu
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Jian Sun (corresponding author)
  - Department of Clinical Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yuantao Hao (corresponding author)
  - Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China
- Yu Liu (corresponding author)
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
12
Santos PD, Günther A, Keller M, Homeier-Bachmann T, Groschup MH, Beer M, Höper D, Ziegler U. An advanced sequence clustering and designation workflow reveals the enzootic maintenance of a dominant West Nile virus subclade in Germany. Virus Evol 2023; 9:vead013. [PMID: 37197362 PMCID: PMC10184446 DOI: 10.1093/ve/vead013] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/16/2022] [Revised: 01/13/2023] [Accepted: 03/16/2023] [Indexed: 05/19/2023] Open
Abstract
West Nile virus (WNV) is the most widespread arthropod-borne (arbo) virus and the primary cause of arboviral encephalitis globally. Members of the WNV species have diverged genetically and are classified into different hierarchical groups below species rank. However, the demarcation criteria for allocating WNV sequences into these groups remain individual and inconsistent, and the naming of the different hierarchical levels is unstructured. To obtain an objective and comprehensible grouping of WNV sequences, we developed an advanced grouping workflow using the 'affinity propagation clustering' algorithm and newly included the 'agglomerative hierarchical clustering' algorithm for the allocation of WNV sequences into different groups below species rank. In addition, we propose a fixed set of terms for the hierarchical naming of WNV below species level and a clear decimal numbering system to label the determined groups. For validation, we applied the refined workflow to WNV sequences that had previously been grouped into various lineages, clades, and clusters in other studies. Although our workflow regrouped some WNV sequences, it generally corresponds with previous groupings. We applied our novel approach to sequences from WNV circulating in Germany in 2020, primarily from WNV-infected birds and horses. Besides two newly defined minor (sub)clusters comprising only three sequences each, Subcluster 2.5.3.4.3c was the predominant WNV sequence group detected in Germany from 2018 to 2020. This predominant subcluster was also associated with at least five human WNV infections in 2019-20. In summary, our analyses imply that the genetic diversity of the WNV population in Germany is shaped by enzootic maintenance of the dominant WNV subcluster accompanied by sporadic incursions of other rare clusters and subclusters. Moreover, we show that our refined approach for sequence grouping yields meaningful results. Although we primarily aimed at a more detailed WNV classification, the presented workflow can also be applied to the objective genotyping of other virus species.
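As a rough illustration of the agglomerative hierarchical clustering step, the sketch below merges aligned sequences bottom-up using average linkage on pairwise p-distances and stops at a distance cutoff. It is not the authors' workflow: the linkage rule, the distance measure, the `cutoff` parameter and the function names are all assumptions made for the example, and the affinity propagation step is not shown.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def agglomerative_groups(seqs, cutoff):
    """Average-linkage agglomerative clustering (hypothetical helper).

    seqs: {name: aligned sequence}. Clusters are merged while the smallest
    average between-cluster distance does not exceed `cutoff`.
    """
    names = list(seqs)
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }

    def average_link(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[n] for n in names]
    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_link(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Applying the same procedure with progressively tighter cutoffs would give nested groups, which is one way a hierarchical decimal label of the kind quoted in the abstract could be assembled, although the abstract does not spell out how its labels are derived.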
Affiliation(s)
- Markus Keller
  - Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Novel and Emerging Infectious Diseases, 17493, Greifswald-Insel Riems, Germany
- Martin H Groschup
  - Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Novel and Emerging Infectious Diseases, 17493, Greifswald-Insel Riems, Germany
  - German Centre for Infection Research, Partner site Hamburg-Lübeck-Borstel-Riems, 17493, Greifswald-Insel Riems, Germany
- Martin Beer
  - Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Diagnostic Virology, 17493, Greifswald-Insel Riems, Germany
13
Ji C, Han N, Cheng Y, Shang J, Weng S, Yang R, Zhou HY, Wu A. sitePath: a visual tool to identify polymorphism clades and help find fixed and parallel mutations. BMC Bioinformatics 2022; 23:504. [PMID: 36434502 PMCID: PMC9701067 DOI: 10.1186/s12859-022-05064-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 11/16/2022] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Identifying polymorphism clades on phylogenetic trees could help detect point mutations that are associated with viral functions. With visualization tools coloring the tree, it is easy to visually find clades where most sequences have the same polymorphism state. However, with the fast accumulation of viral sequences, a computational tool to automate this process is urgently needed. RESULTS Here, by implementing a branch-and-bound-like search method, we developed an R package named sitePath to identify polymorphism clades automatically. Based on the identified polymorphism clades, fixed and parallel mutations could be inferred. Furthermore, sitePath also integrates visualization tools to generate figures of the calculated results. In an example with the influenza A virus H3N2 dataset, the detected fixed mutations coincide with antigenic shift mutations. High specificity and sensitivity of sitePath in finding fixed mutations were achieved for a range of parameters and different phylogenetic tree inference software. CONCLUSIONS The results suggest that sitePath can identify polymorphism clades per site. The clustering of sequences on a phylogenetic tree can be used to infer fixed and parallel mutations. High-quality figures of the calculated results can also be generated by sitePath.
Affiliation(s)
- Chengyang Ji
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
- Na Han
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
- Yexiao Cheng
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China; School of Life Science and Technology, China Pharmaceutical University, Nanjing, 211100 China
- Jingzhe Shang
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
- Shenghui Weng
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
- Rong Yang
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
- Hang-Yu Zhou
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
- Aiping Wu
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100005 China; Suzhou Institute of Systems Medicine, Suzhou, 215123 Jiangsu, China
14
Miao M, De Clercq E, Li G. Towards Efficient and Accurate SARS-CoV-2 Genome Sequence Typing Based on Supervised Learning Approaches. Microorganisms 2022; 10:microorganisms10091785. [PMID: 36144387 PMCID: PMC9505117 DOI: 10.3390/microorganisms10091785] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/11/2022] [Revised: 08/24/2022] [Accepted: 09/01/2022] [Indexed: 11/16/2022] Open
Abstract
Despite the active development of SARS-CoV-2 surveillance methods (e.g., Nextstrain, GISAID, Pangolin), the global emergence of various SARS-CoV-2 viral lineages that potentially cause antiviral and vaccine failure has driven the need for accurate and efficient SARS-CoV-2 genome sequence classifiers. This study presents an optimized method that accurately identifies the viral lineages of SARS-CoV-2 genome sequences using existing schemes. For Nextstrain and GISAID clades, a template matching-based method is proposed to quantify the differences between viral clades and to support classification evaluation. Furthermore, to improve the typing accuracy of SARS-CoV-2 genome sequences, an ensemble model that integrates a combination of machine learning-based methods (such as Random Forest and CatBoost) with optimized weights is proposed for Nextstrain, Pangolin, and GISAID clades. Cross-validation is applied to optimize the parameters of the machine learning-based methods and the weight settings of the ensemble model. To improve the efficiency of the model, in addition to the one-hot encoding method, we propose a nucleotide site mutation-based data structure that requires fewer computational resources and performs better in SARS-CoV-2 genome sequence typing. Based on an accumulated database of >1 million SARS-CoV-2 genome sequences, performance evaluations show that the proposed system has a typing accuracy of 99.879%, 97.732%, and 96.291% for Nextstrain, Pangolin, and GISAID clades, respectively. A single prediction takes an average of <20 ms on a portable laptop. Overall, this study provides an efficient and accurate SARS-CoV-2 genome sequence typing system that benefits current and future surveillance of SARS-CoV-2 variants.
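The contrast between one-hot encoding and a site-mutation representation can be made concrete with a small Python sketch. The exact data structure used in the study is not described in the abstract, so the sparse position-to-base dictionary below is an assumption, as are the function names and the toy sequences.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Dense encoding: one 4-element indicator row per nucleotide.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def site_mutations(reference, seq):
    """Sparse encoding (assumed form): {1-based position: observed base}
    for every aligned site that differs from the reference."""
    return {
        pos: alt
        for pos, (ref, alt) in enumerate(zip(reference.upper(), seq.upper()), start=1)
        if ref != alt
    }
```

For a genome of roughly 30,000 nucleotides, the dense form stores about 120,000 values per sequence, whereas the sparse form stores only as many entries as there are differences from the reference, which is consistent with the abstract's remark about lower computational cost.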
Affiliation(s)
- Miao Miao
  - Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha 410078, China
- Erik De Clercq
  - Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, 3000 Leuven, Belgium
- Guangdi Li
  - Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, Changsha 410078, China
  - Hunan Children’s Hospital, Changsha 410007, China
  - Correspondence: Tel.: +86-731-8480-5414
15
Nabakooza G, Galiwango R, Frost SDW, Kateete DP, Kitayimbwa JM. Molecular Epidemiology and Evolutionary Dynamics of Human Influenza Type-A Viruses in Africa: A Systematic Review. Microorganisms 2022; 10:900. [PMID: 35630344 PMCID: PMC9145646 DOI: 10.3390/microorganisms10050900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 03/21/2022] [Accepted: 03/22/2022] [Indexed: 02/01/2023] Open
Abstract
Genomic characterization of circulating influenza type-A viruses (IAVs) directs the selection of appropriate vaccine formulations and early detection of potentially pandemic virus strains. However, longitudinal data on the genomic evolution and transmission of IAVs in Africa are scarce, limiting Africa's benefits from potential influenza control strategies. Following the PRISMA guidelines, we searched seven databases (African Journals Online, Embase, Global Health, Google Scholar, PubMed, Scopus, and Web of Science) for studies that sequenced and/or genomically characterized African IAVs. Our review highlights the emergence and diversification of IAVs in Africa since 1993. Circulating strains continuously acquired new amino acid substitutions at the major antigenic and potential N-linked glycosylation sites in their hemagglutinin proteins, which dramatically affected vaccine protectiveness. African IAVs mixed phylogenetically with global strains, forming strong temporal and geographical evolution structures. Phylogeographic analyses confirmed that viral migration into Africa from abroad, especially South Asia, Europe, and North America, and extensive local viral mixing sustained the genomic diversity, antigenic drift, and persistence of IAVs in Africa. However, the role of reassortment and zoonosis remains unknown. Interestingly, we observed substitutions, clades, and persistent viral lineages unique to Africa. Therefore, Africa's contribution to the global influenza ecology may be understated. Our results were geographically biased, with data from 63% (34/54) of African countries. Thus, there is a need to expand influenza surveillance across Africa and prioritize routine whole-genome sequencing and genomic analysis to detect new strains early for effective viral control.
Affiliation(s)
- Grace Nabakooza
  - Department of Immunology and Molecular Biology, Makerere University, Old Mulago Hill Road, P.O. Box 7072, Kampala 256, Uganda
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe 256, Uganda
- Ronald Galiwango
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe 256, Uganda
  - Centre for Computational Biology, Uganda Christian University, Plot 67-173, Bishop Tucker Road, P.O. Box 4, Mukono 256, Uganda
  - African Center of Excellence in Bioinformatics and Data Intensive Sciences, Infectious Diseases Institute, Makerere University, Kampala 256, Uganda
- Simon D. W. Frost
  - Microsoft Research, 14820 NE 36th Street, Redmond, WA 98052, USA
  - London School of Hygiene & Tropical Medicine (LSHTM), University of London, Keppel Street, Bloomsbury, London WC1E 7HT, UK
- David P. Kateete
  - Department of Immunology and Molecular Biology, Makerere University, Old Mulago Hill Road, P.O. Box 7072, Kampala 256, Uganda
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe 256, Uganda
- John M. Kitayimbwa
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe 256, Uganda
  - Centre for Computational Biology, Uganda Christian University, Plot 67-173, Bishop Tucker Road, P.O. Box 4, Mukono 256, Uganda
16
Huang Q, Zhang Q, Bible PW, Liang Q, Zheng F, Wang Y, Hao Y, Liu Y. A New Way to Trace SARS-CoV-2 Variants Through Weighted Network Analysis of Frequency Trajectories of Mutations. Front Microbiol 2022; 13:859241. [PMID: 35369526 PMCID: PMC8966897 DOI: 10.3389/fmicb.2022.859241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Accepted: 02/18/2022] [Indexed: 11/13/2022] Open
Abstract
Early detection of SARS-CoV-2 variants enables timely tracking of clinically important strains in order to inform the public health response. Current subtype-based variant surveillance, which depends on prior subtype assignment according to lag features and on continuous risk assessment of the subtypes, may delay this process. We proposed a weighted network framework to model the frequency trajectories of mutations (FTMs) for SARS-CoV-2 variant tracing, without requiring prior subtype assignment. This framework modularizes the FTMs and groups synchronous FTMs together to represent the variants. It also generates module clusters to unveil the epidemic stages and their contemporaneous variants. Finally, the module-based variants are assessed against a phylogenetic tree through sub-sampling to facilitate communication and control of the epidemic. This process was benchmarked using worldwide GISAID data, which not only demonstrated all the features of the methodology but also showed that the module-based variant identification mapped to the global phylogenetic tree with high specificity and sensitivity. When applied to regional data, such as those from India and South Africa, for SARS-CoV-2 variant surveillance, the approach clearly elucidated the national dispersal history of the viral variants and their co-circulation pattern, and provided much earlier warning of Beta (B.1.351), Delta (B.1.617.2), and Omicron (B.1.1.529). In summary, our work showed that weighted network modeling of FTMs enables rapid and easy tracking of SARS-CoV-2 variants without prior viral subtyping based on lag features, accelerating the understanding and surveillance of COVID-19.
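A frequency trajectory of a mutation (FTM) is simply the share of genomes carrying that mutation in each time period. The Python sketch below computes such trajectories and a Pearson correlation between two of them; using Pearson correlation as the edge weight between FTMs is an assumption made for illustration, since the abstract does not state how the network is weighted. All names and the toy records are hypothetical.

```python
import math
from collections import Counter


def frequency_trajectories(records, periods):
    """records: iterable of (period, mutations) pairs, one per genome.
    Returns {mutation: [frequency in each period, in the order of `periods`]}.
    """
    totals = Counter()
    hits = Counter()
    for period, muts in records:
        totals[period] += 1
        for m in set(muts):
            hits[(m, period)] += 1
    mutations = sorted({m for m, _ in hits})
    return {
        m: [hits[(m, p)] / totals[p] if totals[p] else 0.0 for p in periods]
        for m in mutations
    }


def pearson(x, y):
    """Pearson correlation of two equal-length trajectories (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

Mutations whose trajectories rise and fall together would receive strongly weighted edges under this assumption, and tightly connected groups of such mutations correspond to the "synchronous FTMs" that the abstract treats as variant signatures.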
Affiliation(s)
- Qiang Huang
  - Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Qiang Zhang
  - College of Computer, Chengdu University, Chengdu, China
- Paul W Bible
  - College of Arts and Sciences, Marian University, Indianapolis, IN, United States
- Qiaoxing Liang
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Fangfang Zheng
  - School of Traditional Chinese Medicine Healthcare, Guangdong Food and Drug Vocational College, Guangzhou, China
- Ying Wang
  - Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Yuantao Hao
  - Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Yu Liu
  - Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
17
SARS-CoV-2: Evolution and Emergence of New Viral Variants. Viruses 2022; 14:v14040653. [PMID: 35458383 PMCID: PMC9025907 DOI: 10.3390/v14040653] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 02/11/2022] [Revised: 03/14/2022] [Accepted: 03/16/2022] [Indexed: 12/15/2022] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the etiological agent responsible for the coronavirus disease 2019 (COVID-19). The high rate of mutation of this virus is associated with a quick emergence of new viral variants that have been rapidly spreading worldwide. Several mutations have been documented in the receptor-binding domain (RBD) of the viral spike protein that increase the interaction between SARS-CoV-2 and its cellular receptor, the angiotensin-converting enzyme 2 (ACE2). Mutations in the spike can increase the viral spread rate, disease severity, and the ability of the virus to evade protective immune responses, monoclonal antibody treatments, or currently licensed vaccines. This review aimed to highlight the functional virus classification used by the World Health Organization (WHO), Phylogenetic Assignment of Named Global Outbreak (PANGO), Global Initiative on Sharing All Influenza Data (GISAID), and Nextstrain, an open-source project to harness the scientific and public health potential of pathogen genome data; the chronological emergence of viral variants of concern (VOCs) and variants of interest (VOIs); the major findings related to the rate of spread; and the mutations in the spike protein that are involved in the evasion of the host immune responses elicited by prior SARS-CoV-2 infections and by vaccination.
18
Nabakooza G, Pastusiak A, Kateete DP, Lutwama JJ, Kitayimbwa JM, Frost SDW. Whole-genome analysis to determine the rate and patterns of intra-subtype reassortment among influenza type-A viruses in Africa. Virus Evol 2022; 8:veac005. [PMID: 35317349 PMCID: PMC8933723 DOI: 10.1093/ve/veac005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2021] [Revised: 01/13/2022] [Accepted: 01/28/2022] [Indexed: 12/05/2022] Open
Abstract
Influenza type-A viruses (IAVs) present a global burden of human respiratory infections and mortality. Genome reassortment is an important mechanism through which epidemiologically novel influenza viruses emerge and a core step in the safe reassortment-incompetent live-attenuated influenza vaccine development. Currently, there are no data on the rate, spatial and temporal distribution, and role of reassortment in the evolution and diversification of IAVs circulating in Africa. We aimed to detect intra-subtype reassortment among African pandemic H1N1pdm09 (2009-10), seasonal H1N1pdm09 (2011-20), and seasonal H3N2 viruses and to characterize the genomic architecture and the temporal and spatial distribution patterns of the resulting reassortants. Our study was nested within the Uganda National Influenza Surveillance Programme. Next-generation sequencing was used to generate whole genomes (WGs) from 234 H1N1pdm09 (n = 116) and H3N2 (n = 118) viruses sampled between 2010 and 2018 from seven districts in Uganda. We combined our newly generated WGs with 658 H1N1pdm09 and 1131 H3N2 WGs sampled between 1994 and 2020 across Africa and identified reassortants using an automated Graph Incompatibility Based Reassortment Finder software. Viral reassortment rates were estimated using a coalescent reassortant constant population model. Phylogenetic analysis was used to assess the effect of reassortment on viral genetic evolution. We observed a high frequency of intra-subtype reassortment events, 12.4 per cent (94/758) and 20.9 per cent (256/1,224), and reassortants, 13.3 per cent (101/758) and 38.6 per cent (472/1,224), among African H1N1pdm09 and H3N2 viruses, respectively. H1N1pdm09 reassorted at higher rates (0.1237-0.4255) than H3N2 viruses (0.00912-0.0355 events/lineage/year), a case unique to Uganda. Viral reassortants were sampled in 2009 through 2020, except in 2012. Of the H1N1pdm09 reassortants, 78.2 per cent (79/101) acquired new non-structural genes, while 57.8 per cent (273/472) of the H3N2 reassortants had new hemagglutinin (H3) genes. African H3N2 viruses underwent more reassortment events involving larger reassortant sets than H1N1pdm09 viruses. Viruses with a specific reassortment architecture circulated for up to five consecutive years in specific countries and regions. Eastern (Uganda and Kenya) and Western Africa harboured 84.2 per cent (85/101) and 55.9 per cent (264/472) of the continent's H1N1pdm09 and H3N2 reassortants, respectively. The frequent multi-gene reassortment observed among African IAVs showed intracontinental viral evolution and diversification, possibly sustained by viral importation from outside Africa and/or local viral genomic mixing and transmission. Novel reassortant viruses emerged every year, and some persisted in different countries and regions, thereby presenting a risk of influenza outbreaks in Africa. Our findings highlight Africa as part of the global influenza ecology and the advantage of implementing routine whole- over partial-genome sequencing and analyses to monitor circulating viruses and detect emerging ones. Furthermore, this study provides evidence and heightens our knowledge of IAV evolution, which is integral in directing vaccine strain selection and the update of master donor viruses used in recombinant vaccine development.
Affiliation(s)
- Grace Nabakooza
  - Department of Immunology and Molecular Biology, Makerere University, Old Mulago Hill Road, P.O. Box 7072, Kampala, Uganda
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe, Uganda
  - Centre for Computational Biology, Uganda Christian University, Plot 67-173, Bishop Tucker Rd, P.O. Box 4, Mukono, Uganda
- David Patrick Kateete
  - Department of Immunology and Molecular Biology, Makerere University, Old Mulago Hill Road, P.O. Box 7072, Kampala, Uganda
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe, Uganda
- Julius Julian Lutwama
  - Department of Arbovirology Emerging & Re-Emerging Infectious Diseases, Uganda Virus Research Institute (UVRI), Plot No: 51-59, Nakiwogo Road, P.O. Box 49, Entebbe, Uganda
- John Mulindwa Kitayimbwa
  - UVRI Centre of Excellence in Infection and Immunity Research and Training (MUII-Plus), Makerere University, Plot No: 51-59 Nakiwogo Road, P.O. Box 49, Entebbe, Uganda
  - Centre for Computational Biology, Uganda Christian University, Plot 67-173, Bishop Tucker Rd, P.O. Box 4, Mukono, Uganda
- Simon David William Frost
  - Microsoft Research, 14820 NE 36th Street, Redmond, WA 98052, USA
  - London School of Hygiene & Tropical Medicine (LSHTM), Keppel St, Bloomsbury, London WC1E 7HT, UK
19
Gorbalenya AE, Lauber C. Bioinformatics of virus taxonomy: foundations and tools for developing sequence-based hierarchical classification. Curr Opin Virol 2021; 52:48-56. [PMID: 34883443 DOI: 10.1016/j.coviro.2021.11.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/22/2021] [Revised: 10/22/2021] [Accepted: 11/04/2021] [Indexed: 11/03/2022]
Abstract
The genome sequence is the only characteristic readily obtainable for all known viruses, which underlies the growing role of comparative genomics in organizing knowledge about viruses in a systematic, evolution-aware way, known as virus taxonomy. Overseen by the International Committee on Taxonomy of Viruses (ICTV), development of virus taxonomy involves taxa demarcation at 15 ranks of a hierarchical classification, often in a host-specific manner. Outside the ICTV remit, researchers assess how numerous unclassified viruses fit into the established taxa. They employ different metrics of virus clustering, based on conserved domain(s), separation of viruses in rooted phylogenetic trees, and pair-wise distance space. Computational approaches differ further with respect to methodology, number of ranks considered, sensitivity to uneven virus sampling, and visualization of results. Advancing and using computational tools will be critical for improving taxa demarcation across the virosphere and resolving rank origins in research that may also inform experimental virology.
Affiliation(s)
- Alexander E Gorbalenya
- Department of Medical Microbiology, Leiden University Medical Center, Leiden, The Netherlands; Faculty of Bioengineering and Bioinformatics and Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, 119899, Moscow, Russia
| | - Chris Lauber
- Institute for Experimental Virology, TWINCORE Centre for Experimental and Clinical Infection Research, A Joint Venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
| |
|
20
|
Rao RSP, Ahsan N, Xu C, Su L, Verburgt J, Fornelli L, Kihara D, Xu D. Evolutionary Dynamics of Indels in SARS-CoV-2 Spike Glycoprotein. Evol Bioinform Online 2021; 17:11769343211064616. [PMID: 34898980 PMCID: PMC8655444 DOI: 10.1177/11769343211064616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/07/2021] [Accepted: 11/12/2021] [Indexed: 01/28/2023] Open
Abstract
SARS-CoV-2, responsible for the current COVID-19 pandemic that claimed over 5.0 million lives, belongs to a class of enveloped viruses that undergo quick evolutionary adjustments under selection pressure. Numerous variants have emerged in SARS-CoV-2, posing a serious challenge to the global vaccination effort and COVID-19 management. The evolutionary dynamics of this virus are only beginning to be explored. In this work, we have analysed 1.79 million spike glycoprotein sequences of SARS-CoV-2 and found that the virus is fine-tuning the spike with numerous amino acid insertions and deletions (indels). Indels seem to have a selective advantage as the proportions of sequences with indels steadily increased over time, currently at over 89%, with similar trends across countries/variants. There were as many as 420 unique indel positions and 447 unique combinations of indels. Despite their high frequency, indels resulted in only minimal alteration of N-glycosylation sites, including both gain and loss. As indels and point mutations are positively correlated and sequences with indels have significantly more point mutations, they have implications in the evolutionary dynamics of the SARS-CoV-2 spike glycoprotein.
Affiliation(s)
- R Shyama Prasad Rao
- Biostatistics and Bioinformatics Division, Yenepoya Research Center, Yenepoya University, Mangaluru, Karnataka, India
| | - Nagib Ahsan
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
- Mass Spectrometry, Proteomics and Metabolomics Core Facility, Stephenson Life Sciences Research Center, University of Oklahoma, Norman, OK, USA
| | - Chunhui Xu
- Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Lingtao Su
- Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | - Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Luca Fornelli
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
- Department of Biology, University of Oklahoma, Norman, OK, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| |
|
21
|
Yang ZK, Pan L, Zhang Y, Luo H, Gao F. Data-driven identification of SARS-CoV-2 subpopulations using PhenoGraph and binary-coded genomic data. Brief Bioinform 2021; 22:bbab307. [PMID: 34382087 PMCID: PMC8385964 DOI: 10.1093/bib/bbab307] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/16/2021] [Revised: 07/01/2021] [Accepted: 07/17/2021] [Indexed: 01/08/2023] Open
Abstract
For epidemic prevention and control, the identification of SARS-CoV-2 subpopulations sharing similar micro-epidemiological patterns and evolutionary histories is necessary for a more targeted investigation into the links among COVID-19 outbreaks caused by SARS-CoV-2 with similar genetic backgrounds. Genomic sequencing analysis has demonstrated the ability to uncover viral genetic diversity. However, an objective analysis is necessary for the identification of SARS-CoV-2 subpopulations. Herein, we detected all the mutations in 186 682 SARS-CoV-2 isolates. We found that the GC content of the SARS-CoV-2 genome had evolved to be lower, which may be conducive to viral spread, and the frameshift mutation was rare in the global population. Next, we encoded the genomic mutations in binary form and used an unsupervised learning classifier, namely PhenoGraph, to classify this information. Consequently, PhenoGraph successfully identified 303 SARS-CoV-2 subpopulations, and we found that the PhenoGraph classification was consistent with, but more detailed and precise than the known GISAID clades (S, L, V, G, GH, GR, GV and O). By the change trend analysis, we found that the growth rate of SARS-CoV-2 diversity has slowed down significantly. We also analyzed the temporal, spatial and phylogenetic relationships among the subpopulations and revealed the evolutionary trajectory of SARS-CoV-2 to a certain extent. Hence, our results provide a better understanding of the patterns and trends in the genomic evolution and epidemiology of SARS-CoV-2.
Affiliation(s)
- Zhi-Kai Yang
- Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510700, China
| | - Lingyu Pan
- Guangzhou Nanxin Pharmaceutical Co., Ltd., Guangzhou 510700, China
| | - Yanming Zhang
- SinoGenoMax Co., Ltd./Chinese National Human Genome Center, Guangzhou 510700, China
| | - Hao Luo
- Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
| | - Feng Gao
- Department of Physics, School of Science, and the Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
| |
|
22
|
Yuan F, Wang L, Fang Y, Wang L. Global SNP analysis of 11,183 SARS-CoV-2 strains reveals high genetic diversity. Transbound Emerg Dis 2021; 68:3288-3304. [PMID: 33207070 PMCID: PMC7753349 DOI: 10.1111/tbed.13931] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/24/2020] [Revised: 10/19/2020] [Accepted: 11/13/2020] [Indexed: 02/05/2023]
Abstract
Since it was first identified in December 2019, COVID-19 has spread across the world within a few months, and cases are still surging rapidly in most countries. The causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), adapts and evolves rapidly in nature. With the availability of 16,092 SARS-CoV-2 full genomes in GISAID as of 13 May, we removed the poor-quality genomes and performed mutational profiling analysis for the remaining 11,183 viral genomes. Global analysis of all sequences identified all single nucleotide polymorphisms (SNPs) across the whole genome and critical SNPs with high mutation frequency that contribute to the five-clade classification of global strains. A total of 119 SNPs were found, with 74 non-synonymous mutations, 43 synonymous mutations and two mutations in intergenic regions. Analysis of the geographic pattern of mutational profiling for the whole genome reveals differences between continents. C-to-T transitions are the most common mutation type across the genome, suggesting rapid evolution and adaptation of the virus in the host. Amino acid (AA) deletions and insertions found across the genome result in changes in viral protein length and potential alteration of function. Mutational profiling for each gene was analysed, and results show that the nucleocapsid gene has the highest mutational frequency, followed by the Nsp2, Nsp3 and Spike genes. We further focused on non-synonymous mutational distributions on four key viral proteins: spike with 75 mutations, RNA-dependent RNA polymerase with 41 mutations, 3C-like protease with 22 mutations and papain-like protease with 10 mutations. Results show that non-synonymous mutations at critical sites of these four proteins pose a great challenge for the development of anti-viral drugs and other countermeasures.
Overall, this study provides more understanding of the genetic diversity/variability of SARS-CoV-2 and insights for the development of anti-viral therapeutics.
Affiliation(s)
- Fangfeng Yuan
- Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Liping Wang
- Department of Diagnostic Medicine and Pathobiology, College of Veterinary Medicine, Kansas State University, Manhattan, Kansas, USA
| | - Ying Fang
- Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Leyi Wang
- Veterinary Diagnostic Laboratory and Department of Veterinary Clinical Medicine, College of Veterinary Medicine, University of Illinois, Urbana, Illinois, USA
| |
|
23
|
Genomic variation and point mutations analysis of Indian COVID-19 patient samples submitted in GISAID database. J INDIAN CHEM SOC 2021. [PMCID: PMC8442303 DOI: 10.1016/j.jics.2021.100156] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Coronavirus disease 2019 (COVID-19) has wreaked havoc on the world; the causative virus of the pandemic is SARS-CoV-2. Pharmaceutical companies and academic institutes are making continuous efforts to identify anti-viral therapies or vaccines, but the most significant challenge is the rapidly evolving genome of SARS-CoV-2, which imparts evolutionary selective benefits to the virus. To understand the viral mutations, we retrieved 934 samples from different states of India via the GISAID database and analyzed the frequency of all types of point mutation in all structural proteins, non-structural proteins, and accessory factors of SARS-CoV-2. Spike glycoprotein, nsp3, nsp6, nsp12, N and NS3 were the most rapidly evolving proteins. High-frequency point mutations were Q496P (nsp2), A380V (nsp4), A994D (nsp3), L37F (nsp6), P323L & A97V (nsp12), Q57H (ns3), D614G (S), P13L (N), R203K (N), G204R (N) and S194L (N).
|
24
|
Ahammad I, Hossain MU, Rahman A, Chowdhury ZM, Bhattacharjee A, Das KC, Keya CA, Salimullah M. Wave-wise comparative genomic study for revealing the complete scenario and dynamic nature of COVID-19 pandemic in Bangladesh. PLoS One 2021; 16:e0258019. [PMID: 34587212 PMCID: PMC8480844 DOI: 10.1371/journal.pone.0258019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/17/2021] [Accepted: 09/16/2021] [Indexed: 12/19/2022] Open
Abstract
As the COVID-19 pandemic continues to ravage the globe and take millions of lives, and as the second wave of the pandemic hit Bangladesh like many other parts of the world, this study aimed to understand its causative agent, SARS-CoV-2, at the genomic and proteomic level and to provide insights into the pathogenesis, evolution, strengths and weaknesses of the virus. As of mid-June 2021, over 1500 SARS-CoV-2 genome sequences from Bangladesh had been deposited in the GISAID database; these were extracted and categorized into two waves. Analysis of these genome sequences showed that the wave-2 samples had a significantly greater average rate of mutation/sample (30.79%) than the wave-1 samples (12.32%). Wave-2 samples also had a higher frequency of deletion and transversion events. During the first wave, the GR clade was the most predominant, but it was replaced by the GH clade in the latter wave. The B.1.1.25 variant showed the highest frequency in wave-1, while in wave-2 the B.1.351.3 variant was the most common. A notable presence of the delta variant, which is currently at the center of concern, was also observed. Comparison of the Spike protein found in the reference and in the 3 most common lineages found in Bangladesh, namely B.1.1.7, B.1.351 and B.1.617, in terms of their ability to form stable complexes with the ACE2 receptor revealed that B.1.617 had the potential to be more transmissible than the others. Importantly, no indigenous variants have been detected so far, which implies that successfully preventing the import of foreign variants can diminish the outbreak in the country.
Affiliation(s)
- Ishtiaque Ahammad
- Bioinformatics Division, National Institute of Biotechnology, Dhaka, Bangladesh
| | | | - Anisur Rahman
- Bioinformatics Division, National Institute of Biotechnology, Dhaka, Bangladesh
| | | | | | - Keshob Chandra Das
- Molecular Biotechnology Division, National Institute of Biotechnology, Dhaka, Bangladesh
| | - Chaman Ara Keya
- Department of Biochemistry and Microbiology, North South University, Bashundhara, Dhaka, Bangladesh
| | - Md. Salimullah
- Molecular Biotechnology Division, National Institute of Biotechnology, Dhaka, Bangladesh
| |
|
25
|
Owuor DC, de Laurent ZR, Kikwai GK, Mayieka LM, Ochieng M, Müller NF, Otieno NA, Emukule GO, Hunsperger EA, Garten R, Barnes JR, Chaves SS, Nokes DJ, Agoti CN. Characterizing the Countrywide Epidemic Spread of Influenza A(H1N1)pdm09 Virus in Kenya between 2009 and 2018. Viruses 2021; 13:1956. [PMID: 34696386 PMCID: PMC8539974 DOI: 10.3390/v13101956] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Revised: 09/13/2021] [Accepted: 09/22/2021] [Indexed: 12/01/2022] Open
Abstract
The spatiotemporal patterns of spread of influenza A(H1N1)pdm09 viruses on a countrywide scale are unclear in many tropical/subtropical regions mainly because spatiotemporally representative sequence data are lacking. We isolated, sequenced, and analyzed 383 A(H1N1)pdm09 viral genomes from hospitalized patients between 2009 and 2018 from seven locations across Kenya. Using these genomes and contemporaneously sampled global sequences, we characterized the spread of the virus in Kenya over several seasons using phylodynamic methods. The transmission dynamics of A(H1N1)pdm09 virus in Kenya were characterized by (i) multiple virus introductions into Kenya over the study period, although only a few of those introductions instigated local seasonal epidemics that then established local transmission clusters, (ii) persistence of transmission clusters over several epidemic seasons across the country, (iii) seasonal fluctuations in effective reproduction number (Re) associated with lower number of infections and seasonal fluctuations in relative genetic diversity after an initial rapid increase during the early pandemic phase, which broadly corresponded to epidemic peaks in the northern and southern hemispheres, (iv) high virus genetic diversity with greater frequency of seasonal fluctuations in 2009-2011 and 2018 and low virus genetic diversity with relatively weaker seasonal fluctuations in 2012-2017, and (v) virus spread across Kenya. Considerable influenza virus diversity circulated within Kenya, including persistent viral lineages that were unique to the country, which may have been capable of dissemination to other continents through a globally migrating virus population. Further knowledge of the viral lineages that circulate within understudied low-to-middle-income tropical and subtropical regions is required to understand the full diversity and global ecology of influenza viruses in humans and to inform vaccination strategies within these regions.
Affiliation(s)
- D. Collins Owuor
- Wellcome Trust Research Programme, Epidemiology and Demography Department, Kenya Medical Research Institute (KEMRI), Kilifi 230-80108, Kenya
| | - Zaydah R. de Laurent
- Wellcome Trust Research Programme, Epidemiology and Demography Department, Kenya Medical Research Institute (KEMRI), Kilifi 230-80108, Kenya
| | - Gilbert K. Kikwai
- Kenya Medical Research Institute (KEMRI), Nairobi 54840-00200, Kenya
| | - Lillian M. Mayieka
- Kenya Medical Research Institute (KEMRI), Nairobi 54840-00200, Kenya
| | - Melvin Ochieng
- Kenya Medical Research Institute (KEMRI), Nairobi 54840-00200, Kenya
| | - Nicola F. Müller
- Fred Hutchinson Cancer Research Center, Vaccine and Infectious Disease Division, Seattle, WA 98109, USA
| | - Nancy A. Otieno
- Kenya Medical Research Institute (KEMRI), Nairobi 54840-00200, Kenya
| | - Gideon O. Emukule
- Centers for Disease Control and Prevention (CDC), Influenza Division, Nairobi 606-00621, Kenya
| | - Elizabeth A. Hunsperger
- Centers for Disease Control and Prevention, Division of Global Health Protection, Nairobi 606-00621, Kenya
- Centers for Disease Control and Prevention, Division of Global Health Protection, Atlanta, GA 30333, USA
| | - Rebecca Garten
- Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - John R. Barnes
- Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - Sandra S. Chaves
- Centers for Disease Control and Prevention (CDC), Influenza Division, Nairobi 606-00621, Kenya
- Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - D. James Nokes
- Wellcome Trust Research Programme, Epidemiology and Demography Department, Kenya Medical Research Institute (KEMRI), Kilifi 230-80108, Kenya
- School of Life Sciences and Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), Coventry CV4 7AL, UK
| | - Charles N. Agoti
- Wellcome Trust Research Programme, Epidemiology and Demography Department, Kenya Medical Research Institute (KEMRI), Kilifi 230-80108, Kenya
- School of Public Health and Human Sciences, Pwani University, Kilifi 195-80108, Kenya
| |
|
26
|
Time-Series Trend of Pandemic SARS-CoV-2 Variants Visualized Using Batch-Learning Self-Organizing Map for Oligonucleotide Compositions. DATA SCIENCE JOURNAL 2021. [DOI: 10.5334/dsj-2021-029] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
|
27
|
Alai S, Gujar N, Joshi M, Gautam M, Gairola S. Pan-India novel coronavirus SARS-CoV-2 genomics and global diversity analysis in spike protein. Heliyon 2021; 7:e06564. [PMID: 33758785 PMCID: PMC7972664 DOI: 10.1016/j.heliyon.2021.e06564] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/24/2020] [Revised: 01/30/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023] Open
Abstract
Mortality rates due to COVID-19 have been found to be disproportionate globally and are currently being researched. India, with a population of 1.3 billion people, has a relatively low mortality rate compared with other countries with high infection rates. The genetic composition of circulating isolates continues to be a key determinant of virulence and pathogenesis. This study aimed to analyse the extent of divergence between genomes of Indian isolates (n = 2525) as compared to the reference Wuhan-1 strain and isolates from countries showing higher fatality rates, including France, Italy, Belgium, and the USA. The study also analyses the impact of key mutations on interactions with angiotensin converting enzyme 2 (ACE2) and a panel of neutralizing monoclonal antibodies. Using 1,44,605 spike protein sequences, the global prevalence of mutations in the spike protein was observed. The study suggests that SARS-CoV-2 genomes from India share consensus with global trends with respect to D614G as the most prevalent mutational event (81.66% among 2525 Indian isolates). Indian isolates did not show the N439K mutation in the receptor binding motif (RBM), in contrast to global isolates (0.54%). Computational docking and molecular dynamics simulation analysis of the N439K mutation with respect to ACE2 binding and reactivity with RBM-targeted antibodies, viz. B38, BD23, CB6, P2B-F26 and EY6A, suggests that the variant has relatively higher affinity for the ACE2 receptor, which may support higher infectivity. The study warrants large-scale monitoring of Indian isolates, as the SARS-CoV-2 virus is expected to evolve and mutations may appear in unpredictable ways.
Affiliation(s)
- Shweta Alai
- Department of Health and Biological Sciences, Symbiosis International University, Pune, Maharashtra, 412115, India
| | - Nidhi Gujar
- Bioinformatics Centre, Savitribai Phule Pune University, Pune, Maharashtra, 411007, India
| | - Manali Joshi
- Bioinformatics Centre, Savitribai Phule Pune University, Pune, Maharashtra, 411007, India
| | - Manish Gautam
- Serum Institute of India Pvt Ltd, Pune, Maharashtra, 411028, India
| | - Sunil Gairola
- Serum Institute of India Pvt Ltd, Pune, Maharashtra, 411028, India
| |
|
28
|
Teo AKJ, Choudhury Y, Tan IB, Cher CY, Chew SH, Wan ZY, Cheng LTE, Oon LLE, Tan MH, Chan KS, Hsu LY. Saliva is more sensitive than nasopharyngeal or nasal swabs for diagnosis of asymptomatic and mild COVID-19 infection. Sci Rep 2021; 11:3134. [PMID: 33542443 PMCID: PMC7862309 DOI: 10.1038/s41598-021-82787-z] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Received: 09/23/2020] [Accepted: 12/22/2020] [Indexed: 01/10/2023] Open
Abstract
We aimed to test the sensitivity of naso-oropharyngeal saliva and self-administered nasal (SN) swab compared to nasopharyngeal (NP) swab for COVID-19 testing in a large cohort of migrant workers in Singapore. We also tested the utility of next-generation sequencing (NGS) for diagnosis of COVID-19. Saliva, NP and SN swabs were collected from subjects who presented with acute respiratory infection, their asymptomatic roommates, and prior confirmed cases who were undergoing isolation at a community care facility in June 2020. All samples were tested using RT-PCR. SARS-CoV-2 amplicon-based NGS with phylogenetic analysis was done for 30 samples. We recruited 200 subjects, of which 91 and 46 were tested twice and thrice respectively. In total, 62.0%, 44.5%, and 37.7% of saliva, NP and SN samples were positive. Cycle threshold (Ct) values were lower during the earlier period of infection across all sample types. The percentage of test-positive saliva was higher than NP and SN swabs. We found a strong correlation between viral genome coverage by NGS and Ct values for SARS-CoV-2. Phylogenetic analyses revealed Clade O and lineage B.6 known to be circulating in Singapore. We found saliva to be a sensitive and viable sample for COVID-19 diagnosis.
Affiliation(s)
- Alvin Kuo Jing Teo
- Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, #10-01, 12 Science Drive 2, Singapore, 117549, Singapore
| | | | - Iain Beehuat Tan
- Department of Medical Oncology, National Cancer Centre, Singapore, Singapore; Duke-NUS Graduate Medical School, National University of Singapore, Singapore, Singapore; Genome Institute of Singapore, Singapore, Singapore
| | | | - Shi Hao Chew
- Headquarters Army Medical Services, Singapore Armed Forces, Singapore, Singapore
| | - Zi Yi Wan
- Lucence Diagnostics, Singapore, Singapore
| | - Lionel Tim Ee Cheng
- Department of Diagnostic Radiology, Singapore General Hospital, Singapore, Singapore
| | - Lynette Lin Ean Oon
- Department of Molecular Pathology, Singapore General Hospital, Singapore, Singapore
| | | | - Kian Sing Chan
- Department of Molecular Pathology, Singapore General Hospital, Singapore, Singapore
| | - Li Yang Hsu
- Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, #10-01, 12 Science Drive 2, Singapore, 117549, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, National University Health System, Singapore, Singapore
| |
|
29
|
Arena F, Pollini S, Rossolini GM, Margaglione M. Summary of the Available Molecular Methods for Detection of SARS-CoV-2 during the Ongoing Pandemic. Int J Mol Sci 2021; 22:ijms22031298. [PMID: 33525651 PMCID: PMC7865767 DOI: 10.3390/ijms22031298] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 01/12/2021] [Revised: 01/24/2021] [Accepted: 01/25/2021] [Indexed: 12/25/2022] Open
Abstract
Since early 2020, the COVID-19 pandemic has caused an excess in morbidity and mortality rates worldwide. Containment strategies rely firstly on rapid and sensitive laboratory diagnosis, with molecular detection of the viral genome in respiratory samples being the gold standard. The reliability of diagnostic protocols could be affected by SARS-CoV-2 genetic variability. In fact, mutations occurring during SARS-CoV-2 genomic evolution can involve the regions targeted by the diagnostic probes. Following a review of the literature and an in silico analysis of the most recently described virus variants (including the UK B 1.1.7 and the South Africa 501Y.V2 variants), we conclude that the described genetic variability should have minimal or no effect on the sensitivity of existing diagnostic protocols for SARS-CoV-2 genome detection. However, given the continuous emergence of new variants, the situation should be monitored in the future, and protocols including multiple targets should be preferred.
Affiliation(s)
- Fabio Arena
- Department of Clinical and Experimental Medicine, University of Foggia, 71122 Foggia, Italy
- IRCCS Don Carlo Gnocchi Foundation, 50143 Florence, Italy
- Correspondence: Tel.: +39-0881-588064
| | - Simona Pollini
- Department of Experimental and Clinical Medicine, University of Florence, 50134 Florence, Italy
- Clinical Microbiology and Virology Unit, Florence Careggi University Hospital, 50134 Florence, Italy
| | - Gian Maria Rossolini
- Department of Experimental and Clinical Medicine, University of Florence, 50134 Florence, Italy
- Clinical Microbiology and Virology Unit, Florence Careggi University Hospital, 50134 Florence, Italy
| | - Maurizio Margaglione
- Department of Clinical and Experimental Medicine, University of Foggia, 71122 Foggia, Italy
| |
|
30
|
Chen J, Hilt EE, Li F, Wu H, Jiang Z, Zhang Q, Wang J, Wang Y, Li Z, Tang J, Yang S. Epidemiological and Genomic Analysis of SARS-CoV-2 in 10 Patients From a Mid-Sized City Outside of Hubei, China in the Early Phase of the COVID-19 Outbreak. Front Public Health 2020; 8:567621. [PMID: 33072702 PMCID: PMC7531217 DOI: 10.3389/fpubh.2020.567621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2020] [Accepted: 08/14/2020] [Indexed: 12/28/2022] Open
Abstract
A novel coronavirus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the ongoing Coronavirus Disease 2019 (COVID-19) pandemic. In this study, we performed a comprehensive epidemiological and genomic analysis of SARS-CoV-2 genomes from 10 patients in Shaoxing (Zhejiang Province), a mid-sized city outside of the epicenter Hubei province, China, during the early stage of the outbreak (late January to early February, 2020). We obtained viral genomes with >99% coverage and a mean depth of 296X, demonstrating that viral genomic analysis is feasible via metagenomics sequencing directly on nasopharyngeal samples with SARS-CoV-2 real-time PCR Ct values <28. We found that a cluster of four patients with travel history to Hubei shared the exact same virus with patients from Wuhan, Taiwan, Belgium, and Australia, highlighting how quickly this virus spread across the globe. The virus from another cluster of two family members living together without travel history but with a sick contact of a confirmed case from another city outside of Hubei accumulated significantly more mutations (9 SNPs vs. average 4 SNPs), suggesting a complex and dynamic nature of this outbreak. Our findings add to the growing knowledge of the epidemiological and genomic characteristics of SARS-CoV-2 and offer a glimpse into the early phase of this viral infection outside of Hubei, China.
Affiliation(s)
- Jinkun Chen
- Shaoxing Center for Disease Control and Prevention, Shaoxing, China
| | - Evann E Hilt
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, United States
| | - Fan Li
- Three Coin Analytics, Inc., Pleasanton, CA, United States
| | - Huan Wu
- IngeniGen XunMinKang Biotechnology Inc., Shaoxing, China
| | - Zhuojing Jiang
- Shaoxing Center for Disease Control and Prevention, Shaoxing, China
| | - Qinchao Zhang
- Shaoxing Center for Disease Control and Prevention, Shaoxing, China
| | - Jiling Wang
- Shaoxing Center for Disease Control and Prevention, Shaoxing, China
| | - Yifang Wang
- IngeniGen XunMinKang Biotechnology Inc., Shaoxing, China
| | - Ziqin Li
- Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou, China
| | - Jialiang Tang
- Shaoxing Center for Disease Control and Prevention, Shaoxing, China
| | - Shangxin Yang
- Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, CA, United States; Zhejiang-California International Nanosystems Institute, Zhejiang University, Hangzhou, China
| |
|
31
|
Pawestri HA, Nugraha AA, Han AX, Pratiwi E, Parker E, Richard M, van der Vliet S, Fouchier RAM, Muljono DH, de Jong MD, Setiawaty V, Eggink D. Genetic and antigenic characterization of influenza A/H5N1 viruses isolated from patients in Indonesia, 2008-2015. Virus Genes 2020; 56:417-429. [PMID: 32483655 PMCID: PMC7262163 DOI: 10.1007/s11262-020-01765-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/09/2020] [Accepted: 05/07/2020] [Indexed: 01/07/2023]
Abstract
Since its initial detection in 2003, Indonesia has reported 200 human cases of highly pathogenic avian influenza H5N1 (HPAI H5N1), associated with an exceptionally high case fatality rate (84%) compared to other geographical regions affected by other genetic clades of the virus. However, there is limited information on the genetic diversity of HPAI H5N1 viruses, especially those isolated from humans in Indonesia. In this study, the genetic and antigenic characteristics of 35 HPAI H5N1 viruses isolated from humans were analyzed. Full genome sequences were analyzed for the presence of substitutions in the receptor binding site and polymerase complex as markers for virulence or human adaptation, as well as for antiviral drug resistance substitutions. Only a few substitutions associated with human adaptation were observed, with a remarkably low prevalence of the human adaptive substitution PB2-E627K, which is common during human infection with other H5N1 clades and is a known virulence marker for avian influenza viruses during human infections. In addition, the antigenic profile of these Indonesian HPAI H5N1 viruses was determined using serological analysis and antigenic cartography. Antigenic characterization showed two distinct antigenic clusters, as observed previously for avian isolates. These two antigenic clusters were not clearly associated with time of virus isolation. This study provides better insight into the genetic diversity of H5N1 viruses during human infection and the presence of human adaptive markers. These findings highlight the importance of evaluating virus genetics for HPAI H5N1 viruses to estimate the risk to human health and the need for increased efforts to monitor the evolution of H5N1 viruses across Indonesia.
Affiliation(s)
- Hana A Pawestri: National Institute of Health Research and Development, Ministry of Health, Jakarta, Indonesia
- Arie A Nugraha: National Institute of Health Research and Development, Ministry of Health, Jakarta, Indonesia
- Alvin X Han: Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Eka Pratiwi: National Institute of Health Research and Development, Ministry of Health, Jakarta, Indonesia
- Edyth Parker: Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands; Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Mathilde Richard: Department of Viroscience, Erasmus MC, Rotterdam, The Netherlands
- Ron A M Fouchier: Department of Viroscience, Erasmus MC, Rotterdam, The Netherlands
- Menno D de Jong: Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Vivi Setiawaty: National Institute of Health Research and Development, Ministry of Health, Jakarta, Indonesia
- Dirk Eggink: Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
|
32
|
Han AX, Parker E, Maurer-Stroh S, Russell CA. Inferring putative transmission clusters with Phydelity. Virus Evol 2019; 5:vez039. [PMID: 31616568 PMCID: PMC6785678 DOI: 10.1093/ve/vez039] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/08/2023] Open
Abstract
Current phylogenetic clustering approaches for identifying pathogen transmission clusters are limited by their dependency on arbitrarily defined genetic distance thresholds for within-cluster divergence. Incomplete knowledge of a pathogen’s underlying dynamics often reduces the choice of distance threshold to an exploratory, ad hoc exercise that is difficult to standardise across studies. Phydelity is a new tool for the identification of transmission clusters in pathogen phylogenies. It identifies groups of sequences that are more closely related than the ensemble distribution of the phylogeny under a statistically principled and phylogeny-informed framework, without the introduction of arbitrary distance thresholds. Relative to other distance threshold- and model-based methods, Phydelity outputs clusters with higher purity and lower probability of misclassification in simulated phylogenies. Applying Phydelity to empirical datasets of hepatitis B and C virus infections showed that Phydelity identified clusters with better correspondence to individuals that are more likely to be linked by transmission events relative to other widely used non-parametric phylogenetic clustering methods without the need for parameter calibration. Phydelity is generalisable to any pathogen and can be used to identify putative direct transmission events. Phydelity is freely available at https://github.com/alvinxhan/Phydelity.
Affiliation(s)
- Alvin X Han: Protein Sequence Analysis Group, Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, 138671 Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore (NUS), 21 Lower Kent Ridge, 119077 Singapore; Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, Meibergdreef 9, 1105 AZ Amsterdam-Zuidoost, The Netherlands
- Edyth Parker: Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, Meibergdreef 9, 1105 AZ Amsterdam-Zuidoost, The Netherlands; Department of Veterinary Medicine, University of Cambridge, Madingley Rd, Cambridge CB3 0ES, UK
- Sebastian Maurer-Stroh: Protein Sequence Analysis Group, Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, 138671 Singapore; Department of Biological Sciences, National University of Singapore, 16 Science Drive 4, 117558 Singapore
- Colin A Russell: Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, Meibergdreef 9, 1105 AZ Amsterdam-Zuidoost, The Netherlands
|