1
Jaya FR, Brito BP, Darling AE. Evaluation of recombination detection methods for viral sequencing. Virus Evol 2023; 9:vead066. [PMID: 38131005 PMCID: PMC10734630 DOI: 10.1093/ve/vead066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/03/2023] [Accepted: 11/15/2023] [Indexed: 12/23/2023]
Abstract
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can bias evolutionary estimates or complicate their interpretation. Therefore, identifying signals of recombination in sequencing data is a key prerequisite for further analyses. A repertoire of recombination detection methods (RDMs) has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed eight RDMs to determine whether any are suitable for the analysis of bulk sequencing data: PhiPack (Profile), 3SEQ, GENECONV, recombination detection program (RDP) (OpenRDP), MaxChi (OpenRDP), Chimaera (OpenRDP), UCHIME (VSEARCH), and gmos. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Furthermore, we provide a practical example of the analysis and validation of empirical data. We find that RDMs need to be scalable, use an analytical approach and resolution suitable for the intended research application, and be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, describe the benefits and shortcomings of each assessed method, and outline future considerations for recombination detection methods for the assessment of large-scale viral sequencing data.
Affiliation(s)
- Frederick R Jaya
- Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
- Ecology and Evolution, Research School of Biology, Australian National University, 134 Linnaeus Way, Acton, Australian Capital Territory 2600, Australia
- Barbara P Brito
- Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
- New South Wales Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, Menangle, New South Wales 2568, Australia
- Aaron E Darling
- Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
- Illumina Australia Pty Ltd, Ultimo, New South Wales 2007, Australia
2
Jones BR, Joy JB. Inferring Human Immunodeficiency Virus 1 Proviral Integration Dates With Bayesian Inference. Mol Biol Evol 2023; 40:msad156. [PMID: 37421655 PMCID: PMC10411489 DOI: 10.1093/molbev/msad156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 06/23/2023] [Accepted: 07/06/2023] [Indexed: 07/10/2023]
Abstract
Human immunodeficiency virus 1 (HIV) proviruses archived in the persistent reservoir currently pose the greatest obstacle to an HIV cure because they evade combined antiretroviral therapy and can reseed HIV infection. Understanding the dynamics of the HIV persistent reservoir is imperative for discovering a durable HIV cure. Here, we explore Bayesian methods using the software BEAST2 to estimate HIV proviral integration dates. We started with within-host longitudinal HIV sequences collected prior to therapy, along with sequences collected from the persistent reservoir during suppressive therapy. We built a BEAST2 model to estimate integration dates of proviral sequences collected during suppressive therapy, implementing a tip date random walker to adjust the sequence tip dates and a latency-specific prior to inform the dates. To validate our method, we applied it to both simulated and empirical data sets. Consistent with previous studies, we found that proviral integration dates were spread throughout active infection. Using path sampling to select an alternative prior for date estimation in place of the latency-specific prior produced unrealistic results in one empirical data set, whereas in another data set the latency-specific prior was selected as the best fit. Our Bayesian method outperforms current date estimation techniques, with a root mean squared error of 0.89 years on simulated data relative to 1.23-1.89 years with previously developed methods. Bayesian methods offer an adaptable framework for inferring proviral integration dates.
Affiliation(s)
- Bradley R Jones
- Molecular Epidemiology and Evolutionary Genetics, B.C. Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Bioinformatics Program, University of British Columbia, Vancouver, Canada
- Jeffrey B Joy
- Molecular Epidemiology and Evolutionary Genetics, B.C. Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Bioinformatics Program, University of British Columbia, Vancouver, Canada
- Department of Medicine, University of British Columbia, Vancouver, Canada
3
Buckley PR, Lee CH, Antanaviciute A, Simmons A, Koohy H. A systems approach evaluating the impact of SARS-CoV-2 variant of concern mutations on CD8+ T cell responses. IMMUNOTHERAPY ADVANCES 2023; 3:ltad005. [PMID: 37082106 PMCID: PMC10112682 DOI: 10.1093/immadv/ltad005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/24/2022] [Accepted: 03/02/2023] [Indexed: 03/17/2023]
Abstract
T cell recognition of SARS-CoV-2 antigens after vaccination and/or natural infection has played a central role in resolving SARS-CoV-2 infections and generating adaptive immune memory. However, the clinical impact of SARS-CoV-2-specific T cell responses is variable and the mechanisms underlying T cell interaction with target antigens are not fully understood. This is especially true given the virus' rapid evolution, which leads to new variants with immune escape capacity. In this study, we used the Omicron variant as a model and took a systems approach to evaluate the impact of mutations on CD8+ T cell immunogenicity. We computed an immunogenicity potential score for each SARS-CoV-2 peptide antigen from the ancestral strain and Omicron, capturing both antigen presentation and T cell recognition probabilities. By comparing ancestral vs. Omicron immunogenicity scores, we reveal a divergent and heterogeneous landscape of impact for CD8+ T cell recognition of mutated targets in Omicron variants. While T cell recognition of Omicron peptides is broadly preserved, we observed mutated peptides with deteriorated immunogenicity that may assist breakthrough infection in some individuals. We then combined our scoring scheme with in silico mutagenesis to characterise the position- and residue-specific theoretical mutational impact on immunogenicity. While we predict many escape trajectories from the theoretical landscape of substitutions, our study suggests that Omicron mutations in T cell epitopes did not develop under cell-mediated pressure. Our study provides a generalisable platform for fostering a deeper understanding of existing and novel variant impact on antigen-specific vaccine- and/or infection-induced T cell immunity.
Affiliation(s)
- Paul R Buckley
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Chloe H Lee
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Agne Antanaviciute
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Alison Simmons
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- Hashem Koohy
- Medical Research Council (MRC) Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, UK
- MRC WIMM Centre for Computational Biology, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, UK
- Alan Turing Fellow in Health and Medicine
4
Barzilai LP, Schrago CG. Signatures of natural selection in tree topology shape of serially sampled viral phylogenies. Mol Phylogenet Evol 2023; 183:107776. [PMID: 36990305 DOI: 10.1016/j.ympev.2023.107776] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 02/24/2023] [Accepted: 03/24/2023] [Indexed: 03/29/2023]
Abstract
Tree shape metrics can be computed quickly for trees of any size, which makes them promising alternatives to intensive statistical methods and parameter-rich evolutionary models in the era of massive data availability. Previous studies have demonstrated their effectiveness in revealing important parameters of viral evolutionary dynamics, although the impact of natural selection on the shape of tree topologies has not been thoroughly investigated. We carried out a forward-time, individual-based simulation to investigate whether tree shape metrics of several kinds could predict the selection regime used to generate the data. To examine the impact of the genetic diversity of the founder viral population, simulations were run under two opposing starting configurations of the genetic diversity of the infecting viral population. We found that four evolutionary regimes, namely negative, positive, and frequency-dependent selection, as well as neutral evolution, were successfully distinguished by tree topology shape metrics. Two metrics from the Laplacian spectral density profile (principal eigenvalue and peakedness) and the number of cherries were the most informative for indicating selection type. The genetic diversity of the founder population affected the differentiation of evolutionary scenarios. Tree imbalance, which has frequently been associated with the action of natural selection on intrahost viral diversity, was also characteristic of neutrally evolving serially sampled data. Metrics calculated from empirical analysis of HIV datasets indicated that most tree topologies exhibited shapes closer to the frequency-dependent selection or neutral evolution regimes.
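The number of cherries, one of the metrics named above, is cheap to compute. As a minimal sketch, assuming a rooted binary tree written as nested Python tuples with string leaf labels (a representation chosen here for illustration, not one used by the authors; `count_cherries` is a hypothetical name), a cherry is counted at every internal node whose two children are both leaves:

```python
from typing import Union

# Illustrative representation (an assumption): a leaf is a string label,
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def count_cherries(tree: Tree) -> int:
    """Count internal nodes whose two children are both leaves."""
    if isinstance(tree, str):
        return 0  # a single leaf contains no cherry
    is_cherry = len(tree) == 2 and all(isinstance(child, str) for child in tree)
    # Add this node (if it is a cherry) to the cherries found in its subtrees.
    return int(is_cherry) + sum(count_cherries(child) for child in tree)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(count_cherries(balanced))     # 2
print(count_cherries(caterpillar))  # 1
```

A fully balanced tree on four tips has two cherries, whereas a maximally imbalanced (caterpillar) tree has only one, which is why the count carries information about tree shape.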
5
Cárdenas P, Corredor V, Santos-Vega M. Genomic epidemiological models describe pathogen evolution across fitness valleys. SCIENCE ADVANCES 2022; 8:eabo0173. [PMID: 35857510 PMCID: PMC9278859 DOI: 10.1126/sciadv.abo0173] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2022] [Accepted: 05/26/2022] [Indexed: 06/15/2023]
Abstract
Genomics is fundamentally changing epidemiological research. However, systematically exploring hypotheses in pathogen evolution requires new modeling tools. Models intertwining pathogen epidemiology and genomic evolution can help understand processes such as the emergence of novel pathogen genotypes with higher transmissibility or resistance to treatment. In this work, we present Opqua, a flexible simulation framework that explicitly links epidemiology to sequence evolution and selection. We use Opqua to study determinants of evolution across fitness valleys. We confirm that competition can limit evolution in high-transmission environments and find that low transmission, host mobility, and complex pathogen life cycles facilitate reaching new adaptive peaks through population bottlenecks and decoupling of selective pressures. The results show the potential of genomic epidemiological modeling as a tool in infectious disease research.
Affiliation(s)
- Pablo Cárdenas
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Vladimir Corredor
- Departamento de Salud Pública, Facultad de Medicina, Universidad Nacional de Colombia, Bogotá, D.C., Colombia
- Mauricio Santos-Vega
- Grupo Biología Matemática y Computacional, Departamento Ingeniería Biomédica, Universidad de los Andes, Bogotá, D.C., Colombia
6
Gaunt MW, Pettersson JHO, Kuno G, Gaunt B, de Lamballerie X, Gould EA. Widespread Interspecific Phylogenetic Tree Incongruence Between Mosquito-Borne and Insect-Specific Flaviviruses at Hotspots Originally Identified in Zika Virus. Virus Evol 2022; 8:veac027. [PMID: 35591877 PMCID: PMC9113262 DOI: 10.1093/ve/veac027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/24/2020] [Revised: 10/22/2021] [Accepted: 04/17/2022] [Indexed: 11/16/2022]
Abstract
Intraspecies (homologous) phylogenetic incongruence, or ‘tree conflict’, between different loci within the same genome of mosquito-borne flaviviruses (MBFV) was first identified in dengue virus (DENV) and subsequently in Japanese encephalitis virus (JEV), St Louis encephalitis virus, and Zika virus (ZIKV). Recently, the first evidence of phylogenetic incongruence between interspecific members of the MBFV was reported in ZIKV and its close relative, Spondweni virus. Uniquely, these hybrid proteomes were derived from four incongruent trees involving an Aedes-associated DENV node (1 tree) and three different Culex-associated flavivirus nodes (3 trees). This analysis has now been extended to a wider spectrum of viruses within the MBFV lineage, targeting the breakpoints between phylogenetically incongruent loci originally identified in ZIKV. Interspecies phylogenetic incongruence at these breakpoints was identified in 10 of 50 viruses within the MBFV lineage, representing emergent Aedes- and Culex-associated viruses including JEV, West Nile virus, yellow fever virus, and insect-specific viruses. Thus, interspecies phylogenetic incongruence is widespread amongst the flaviviruses and is robustly associated with specific breakpoints that coincide with the previously identified interspecific phylogenetic incongruence, implying that these breakpoints are ‘hotspots’. The incongruence amongst the emergent MBFV group was restricted to viruses within their respective associated epidemiological boundaries. This MBFV group was RY-coded at the third codon position (‘wobble codon’) to remove transition saturation. The resulting ‘wobble codon’ trees presented a single topology for the entire genome that lacked any robust evidence of phylogenetic incongruence between loci. Phylogenetic interspecific incongruence was therefore observed at exactly the same loci when the amino acid alignments were compared with the RY-coded ‘wobble codon’ alignments, and this incongruence spanned either a major part of the genome or the entire genome.
Maximum likelihood codon analysis revealed positive selection in the incongruent lineages. Positive selection could result in the same locus producing two opposing trees. These analyses of the clinically important MBFV suggest that robust interspecific phylogenetic incongruence resulted from amino acid selection. Convergent or parallel evolution would explain this observation, whilst interspecific recombination is unlikely.
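RY-coding replaces purines (A, G) with R and pyrimidines (C, T/U) with Y, so that transitions (A↔G, C↔T) no longer register as changes while transversions still do. A minimal sketch of recoding only the third codon positions, assuming an in-frame nucleotide sequence that starts at codon position 1 (the function name `ry_code_third_positions` is hypothetical, not taken from the study):

```python
# Purines -> R, pyrimidines -> Y (standard IUPAC ambiguity letters).
RY = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code_third_positions(seq: str) -> str:
    """RY-code every third codon position of an in-frame sequence.

    Assumes the sequence starts at codon position 1; first and second
    positions are returned unchanged (upper-cased). Gaps and ambiguity
    codes at third positions are left as they are.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if i % 3 == 2:  # 0-based index 2, 5, 8, ... is a third codon position
            out.append(RY.get(base, base))
        else:
            out.append(base)
    return "".join(out)


print(ry_code_third_positions("ATGGCAGTT"))  # ATRGCRGTY
```

Because a transition at a wobble position maps to the same letter (for example G and A both become R), saturated transition signal at those sites is removed before tree building.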
Affiliation(s)
- Michael W Gaunt
- London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, United Kingdom
- John H-O Pettersson
- Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University, Uppsala, Sweden
- Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and School of Medical Sciences, the University of Sydney, Sydney, New South Wales 2006, Australia
- Goro Kuno
- Formerly, Centers for Disease Control, Fort Collins, CO 80521, USA
- Bill Gaunt
- Aeon-sys, MBCS Kensington Road, Barnsley S75 2TU, UK
- Xavier de Lamballerie
- UMR “Unité des Virus Emergents”, Aix-Marseille Université-IRD 190-Inserm 1207-IHU Méditerranée Infection, Marseille, France
- APHM Public Hospitals of Marseille, Institut Hospitalo-Universitaire Méditerranée Infection, Marseille, France
- Ernest A Gould
- UMR “Unité des Virus Emergents”, Aix-Marseille Université-IRD 190-Inserm 1207-IHU Méditerranée Infection, Marseille, France
7
Evolution during primary HIV infection does not require adaptive immune selection. Proc Natl Acad Sci U S A 2022; 119:2109172119. [PMID: 35145025 PMCID: PMC8851487 DOI: 10.1073/pnas.2109172119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 12/16/2021] [Indexed: 01/20/2023]
Abstract
Modern HIV research depends crucially on both viral sequencing and population measurements. To directly link mechanistic biological processes and evolutionary dynamics during HIV infection, we developed multiple within-host phylodynamic models of HIV primary infection for comparative validation against viral load and evolutionary dynamics data. The optimal model of primary infection required no positive selection, suggesting that the host adaptive immune system reduces viral load but, surprisingly, does not drive observed viral evolution. Rather, the fitness (infectivity) of mutant variants is drawn from an exponential distribution in which most variants are slightly less infectious than their parents (nearly neutral evolution). This distribution did not differ substantially from either in vivo fitness distributions recorded beyond primary infection or in vitro distributions observed without adaptive immunity, suggesting that the intrinsic viral fitness distribution may drive evolution. Simulated phylogenetic trees also agree with independent data and illuminate how phylogenetic inference must consider viral and immune-cell population dynamics to gain accurate mechanistic insights.
8
Lytras S, Hughes J, Martin D, Swanepoel P, de Klerk A, Lourens R, Kosakovsky Pond SL, Xia W, Jiang X, Robertson DL. Exploring the Natural Origins of SARS-CoV-2 in the Light of Recombination. Genome Biol Evol 2022; 14:evac018. [PMID: 35137080 PMCID: PMC8882382 DOI: 10.1093/gbe/evac018] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Accepted: 01/26/2022] [Indexed: 11/19/2022]
Abstract
The lack of an identifiable intermediate host species for the proximal animal ancestor of SARS-CoV-2, and the large geographical distance between Wuhan and the places where the most closely related coronaviruses circulating in horseshoe bats (members of the Sarbecovirus subgenus) have been identified, are fueling speculation on the natural origins of SARS-CoV-2. We performed a comprehensive phylogenetic study of SARS-CoV-2 and all the related bat and pangolin sarbecoviruses sampled so far. Determining the likely recombination events reveals a highly reticulate evolutionary history within this group of coronaviruses. The distribution of the inferred recombination events is nonrandom, with evidence that Spike, the main target of humoral immunity, lies beside a recombination hotspot likely driving antigenic shift events in the ancestry of bat sarbecoviruses. Taking into account the geographic ranges of their hosts and the sampling locations across southern China and into Southeast Asia, we confirm that horseshoe bats (Rhinolophus) are the likely reservoir species for the SARS-CoV-2 progenitor. By tracing the recombinant sequence patterns, we conclude that there has been relatively recent geographic movement and cocirculation of these viruses' ancestors, extending across their bat host ranges in China and Southeast Asia over the last 100 years. We confirm that a direct proximal ancestor of SARS-CoV-2 has not yet been sampled, since the closest known relatives collected in Yunnan shared a common ancestor with SARS-CoV-2 approximately 40 years ago. Our analysis highlights the need for dramatically more wildlife sampling to 1) pinpoint the exact origins of SARS-CoV-2's animal progenitor, 2) identify the intermediate species that facilitated transmission from bats to humans (if there is one), and 3) survey the extent of diversity within the phylogeny of related sarbecoviruses that present a high risk of future spillovers.
Affiliation(s)
- Spyros Lytras
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Joseph Hughes
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
- Darren Martin
- Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, South Africa
- Phillip Swanepoel
- Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, South Africa
- Arné de Klerk
- Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, South Africa
- Rentia Lourens
- Division of Neurosurgery, Department of Surgery, Neuroscience Institute, University of Cape Town, South Africa
- Wei Xia
- National School of Agricultural Institution and Development, South China Agricultural University, Guangzhou, China
- Xiaowei Jiang
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University (XJTLU), Suzhou, China
- David L Robertson
- MRC-University of Glasgow Centre for Virus Research, Glasgow, United Kingdom
9
Hamelin DJ, Fournelle D, Grenier JC, Schockaert J, Kovalchik KA, Kubiniok P, Mostefai F, Duquette JD, Saab F, Sirois I, Smith MA, Pattijn S, Soudeyns H, Decaluwe H, Hussin J, Caron E. The mutational landscape of SARS-CoV-2 variants diversifies T cell targets in an HLA-supertype-dependent manner. Cell Syst 2021; 13:143-157.e3. [PMID: 34637888 PMCID: PMC8492600 DOI: 10.1016/j.cels.2021.09.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/08/2021] [Revised: 06/03/2021] [Accepted: 09/23/2021] [Indexed: 02/09/2023]
Abstract
The rapid, global dispersion of SARS-CoV-2 has led to the emergence of a diverse range of variants. Here, we describe how the mutational landscape of SARS-CoV-2 has shaped HLA-restricted T cell immunity at the population level during the first year of the pandemic. We analyzed a total of 330,246 high-quality SARS-CoV-2 genome assemblies, sampled across 143 countries and all major continents from December 2019 to December 2020 before mass vaccination or the rise of the Delta variant. We observed that proline residues are preferentially removed from the proteome of prevalent mutants, leading to a predicted global loss of SARS-CoV-2 T cell epitopes in individuals expressing HLA-B alleles of the B7 supertype family; this is largely driven by a dominant C-to-U mutation type at the RNA level. These results indicate that B7-supertype-associated epitopes, including the most immunodominant ones, were more likely to escape CD8+ T cell immunosurveillance during the first year of the pandemic.
Affiliation(s)
- Dominique Fournelle
- Montreal Heart Institute, Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Jean-Christophe Grenier
- Montreal Heart Institute, Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Jana Schockaert
- ImmunXperts, a Nexelis Group Company, 6041 Gosselies, Belgium
- Peter Kubiniok
- CHU Sainte-Justine Research Center, Montréal, QC, Canada
- Fatima Mostefai
- Montreal Heart Institute, Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Frederic Saab
- CHU Sainte-Justine Research Center, Montréal, QC, Canada
- Martin A Smith
- CHU Sainte-Justine Research Center, Montréal, QC, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Sofie Pattijn
- ImmunXperts, a Nexelis Group Company, 6041 Gosselies, Belgium
- Hugo Soudeyns
- CHU Sainte-Justine Research Center, Montréal, QC, Canada; Department of Microbiology, Infectiology and Immunology, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada; Department of Pediatrics, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Hélène Decaluwe
- CHU Sainte-Justine Research Center, Montréal, QC, Canada; Department of Pediatrics, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Julie Hussin
- Montreal Heart Institute, Department of Medicine, Université de Montréal, Montréal, QC, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Etienne Caron
- CHU Sainte-Justine Research Center, Montréal, QC, Canada; Department of Pathology and Cellular Biology, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
10
Ma H, Tan TW, Ban KHK. A multi-task CNN learning model for taxonomic assignment of human viruses. BMC Bioinformatics 2021; 22:194. [PMID: 34078269 PMCID: PMC8170063 DOI: 10.1186/s12859-021-04084-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 01/09/2023]
Abstract
BACKGROUND Taxonomic assignment is a key step in the identification of human viral pathogens. Current tools that assign taxonomy to sequencing reads using alignment or alignment-free k-mer approaches may not perform optimally when the sequences diverge significantly from the reference sequences. Furthermore, many tools may not incorporate the genomic coverage of assigned reads into the overall likelihood of a correct taxonomic assignment for a sample.
RESULTS In this paper, we describe the development of a pipeline that incorporates a multi-task learning model based on a convolutional neural network (MT-CNN) and a Bayesian ranking approach to identify and rank the most likely human virus from sequence reads. For taxonomic assignment of reads, the MT-CNN model outperformed Kraken 2, Centrifuge, and Bowtie 2 on reads generated from simulated divergent HIV-1 genomes and was more sensitive in identifying SARS as the closest relation in four RNA sequencing datasets for SARS-CoV-2 virus. For genomic region assignment of assigned reads, the MT-CNN model performed competitively with Bowtie 2, and the region assignments were used to estimate genomic coverage, which was incorporated into a naïve Bayesian network together with the proportion of taxonomic assignments to rank the likelihood of candidate human viruses from sequence data.
CONCLUSIONS We have developed a pipeline that combines a novel MT-CNN model, able to identify viruses with divergent sequences and to assign genomic regions, with a Bayesian approach that ranks taxonomic assignments by taking into account both the number of assigned reads and genomic coverage. The pipeline is available on GitHub at https://github.com/MaHaoran627/CNN_Virus.
Affiliation(s)
- Haoran Ma
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117592 Singapore, Singapore
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117592 Singapore, Singapore
- National Supercomputing Centre (NSCC), 138632 Singapore, Singapore
- Kenneth Hon Kim Ban
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117592 Singapore, Singapore
- National Supercomputing Centre (NSCC), 138632 Singapore, Singapore
11
Neverov AD, Popova AV, Fedonin GG, Cheremukhin EA, Klink GV, Bazykin GA. Episodic evolution of coadapted sets of amino acid sites in mitochondrial proteins. PLoS Genet 2021; 17:e1008711. [PMID: 33493156 PMCID: PMC7861529 DOI: 10.1371/journal.pgen.1008711] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/03/2020] [Revised: 02/04/2021] [Accepted: 12/07/2020] [Indexed: 11/19/2022]
Abstract
The rate of evolution differs between protein sites and changes with time. However, the link between these two phenomena remains poorly understood. Here, we design a phylogenetic approach for distinguishing pairs of amino acid sites that evolve concordantly, i.e., such that substitutions at one site trigger subsequent substitutions at the other, and pairs of sites that evolve discordantly, so that substitutions at one site impede subsequent substitutions at the other. We distinguish groups of amino acid sites that undergo coordinated evolution and evolve discordantly from other such groups. In mitochondrion-encoded proteins of metazoans and fungi, we show that concordantly evolving sites are clustered in protein structures. By analysing the phylogenetic patterns of substitutions at concordantly and discordantly evolving site pairs, we find that concordant evolution has two distinct causes: epistatic interactions between amino acid substitutions, and episodes of selection independently affecting substitutions at different sites. The rate of substitutions at concordantly evolving groups of protein sites changes in the course of evolution, indicating episodes of selection limited to some of the lineages. The phylogenetic positions of these changes are consistent between proteins, suggesting common selective forces underlying them.
The mode and rate of evolution of a protein site depend on the effect of its mutations on protein fitness. The fitness effect of a mutation can itself change in the course of evolution for at least two reasons. First, it can be modulated by substitutions occurring at other sites, a phenomenon called epistasis. Second, changes in selection can be non-epistatic, affecting sites independently of one another. Here, we analyse substitutions accumulated by the evolving lineages of the five proteins encoded by the mitochondrial genomes of thousands of species of metazoans and fungi.
We show that substitutions at different amino acid sites occur in a coordinated fashion, and this coordination is caused both by epistasis and by episodes of selection affecting groups of sites. We partition each protein into several groups of concordantly evolving sites such that evolution of sites from different groups is discordant, and show that the proteins encoded by the mitochondrial genome consist of coevolving structural blocks. Some of these blocks have a clear functional specialization, e.g. are associated with interfaces between proteins composing respiratory complexes. Together, our results reveal a previously unrecognized complexity in the causes of variation in evolutionary rates between protein sites.
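To make the idea of one substitution "following" another along a lineage concrete, here is a minimal Python sketch that tallies ordered site pairs on a phylogeny whose substitutions have already been mapped to branches. It is not the authors' statistic: the tree encoding, the function name `ancestor_descendant_pairs`, and the choice to ignore pairs falling on the same branch are illustrative assumptions. Telling concordant from discordant pairs would additionally require comparing such counts with a null expectation (for example, from randomly reassigned substitutions), which this sketch omits.

```python
from collections import Counter


def ancestor_descendant_pairs(children, subs, root):
    """Count ordered site pairs (i, j), i != j, such that a substitution at site j
    lies on a branch below a branch carrying a substitution at site i.

    children: node -> list of child nodes (hypothetical tree encoding)
    subs:     node -> list of sites substituted on the branch leading to that node
    Pairs on the same branch are ignored because their order is unknown.
    """
    counts = Counter()
    stack = [(root, frozenset())]  # (node, sites substituted on ancestral branches)
    while stack:
        node, ancestral_sites = stack.pop()
        here = subs.get(node, [])
        for j in here:
            for i in ancestral_sites:
                if i != j:
                    counts[(i, j)] += 1
        below = ancestral_sites | frozenset(here)
        for child in children.get(node, []):
            stack.append((child, below))
    return counts
```

For example, with site 1 substituted on branch A and site 2 on its child branch B, the sketch records one (1, 2) pair and no (2, 1) pair; a substitution at site 2 on an unrelated branch C contributes nothing.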
Affiliation(s)
- Alexey D. Neverov
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Anfisa V. Popova
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Gennady G. Fedonin
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia
- Galya V. Klink
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Georgii A. Bazykin
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo, Russia
12
Kistler KE, Bedford T. Evidence for adaptive evolution in the receptor-binding domain of seasonal coronaviruses OC43 and 229E. eLife 2021; 10:e64509. [PMID: 33463525 PMCID: PMC7861616 DOI: 10.7554/elife.64509] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 10/31/2020] [Accepted: 12/12/2020] [Indexed: 11/13/2022]
Abstract
Seasonal coronaviruses (OC43, 229E, NL63, and HKU1) are endemic to the human population, regularly infecting and reinfecting humans while typically causing asymptomatic to mild respiratory infections. It is not known to what extent reinfection by these viruses is due to waning immune memory or antigenic drift of the viruses. Here we address the influence of antigenic drift on immune evasion of seasonal coronaviruses. We provide evidence that at least two of these viruses, OC43 and 229E, are undergoing adaptive evolution in regions of the viral spike protein that are exposed to human humoral immunity. This suggests that reinfection may be due, in part, to positively selected genetic changes in these viruses that enable them to escape recognition by the immune system. It is possible that, as with seasonal influenza, these adaptive changes in antigenic regions of the virus would necessitate continual reformulation of a vaccine made against them.
Affiliation(s)
- Kathryn E Kistler
- Molecular and Cellular Biology Program, University of Washington, Seattle, United States
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
- Trevor Bedford
- Molecular and Cellular Biology Program, University of Washington, Seattle, United States
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
13
Computational Evolutionary Biology. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
14
Abstract
Organisms evolve to increase their fitness, a process that may be described as climbing the fitness landscape. However, the fitness landscape of an individual site, i.e., the vector of fitness values corresponding to different variants at this site, can itself change with time due to changes in the environment or substitutions at other epistatically interacting sites. While there exist a number of simulators for modeling different aspects of molecular evolution, very few can accommodate changing landscapes. We present SELVa, the Simulator of Evolution with Landscape Variation, aimed at modeling the substitution process under a changing single-position fitness landscape in a set of evolving lineages that form a phylogeny of arbitrary shape. Written in Java and distributed as an executable jar file, SELVa provides a flexible framework that allows the user to choose from a number of implemented rules governing landscape change.
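The core idea of a changing single-position landscape can be illustrated with a short Gillespie-style simulation of one site along one branch. This is a hypothetical Python sketch, not SELVa's Java implementation: the relative fixation rate S / (1 - exp(-S)) with S = 2·Ne·s, the rule that randomly reshuffles fitness values among states at a constant rate, and all function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def relative_fixation(s: float, n_e: int) -> float:
    """Substitution rate relative to a neutral mutation: S / (1 - exp(-S)), S = 2 * n_e * s."""
    S = 2.0 * n_e * s
    if abs(S) < 1e-9:
        return 1.0  # neutral limit
    return S / -math.expm1(-S)


def simulate_lineage(length, n_states=4, mu=1.0, n_e=100, shift_rate=0.5, seed=1):
    """Simulate one site along a single branch of the given length.

    Returns (time, state) pairs, starting with the state at time 0. The landscape is a
    vector with one fitness value per state; at rate `shift_rate` its values are randomly
    permuted among states (an assumed landscape-change rule).
    """
    rng = random.Random(seed)
    fitness = [rng.random() for _ in range(n_states)]
    state = rng.randrange(n_states)
    t = 0.0
    history = [(0.0, state)]
    while True:
        # Rate of replacing the current state by each other state under the current landscape.
        rates = [0.0 if j == state
                 else mu * relative_fixation(fitness[j] - fitness[state], n_e)
                 for j in range(n_states)]
        total = sum(rates) + shift_rate
        t += rng.expovariate(total)  # waiting time to the next event of either kind
        if t >= length:
            break
        if rng.random() * total < shift_rate:
            rng.shuffle(fitness)  # landscape change; the current state is kept
        else:
            state = rng.choices(range(n_states), weights=rates)[0]
            history.append((t, state))
    return history
```

Swapping in a different landscape-change rule only requires changing the line that updates `fitness`; extending the sketch to a phylogeny would mean running the same process along each branch, starting from the parent branch's final state and landscape.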
15
Jones BR, Joy JB. Simulating within host human immunodeficiency virus 1 genome evolution in the persistent reservoir. Virus Evol 2020; 6:veaa089. [PMID: 34040795 PMCID: PMC8132731 DOI: 10.1093/ve/veaa089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
The complexities of viral evolution can be difficult to elucidate. Software simulating viral evolution provides powerful tools for exploring hypotheses of viral systems, especially in situations where thorough empirical data are difficult to obtain or parameters of interest are difficult to measure. Human immunodeficiency virus 1 (HIV-1) infection has no durable cure; this is primarily due to the virus’ ability to integrate into the genome of host cells, where it can remain in a transcriptionally latent state. An effective cure strategy must eliminate every copy of HIV-1 in this ‘persistent reservoir’ because proviruses can reactivate, even decades later, to resume an active infection. However, many features of the persistent reservoir remain unclear, including the temporal dynamics of HIV-1 integration frequency and the longevity of the resulting reservoir. Thus, sophisticated analyses are required to measure these features and determine their temporal dynamics. Here, we present software that is an extension of SANTA-SIM to include multiple compartments of viral populations. We used the resulting software to create a model of HIV-1 within host evolution that incorporates the persistent HIV-1 reservoir. This model is composed of two compartments, an active compartment and a latent compartment. With this model, we compared five different date estimation methods (Closest Sequence, Clade, Linear Regression, Least Squares, and Maximum Likelihood) to recover the integration dates of genomes in our model’s HIV-1 reservoir. We found that the Least Squares method performed the best with the highest concordance (0.80) between real and estimated dates and the lowest absolute error (all pairwise t tests: P < 0.01). Our software is a useful tool for validating bioinformatics software and understanding the dynamics of the persistent HIV-1 reservoir.
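Of the date estimation methods listed above, linear regression is the simplest to illustrate: fit root-to-tip divergence against sampling time for dated sequences, then invert the fitted line to date a sequence from its divergence alone. The sketch below is a hypothetical Python illustration of that general root-to-tip idea, assuming divergences from a rooted tree have already been computed; it is not the authors' software, and the numbers in the usage note are toy values.

```python
def fit_root_to_tip(times, divergences):
    """Ordinary least-squares fit of divergence = slope * time + intercept."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two dated sequences")
    mean_t = sum(times) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, divergences))
    if sxx == 0:
        raise ValueError("sampling times must vary")
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("no positive clock signal in the dated sequences")
    return slope, mean_d - slope * mean_t


def estimate_date(divergence, slope, intercept):
    """Invert the fitted line: the time at which expected divergence equals `divergence`."""
    return (divergence - intercept) / slope
```

With toy sampling times of 0 to 3 years and divergences of 0.01 to 0.04 substitutions/site, `fit_root_to_tip` returns a slope and intercept of about 0.01 each, so an undated sequence with divergence 0.025 is placed at roughly year 1.5.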
Affiliation(s)
- Bradley R Jones
- BC Centre for Excellence in HIV/AIDS, 608-1081 Burrard Street, Vancouver, BC V6Z 1Y6, Canada
- Jeffrey B Joy
- BC Centre for Excellence in HIV/AIDS, 608-1081 Burrard Street, Vancouver, BC V6Z 1Y6, Canada
16
Huddleston J, Barnes JR, Rowe T, Xu X, Kondor R, Wentworth DE, Whittaker L, Ermetal B, Daniels RS, McCauley JW, Fujisaki S, Nakamura K, Kishida N, Watanabe S, Hasegawa H, Barr I, Subbarao K, Barrat-Charlaix P, Neher RA, Bedford T. Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. eLife 2020; 9:e60067. [PMID: 32876050 PMCID: PMC7553778 DOI: 10.7554/elife.60067] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/17/2020] [Accepted: 08/24/2020] [Indexed: 12/17/2022]
Abstract
Seasonal influenza virus A/H3N2 is a major cause of death globally. Vaccination remains the most effective preventative. Rapid mutation of hemagglutinin allows viruses to escape adaptive immunity. This antigenic drift necessitates regular vaccine updates. Effective vaccine strains need to represent H3N2 populations circulating one year after strain selection. Experts select strains based on experimental measurements of antigenic drift and predictions made by models from hemagglutinin sequences. We developed a novel influenza forecasting framework that integrates phenotypic measures of antigenic drift and functional constraint with previously published sequence-only fitness estimates. Forecasts informed by phenotypic measures of antigenic drift consistently outperformed previous sequence-only estimates, while sequence-only estimates of functional constraint surpassed more comprehensive experimentally-informed estimates. Importantly, the best models integrated estimates of both functional constraint and either antigenic drift phenotypes or recent population growth.
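Fitness-based forecasts of this kind are commonly written as an exponential projection of current clade frequencies, x_i(t + Δt) = x_i(t)·exp(f_i·Δt) / Σ_j x_j(t)·exp(f_j·Δt), with each clade's fitness f_i taken as a weighted sum of predictors (for example, an antigenic-drift score and a functional-constraint score). The Python sketch below assumes that form; the function name, the single-predictor example, and the coefficient value are illustrative, and the paper's fitted coefficients and model-selection procedure are not reproduced.

```python
import math


def project_frequencies(freqs, predictors, coefficients, delta_t=1.0):
    """Project clade frequencies forward by delta_t under fixed clade fitnesses.

    freqs:        current frequencies, one per clade (summing to 1)
    predictors:   one row of predictor values per clade (hypothetical inputs)
    coefficients: one weight per predictor column
    """
    fitness = [sum(b * p for b, p in zip(coefficients, row)) for row in predictors]
    f_max = max(fitness)  # subtract the maximum before exponentiating, for numerical stability
    weights = [x * math.exp((f - f_max) * delta_t) for x, f in zip(freqs, fitness)]
    total = sum(weights)
    return [w / total for w in weights]
```

For two clades at 50% each, where only the first has a predictor value of 1 and the coefficient is ln 3, one step projects frequencies of 0.75 and 0.25.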
Affiliation(s)
- John Huddleston
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
- Molecular and Cell Biology Program, University of Washington, Seattle, United States
- John R Barnes
- Virology Surveillance and Diagnosis Branch, Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC), Atlanta, United States
- Thomas Rowe
- Virology Surveillance and Diagnosis Branch, Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC), Atlanta, United States
- Xiyan Xu
- Virology Surveillance and Diagnosis Branch, Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC), Atlanta, United States
- Rebecca Kondor
- Virology Surveillance and Diagnosis Branch, Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC), Atlanta, United States
- David E Wentworth
- Virology Surveillance and Diagnosis Branch, Influenza Division, National Center for Immunization and Respiratory Diseases (NCIRD), Centers for Disease Control and Prevention (CDC), Atlanta, United States
- Lynne Whittaker
- WHO Collaborating Centre for Reference and Research on Influenza, Crick Worldwide Influenza Centre, The Francis Crick Institute, London, United Kingdom
- Burcu Ermetal
- WHO Collaborating Centre for Reference and Research on Influenza, Crick Worldwide Influenza Centre, The Francis Crick Institute, London, United Kingdom
- Rodney Stuart Daniels
- WHO Collaborating Centre for Reference and Research on Influenza, Crick Worldwide Influenza Centre, The Francis Crick Institute, London, United Kingdom
- John W McCauley
- WHO Collaborating Centre for Reference and Research on Influenza, Crick Worldwide Influenza Centre, The Francis Crick Institute, London, United Kingdom
- Seiichiro Fujisaki
- Influenza Virus Research Center, National Institute of Infectious Diseases, Tokyo, Japan
- Kazuya Nakamura
- Influenza Virus Research Center, National Institute of Infectious Diseases, Tokyo, Japan
- Noriko Kishida
- Influenza Virus Research Center, National Institute of Infectious Diseases, Tokyo, Japan
- Shinji Watanabe
- Influenza Virus Research Center, National Institute of Infectious Diseases, Tokyo, Japan
- Hideki Hasegawa
- Influenza Virus Research Center, National Institute of Infectious Diseases, Tokyo, Japan
- Ian Barr
- The WHO Collaborating Centre for Reference and Research on Influenza, The Peter Doherty Institute for Infection and Immunity, Department of Microbiology and Immunology, The University of Melbourne, The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Kanta Subbarao
- The WHO Collaborating Centre for Reference and Research on Influenza, The Peter Doherty Institute for Infection and Immunity, Department of Microbiology and Immunology, The University of Melbourne, The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Pierre Barrat-Charlaix
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Richard A Neher
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Basel, Switzerland
- Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, United States
17
Lequime S, Bastide P, Dellicour S, Lemey P, Baele G. nosoi: A stochastic agent-based transmission chain simulation framework in R. Methods Ecol Evol 2020; 11:1002-1007. [PMID: 32983401 PMCID: PMC7496779 DOI: 10.1111/2041-210x.13422] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/03/2020] [Accepted: 05/13/2020] [Indexed: 12/22/2022]
Abstract
The transmission process of an infectious agent creates a connected chain of hosts linked by transmission events, known as a transmission chain. Reconstructing transmission chains remains a challenging endeavour, except in rare cases characterized by intense surveillance and epidemiological inquiry. Inference frameworks attempt to estimate or approximate these transmission chains, but the accuracy and validity of such methods generally lack formal assessment on datasets for which the actual transmission chain was observed.

We here introduce nosoi, an open-source R package that offers a complete, tunable and expandable agent-based framework to simulate transmission chains under a wide range of epidemiological scenarios for single-host and dual-host epidemics. nosoi is accessible through GitHub and CRAN, and is accompanied by extensive documentation, providing help and practical examples to assist users in setting up their own simulations.

Once infected, each host or agent can undergo a series of events during each time step, such as moving (between locations) or transmitting the infection, all of these being driven by user-specified rules or data, such as travel patterns between locations. nosoi is able to generate a multitude of epidemic scenarios that can, for example, be used to validate a wide range of reconstruction methods, including epidemic modelling and phylodynamic analyses. nosoi also offers a comprehensive framework to leverage empirically acquired data, allowing the user to explore how variations in parameters can affect epidemic potential. Aside from research questions, nosoi can provide lecturers with a complete teaching tool to offer students a hands-on exploration of the dynamics of epidemiological processes and the factors that affect them. Because the package does not rely on mathematical formalism but uses a more intuitive algorithmic approach, even extensive changes of the entire model can be easily and quickly implemented.
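Because nosoi itself is an R package, the following is only a hypothetical Python sketch of the general pattern described above: a discrete-time, agent-based simulation in which each infected host may exit or transmit at every time step, with every transmission event recorded. The parameter names, the uniform number of contacts per step, and the fixed per-step probabilities are assumptions and do not mirror nosoi's API or defaults; movement between locations is left out.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    host_id: int
    infected_by: Optional[int]  # None for the index case
    t_infected: int
    active: bool = True


def simulate_chain(n_steps, p_exit=0.1, max_contacts=2, p_trans=0.3,
                   max_hosts=1000, seed=0) -> List[Host]:
    """Simulate a single-host transmission chain starting from one index case."""
    rng = random.Random(seed)
    hosts = [Host(host_id=0, infected_by=None, t_infected=0)]
    for t in range(1, n_steps + 1):
        # Only hosts that were infectious at the start of this step act during it.
        acting = [h for h in hosts if h.active]
        if not acting:
            break  # the epidemic has died out
        for host in acting:
            if rng.random() < p_exit:
                host.active = False  # recovery or death: the host leaves the chain
                continue
            for _ in range(rng.randint(0, max_contacts)):
                if len(hosts) < max_hosts and rng.random() < p_trans:
                    hosts.append(Host(len(hosts), host.host_id, t))
    return hosts
```

Since every new host records its infector and infection time, the full transmission chain is known exactly, which is what makes such simulations usable as ground truth when assessing reconstruction methods.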
Affiliation(s)
- Sebastian Lequime
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Cluster of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Paul Bastide
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- IMAG, CNRS, University of Montpellier, Montpellier, France
- Simon Dellicour
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Brussels, Belgium
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium