1
Xiang Z, Liu Z, Dinh KN. Inference of chromosome selection parameters and missegregation rate in cancer from DNA-sequencing data. Sci Rep 2024; 14:17699. [PMID: 39085295 PMCID: PMC11291923 DOI: 10.1038/s41598-024-67842-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 07/16/2024] [Indexed: 08/02/2024]
Abstract
Aneuploidy is frequently observed in cancers and has been linked to poor patient outcome. Analysis of aneuploidy in DNA-sequencing (DNA-seq) data necessitates untangling the effects of the Copy Number Aberration (CNA) occurrence rates and the selection coefficients that act upon the resulting karyotypes. We introduce a parameter inference algorithm that takes advantage of both bulk and single-cell DNA-seq cohorts. The method is based on Approximate Bayesian Computation (ABC) and utilizes CINner, our recently introduced simulation algorithm of chromosomal instability in cancer. We examine three groups of statistics to summarize the data in the ABC routine: (A) Copy Number-based measures, (B) phylogeny tip statistics, and (C) phylogeny balance indices. Using these statistics, our method can recover both the CNA probabilities and selection parameters from ground truth data, and performs well even for data cohorts of relatively small sizes. We find that only statistics in groups A and C are well-suited for identifying CNA probabilities, and only group A carries the signals for estimating selection parameters. Moreover, the low number of CNA events at large scale compared to cell counts in single-cell samples means that statistics in group B cannot be estimated accurately using phylogeny reconstruction algorithms at the chromosome level. As data from both bulk and single-cell DNA-sequencing techniques becomes increasingly available, our inference framework promises to facilitate the analysis of distinct cancer types, differentiation between selection and neutral drift, and prediction of cancer clonal dynamics.
Affiliation(s)
- Zijin Xiang
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Zhihan Liu
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Khanh N Dinh
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
2
Whitlock AOB, Bird BH, Ghersi B, Davison AJ, Hughes J, Nichols J, Vučak M, Amara E, Bangura J, Lavalie EG, Kanu MC, Kanu OT, Sjodin A, Remien CH, Nuismer SL. Identifying the genetic basis of viral spillover using Lassa virus as a test case. Royal Society Open Science 2023; 10:221503. [PMID: 36968239 PMCID: PMC10031424 DOI: 10.1098/rsos.221503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
The rate at which zoonotic viruses spill over into the human population varies significantly over space and time. Remarkably, we do not yet know how much of this variation is attributable to genetic variation within viral populations. This gap in understanding arises because we lack methods of genetic analysis that can be easily applied to zoonotic viruses, where the number of available viral sequences is often limited, and opportunistic sampling introduces significant population stratification. Here, we explore the feasibility of using patterns of shared ancestry to correct for population stratification, enabling genome-wide association methods to identify genetic substitutions associated with spillover into the human population. Using a combination of phylogenetically structured simulations and Lassa virus sequences collected from humans and rodents in Sierra Leone, we demonstrate that existing methods do not fully correct for stratification, leading to elevated error rates. We also demonstrate, however, that the Type I error rate can be substantially reduced by confining the analysis to a less-stratified region of the phylogeny, even in an already-small dataset. Using this method, we detect two candidate single-nucleotide polymorphisms associated with spillover in the Lassa virus polymerase gene and provide generalized recommendations for the collection and analysis of zoonotic viruses.
Affiliation(s)
- Brian H. Bird
- One Health Institute, School of Veterinary Medicine, University of California, Davis, Davis, CA, USA
- Bruno Ghersi
- One Health Institute, School of Veterinary Medicine, University of California, Davis, Davis, CA, USA
- Joseph Hughes
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- Jenna Nichols
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- Matej Vučak
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- Emmanuel Amara
- University of Makeni and University of California, Davis One Health Program, Makeni, Sierra Leone
- James Bangura
- University of Makeni and University of California, Davis One Health Program, Makeni, Sierra Leone
- Edwin G. Lavalie
- University of Makeni and University of California, Davis One Health Program, Makeni, Sierra Leone
- Marilyn C. Kanu
- University of Makeni and University of California, Davis One Health Program, Makeni, Sierra Leone
- Osman T. Kanu
- University of Makeni and University of California, Davis One Health Program, Makeni, Sierra Leone
- Anna Sjodin
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA
- Christopher H. Remien
- Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, USA
- Scott L. Nuismer
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA
3
Barzilai LP, Schrago CG. Signatures of natural selection in tree topology shape of serially sampled viral phylogenies. Mol Phylogenet Evol 2023; 183:107776. [PMID: 36990305 DOI: 10.1016/j.ympev.2023.107776] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/10/2022] [Revised: 02/24/2023] [Accepted: 03/24/2023] [Indexed: 03/29/2023]
Abstract
Tree shape metrics can be computed fast for trees of any size, which makes them promising alternatives to intensive statistical methods and parameter-rich evolutionary models in the era of massive data availability. Previous studies have demonstrated their effectiveness in unveiling important parameters in viral evolutionary dynamics, although the impact of natural selection on the shape of tree topologies has not been thoroughly investigated. We carried out a forward-time and individual-based simulation to investigate whether tree shape metrics of several kinds could predict the selection regime employed to generate the data. To examine the impact of the genetic diversity of the founder viral population, simulations were run under two opposing starting configurations of the genetic diversity of the infecting viral population. We found that four evolutionary regimes, namely, negative, positive, and frequency-dependent selection, as well as neutral evolution, were successfully distinguished by tree topology shape metrics. Two metrics from the Laplacian spectral density profile (principal eigenvalue and peakedness) and the number of cherries were the most informative for indicating selection type. The genetic diversity of the founder population had an impact on differentiating evolutionary scenarios. Tree imbalance, which has been frequently associated with the action of natural selection on intrahost viral diversity, was also characteristic of neutrally evolving serially sampled data. Metrics calculated from empirical analysis of HIV datasets indicated that most tree topologies exhibited shapes closer to the frequency-dependent selection or neutral evolution regimes.
4
Giovanni MY, Whalen C, Hurt DE, Ware-Allen L, Noble K, McCarthy M, Quinones M, Cruz P, Jjingo D, Wele M, Seydou D, Tartakovsky M. African Centers of Excellence in Bioinformatics and Data Intensive Science: Building Capacity for Enhancing Data Intensive Infectious Diseases Research in Africa. Journal of Infectious Diseases & Microbiology 2023; 1:006. [PMID: 37987019 PMCID: PMC10658664 DOI: 10.37191/mapsci-jidm-1(2)-006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Africa faces a disproportionate burden of infectious diseases coupled with unmet needs in bioinformatics and data science capabilities, which impacts the ability of African biomedical researchers to vigorously pursue research and partner with institutions in other countries. The African Centers of Excellence in Bioinformatics and Data Intensive Science are collaborating with African academic institutions, industry partners, the Foundation for the National Institutes of Health (FNIH) and the National Institute of Allergy and Infectious Diseases (NIAID) at the National Institutes of Health (NIH) in a public-private partnership to address these challenges by enhancing computational infrastructure, fostering the development of advanced bioinformatics and data science skills among local researchers and students, and providing innovative emerging technologies for infectious diseases research.
Affiliation(s)
- Maria Y Giovanni
- Office of Data Science and Emerging Technologies and Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Christopher Whalen
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Darrell E Hurt
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Latrice Ware-Allen
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Karlynn Noble
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Meghan McCarthy
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Mariam Quinones
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Phillip Cruz
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Daudi Jjingo
- Department of Computer Science, College of Computing and Information Sciences, and The African Center of Excellence in Bioinformatics and Data-Intensive Science, Infectious Disease Institute, Makerere University, Kampala, Uganda
- Mamadou Wele
- Institute of Applied Sciences, University of Sciences, Techniques and Technologies of Bamako, and The African Center of Excellence in Bioinformatics and Data-Intensive Science, Bamako
- Doumbia Seydou
- Department of Public Health, Faculty of Medicine and Odontostomatology, University of Sciences, Techniques, and Technologies of Bamako, Bamako
- Michael Tartakovsky
- Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA