1
Sung K, Johnson MM, Dumm W, Simon N, Haddox H, Fukuyama J, Matsen FA. Thrifty wide-context models of B cell receptor somatic hypermutation. bioRxiv: the preprint server for biology 2024:2024.11.26.625407. [PMID: 39651125 PMCID: PMC11623647 DOI: 10.1101/2024.11.26.625407] [Indexed: 12/11/2024]
Abstract
Somatic hypermutation (SHM) is the diversity-generating process in antibody affinity maturation. Probabilistic models of SHM are needed for analyzing rare mutations, for understanding the selective forces guiding affinity maturation, and for understanding the underlying biochemical process. High-throughput data offer the potential to develop and fit models of SHM on relevant data sets. In this paper we model SHM using modern frameworks. We are motivated by recent work suggesting the importance of a wider context for SHM; however, assigning an independent rate to each k-mer leads to an exponential proliferation of parameters. Thus, using convolutions on 3-mer embeddings, we develop "thrifty" models of SHM that have fewer free parameters than a 5-mer model and yet have a significantly wider context. These offer a slight performance improvement over a 5-mer model. We also find that a per-site effect is not necessary to explain SHM patterns given nucleotide context. Finally, the two current methods for fitting an SHM model (on out-of-frame sequence data and on synonymous mutations) produce significantly different results, and augmenting out-of-frame data with synonymous mutations does not aid out-of-sample performance.
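The parameter-count argument can be made concrete. Giving every k-mer its own rate costs 4^k parameters: 1,024 for a 5-mer model and 4^13 = 67,108,864 for a 13-mer. The sketch below is a minimal, untrained illustration of the general idea named in the abstract (convolutions on 3-mer embeddings): the 3-mer centered on each site is embedded, a 1-D convolution combines neighboring embeddings, and an exponential turns the result into a positive per-site rate. The class name `ThriftySketch`, the embedding dimension (7), the kernel width (11), the extra padding embedding, and the random initialization are hypothetical choices for illustration, not the authors' architecture or hyperparameters.

```python
import math
import random

BASES = "ACGT"
PAD = 64  # shared index for 3-mers that run off either end or contain a non-ACGT base


def kmer_params(k: int) -> int:
    """Free parameters if every k-mer gets its own independent mutation rate."""
    return 4 ** k


def three_mer_indices(seq: str) -> list:
    """Index (0..63) of the 3-mer centered on each site, or PAD."""
    padded = "N" + seq.upper() + "N"
    out = []
    for i in range(len(seq)):
        trip = padded[i:i + 3]
        if all(ch in BASES for ch in trip):
            a, b, c = (BASES.index(ch) for ch in trip)
            out.append(16 * a + 4 * b + c)
        else:
            out.append(PAD)
    return out


class ThriftySketch:
    """Hypothetical wide-context model: 3-mer embedding -> 1-D convolution -> exp."""

    def __init__(self, embed_dim: int = 7, kernel_size: int = 11, seed: int = 0):
        if kernel_size % 2 == 0:
            raise ValueError("kernel_size must be odd so the window is centered on the site")
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.kernel_size = kernel_size
        # 65 embedding vectors: 64 real 3-mers plus one PAD vector (random, untrained).
        self.embedding = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(65)]
        # One weight per (window position, embedding dimension).
        self.kernel = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(kernel_size)]
        self.bias = 0.0

    def n_params(self) -> int:
        return 65 * self.embed_dim + self.kernel_size * self.embed_dim + 1

    def context_width(self) -> int:
        # kernel_size overlapping 3-mers cover kernel_size + 2 nucleotides.
        return self.kernel_size + 2

    def rates(self, seq: str) -> list:
        emb = [self.embedding[j] for j in three_mer_indices(seq)]
        half = self.kernel_size // 2
        zero = [0.0] * self.embed_dim
        window = [zero] * half + emb + [zero] * half  # zero padding keeps one output per site
        out = []
        for i in range(len(seq)):
            logit = self.bias
            for w_row, x_row in zip(self.kernel, window[i:i + self.kernel_size]):
                logit += sum(w * x for w, x in zip(w_row, x_row))
            out.append(math.exp(logit))  # exp keeps every per-site rate positive
        return out


model = ThriftySketch()
print(kmer_params(5), model.n_params(), model.context_width())  # 1024 533 13
print(len(model.rates("ACGTTGCAAGCT")))  # 12
```

With these hypothetical sizes the model has 65 × 7 + 11 × 7 + 1 = 533 free parameters, fewer than the 1,024 of a 5-mer model, yet each rate depends on a 13-nucleotide window. Widening the kernel by one position adds one nucleotide of context for only 7 more parameters, whereas a k-mer model multiplies its parameter count by 4 per extra nucleotide.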
2
Casoli T, Bonfigli AR, Di Rosa M, Giorgetti B, Balietti M, Giacconi R, Cardelli M, Piacenza F, Marchegiani F, Marcheselli F, Recchioni R, Galeazzi R, Vaiasicca S, Lamedica AM, Fumagalli A, Ferrara L, Lattanzio F. Association of Inflammatory Mediators with Mitochondrial DNA Variants in Geriatric COVID-19 Patients. Aging Dis 2024; 15:2665-2681. [PMID: 38377022 PMCID: PMC11567249 DOI: 10.14336/ad.2023.1123] [Received: 09/16/2023] [Accepted: 11/23/2023] [Indexed: 02/22/2024]
Abstract
COVID-19 remains a serious concern for elderly individuals with underlying comorbidities. SARS-CoV-2 can target and damage mitochondria, potentially leading to mutations in mitochondrial DNA (mtDNA). This study aimed to evaluate single nucleotide substitutions in mtDNA and analyze their correlation with inflammatory biomarkers in elderly COVID-19 patients. A total of 30 COVID-19 patients and 33 older adult controls without COVID-19 (aged over 65 years) were enrolled. mtDNA was extracted from buffy coat samples and sequenced using a chip-based resequencing system (MitoChip v2.0), which detects both homoplasmic and heteroplasmic mtDNA variants (40-60% heteroplasmy) and allows the assessment of low-level heteroplasmy (<10% heteroplasmy). Serum concentrations of IL-6, IFN-α, TNF-α, and IL-10 were determined in patients by a high-sensitivity immunoassay. We found a higher burden of total heteroplasmic variants in COVID-19 patients than in controls, with a selective increase in the ND1 and COIII genes. Low-level heteroplasmy was significantly elevated in COVID-19 patients, especially in genes of respiratory complex I. Both heteroplasmic variant burden and low-level heteroplasmy were associated with increased levels of IL-6, TNF-α, and IFN-α. These findings suggest that SARS-CoV-2 may induce mtDNA mutations that are related to the degree of inflammation.
Affiliation(s)
- Tiziana Casoli
- Center for Neurobiology of Aging, IRCCS INRCA, Ancona, Italy.
- Mirko Di Rosa
- Centre for Biostatistics and Applied Geriatric Clinical Epidemiology, IRCCS INRCA, Ancona, Italy.
- Marta Balietti
- Center for Neurobiology of Aging, IRCCS INRCA, Ancona, Italy.
- Robertina Giacconi
- Advanced Technology Center for Aging Research, IRCCS INRCA, Ancona, Italy.
- Maurizio Cardelli
- Advanced Technology Center for Aging Research, IRCCS INRCA, Ancona, Italy.
- Francesco Piacenza
- Advanced Technology Center for Aging Research, IRCCS INRCA, Ancona, Italy.
- Rina Recchioni
- Clinic of Laboratory and Precision Medicine, IRCCS INRCA, Ancona, Italy.
- Roberta Galeazzi
- Clinic of Laboratory and Precision Medicine, IRCCS INRCA, Ancona, Italy.
3
Levinstein Hallak K, Rosset S. Dating ancient splits in phylogenetic trees, with application to the human-Neanderthal split. BMC Genom Data 2024; 25:4. [PMID: 38166646 PMCID: PMC10759710 DOI: 10.1186/s12863-023-01185-8] [Received: 05/28/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND: We tackle the problem of estimating species TMRCAs (times to the most recent common ancestor), given a genome sequence from each species and a large phylogenetic tree of known structure (typically from one of the species). The number of transitions at each site from one sequence to the other is assumed to be Poisson distributed, and only the parity of the number of transitions is observed. The detailed phylogenetic tree contains information about the transition rate at each site. We use this formulation to develop and analyze multiple estimators of the species' TMRCA. To test our methods, we use mtDNA substitution statistics from the well-established Phylotree as a baseline for data simulation, such that the substitution rate per site mimics the observed real-world rates.
RESULTS: We evaluate our methods using simulated data and compare them to the Bayesian software BEAST2, showing that our proposed estimators are accurate for a wide range of TMRCAs and significantly outperform BEAST2. We then apply the proposed estimators to Neanderthal, Denisovan, and chimpanzee mtDNA genomes to better estimate their TMRCA with modern humans, and find that their TMRCA is substantially later than values cited recently in the literature.
CONCLUSIONS: Our methods use the transition statistics from the entire known human mtDNA phylogenetic tree (Phylotree), eliminating the need to reconstruct a tree encompassing the specific sequences of interest. Moreover, they show notable improvement in both running speed and accuracy over BEAST2, particularly for earlier TMRCAs such as the human-chimpanzee split. Our results date the human-Neanderthal TMRCA to [Formula: see text] years ago, considerably later than values cited in other recent studies.
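The parity observation has a closed form that makes this model convenient. If the number of transitions N at a site is Poisson with mean λ, then E[(-1)^N] = e^(-2λ), so the probability that the two sequences differ there (an odd count) is (1 - e^(-2λ))/2, which saturates at 1/2 for fast-evolving sites. The sketch below combines this with known per-site rates r_i, taking λ_i = r_i·τ for a single unknown time parameter τ, and finds the maximum-likelihood τ by bisection on the score. It is a generic illustration under these assumptions, not one of the estimators developed in the paper; the function names are hypothetical, and converting τ into a TMRCA in years depends on how the rates are calibrated, which the sketch does not attempt.

```python
import math


def p_odd(lam: float) -> float:
    """P(N is odd) for N ~ Poisson(lam), using E[(-1)^N] = exp(-2*lam)."""
    return -0.5 * math.expm1(-2.0 * lam)  # = (1 - exp(-2*lam)) / 2, accurate for small lam


def score(tau: float, rates, diffs) -> float:
    """Derivative of the parity log-likelihood with respect to tau.

    Site i contributes log p_i if the sequences differ there (diffs[i] == 1)
    and log(1 - p_i) otherwise, where p_i = p_odd(rates[i] * tau) and
    dp_i/dtau = rates[i] * exp(-2 * rates[i] * tau).
    """
    s = 0.0
    for r, d in zip(rates, diffs):
        p = p_odd(r * tau)
        dp = r * math.exp(-2.0 * r * tau)
        s += dp / p if d else -dp / (1.0 - p)
    return s


def estimate_tau(rates, diffs, lo: float = 1e-9, hi: float = 1e9, iters: int = 200) -> float:
    """Maximum-likelihood tau by bisection on log(tau).

    Assumes all rates are > 0 and that the score changes sign exactly once in
    [lo, hi]. Saturated data (e.g. half or more of equal-rate sites differing)
    have no finite maximum, and the result is then meaningless.
    """
    if not any(diffs):
        return 0.0  # no differences: the likelihood is maximized at tau = 0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits a parameter spanning many orders of magnitude
        if score(mid, rates, diffs) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Equal rates give a closed form to check against: exp(-2*r*tau) = 1 - 2*D/n.
rates = [0.01] * 1000
diffs = [1] * 100 + [0] * 900
print(estimate_tau(rates, diffs), -math.log(1 - 2 * 100 / 1000) / (2 * 0.01))
```

With unequal rates there is no closed form, which is where the per-site rate information from a large tree matters: fast sites are close to saturation and carry little information about deep splits, while slow sites still discriminate between old and very old times.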
Affiliation(s)
- Keren Levinstein Hallak
- Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, 6997801, Israel
- Saharon Rosset
- Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, 6997801, Israel
4
Hong YS, Battle SL, Shi W, Puiu D, Pillalamarri V, Xie J, Pankratz N, Lake NJ, Lek M, Rotter JI, Rich SS, Kooperberg C, Reiner AP, Auer PL, Heard-Costa N, Liu C, Lai M, Murabito JM, Levy D, Grove ML, Alonso A, Gibbs R, Dugan-Perez S, Gondek LP, Guallar E, Arking DE. Deleterious heteroplasmic mitochondrial mutations are associated with an increased risk of overall and cancer-specific mortality. Nat Commun 2023; 14:6113. [PMID: 37777527 PMCID: PMC10542802 DOI: 10.1038/s41467-023-41785-7] [Received: 05/14/2023] [Accepted: 09/14/2023] [Indexed: 10/02/2023]
Abstract
Mitochondria carry their own circular genome, and disruption of the mitochondrial genome is associated with various aging-related diseases. Unlike the nuclear genome, mitochondrial DNA (mtDNA) can be present at thousands to tens of thousands of copies in somatic cells, and variants may exist in a state of heteroplasmy, where only a fraction of the DNA molecules harbors a particular variant. We quantify mtDNA heteroplasmy in 194,871 participants in the UK Biobank and find that heteroplasmy is associated with a 1.5-fold increased risk of all-cause mortality. Additionally, we functionally characterize mtDNA single nucleotide variants (SNVs) using a constraint-based score, the mitochondrial local constraint score sum (MSS), and find that it is associated with all-cause mortality and with the prevalence and incidence of cancer and cancer-related mortality, particularly leukemia. These results indicate that mitochondria may have a functional role in certain cancers and that mitochondrial heteroplasmic SNVs may serve as a prognostic marker for cancer, especially leukemia.
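Heteroplasmy as defined here is a fraction, so with sequencing data it is commonly estimated per site as the share of reads carrying the variant. The sketch below computes that fraction from hypothetical read counts and sorts sites into coarse categories. Only the 10% low-level cutoff echoes a definition given in this listing (the Casoli et al. abstract); the 3% detection floor, the 95% homoplasmy threshold, the `SiteCounts` type, and the example positions are illustrative assumptions, not the calling criteria used in the UK Biobank analysis.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    position: int   # mtDNA position (illustrative)
    ref_reads: int  # reads matching the reference base
    alt_reads: int  # reads carrying the variant base


def heteroplasmy_fraction(site: SiteCounts) -> float:
    """Fraction of reads carrying the variant, a proxy for the fraction of mtDNA molecules."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        raise ValueError(f"no coverage at position {site.position}")
    return site.alt_reads / depth


def classify(site: SiteCounts, min_frac: float = 0.03, low_level: float = 0.10,
             homoplasmic: float = 0.95) -> str:
    """Coarse category for one site; all thresholds are adjustable assumptions."""
    f = heteroplasmy_fraction(site)
    if f < min_frac:
        return "not called"
    if f >= homoplasmic:
        return "homoplasmic"
    if f < low_level:
        return "low-level heteroplasmic"
    return "heteroplasmic"


def heteroplasmic_burden(sites, **thresholds) -> int:
    """Number of sites called heteroplasmic at any level (low-level included)."""
    return sum(classify(s, **thresholds).endswith("heteroplasmic") for s in sites)


sites = [SiteCounts(100, 950, 50), SiteCounts(200, 10, 990),
         SiteCounts(300, 600, 400), SiteCounts(400, 995, 5)]
print([classify(s) for s in sites])
# ['low-level heteroplasmic', 'homoplasmic', 'heteroplasmic', 'not called']
print(heteroplasmic_burden(sites))  # 2
```

Real pipelines also have to separate genuine low-fraction variants from sequencing error and from nuclear mitochondrial DNA segments (NUMTs), which this sketch ignores.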
Affiliation(s)
- Yun Soo Hong
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Stephanie L Battle
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Natural Sciences, College of Arts and Sciences, Bowie State University, Bowie, MD, USA
- Wen Shi
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Vamsee Pillalamarri
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jiaqi Xie
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
- Nicole J Lake
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
- Monkol Lek
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Alex P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Paul L Auer
- Division of Biostatistics, Institute for Health & Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Nancy Heard-Costa
- Departments of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Chunyu Liu
- Framingham Heart Study, Framingham, MA, USA
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Meng Lai
- Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Joanne M Murabito
- Section of General Internal Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Daniel Levy
- National Heart, Lung, and Blood Institute, NIH, Bethesda, MD, USA
- Megan L Grove
- Human Genetics Center; Department of Epidemiology, Human Genetics, and Environmental Sciences; School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Alvaro Alonso
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Richard Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Shannon Dugan-Perez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Lukasz P Gondek
- Division of Hematological Malignancies, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
- Eliseo Guallar
- Department of Epidemiology and Medicine, and Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA
- Dan E Arking
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
5
Rubin JD, Vogel NA, Gopalakrishnan S, Sackett PW, Renaud G. HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph. PLoS Comput Biol 2023; 19:e1011148. [PMID: 37285390 DOI: 10.1371/journal.pcbi.1011148] [Received: 12/19/2022] [Accepted: 05/02/2023] [Indexed: 06/09/2023]
Abstract
Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and infer the haplogroup from the mutations detected relative to this reference. This approach biases haplogroup assignments toward the reference and precludes accurate calculation of the uncertainty in the assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier that uses a pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools: it is more robust to lower coverage or incomplete consensus sequences, and it produces phylogenetically aware confidence scores that are unbiased toward any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The C++ program accepts consensus FASTA, FASTQ, or GAM files as input and outputs a text file with the haplogroup assignment of each sample along with the level of confidence in that assignment. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment.
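The abstract attributes HaploCart's confidence scores to Bayesian inference. As a generic illustration of that final step only, the sketch below converts per-haplogroup log-likelihoods into posterior probabilities with the log-sum-exp trick (uniform prior unless one is supplied) and sums posteriors over a group of haplogroups to express confidence in a broader clade. It is not HaploCart's algorithm or interface: the function names, haplogroup labels, and log-likelihood values are made up, and the real tool works from a pangenomic reference graph and reports phylogenetically aware scores.

```python
import math
from typing import Dict, Iterable, Optional


def posterior(log_lik: Dict[str, float],
              log_prior: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Posterior probability of each haplogroup from log-likelihoods and an optional log-prior."""
    if log_prior is None:
        log_prior = {h: 0.0 for h in log_lik}  # uniform prior; the constant cancels on normalizing
    scores = {h: log_lik[h] + log_prior[h] for h in log_lik}
    m = max(scores.values())
    # log-sum-exp: subtracting the max stops exp() from underflowing to 0 for very negative scores
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {h: math.exp(s - log_z) for h, s in scores.items()}


def clade_confidence(post: Dict[str, float], members: Iterable[str]) -> float:
    """Total posterior mass of a group of haplogroups, e.g. a clade and its sub-haplogroups."""
    return sum(post[h] for h in members)


# Made-up log-likelihoods: exp(-1000) alone would underflow to 0.0 in double precision.
log_lik = {"H1": -1000.0, "H2": -1000.0 - math.log(3.0), "U5": -1010.0}
post = posterior(log_lik)
best = max(post, key=post.get)
print(best, round(post[best], 3))
print(round(clade_confidence(post, ["H1", "H2"]), 3))
```

Reporting mass at the clade level is one way a classifier can stay confident about a broad haplogroup when the data cannot separate its sub-haplogroups, which is the situation low-coverage input tends to create.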
Affiliation(s)
- Joshua Daniel Rubin
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Nicola Alexandra Vogel
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Peter Wad Sackett
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Gabriel Renaud
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
6
Statistical modeling of SARS-CoV-2 substitution processes: predicting the next variant. Commun Biol 2022; 5:285. [PMID: 35351970 PMCID: PMC8964801 DOI: 10.1038/s42003-022-03198-y] [Received: 07/06/2021] [Accepted: 02/24/2022] [Indexed: 12/14/2022]
Abstract
We build statistical models to describe the substitution process in SARS-CoV-2 as a function of explanatory factors describing the sequence, its function, and more. These models serve two purposes: first, to gain knowledge about the evolutionary biology of the virus; and second, to predict future mutations in the virus, in particular non-synonymous amino acid substitutions that create new variants. We use tens of thousands of publicly available SARS-CoV-2 sequences and consider tens of thousands of candidate models. Through a careful validation process, we confirm that our chosen models are indeed able to predict new amino acid substitutions: candidates ranked high by our model are eight times more likely to occur than random amino acid changes. We also show that named variants were highly ranked by our models before their appearance, emphasizing the value of our models for identifying likely variants and potentially using this knowledge in vaccine design and other aspects of the ongoing battle against COVID-19. As the virus that causes COVID-19 continues to mutate and spread, new methods are needed to predict potential new variants. Here, the authors identify the best regression models for predicting likely mutation sites in the SARS-CoV-2 genome using a candidate set that considers sequence, gene location, and biological function.
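A statement such as "candidates ranked high by our model are eight times more likely to occur than random amino acid changes" is an enrichment ratio: the hit rate among top-ranked candidates divided by the hit rate among all candidates. The sketch below computes one simple version of it, the hit rate in the top k divided by the overall base rate, on made-up toy data. The function name and the top-k cutoff are assumptions, and the paper's exact validation metric may differ.

```python
from typing import Sequence


def enrichment_at_k(scores: Sequence[float], observed: Sequence[int], k: int) -> float:
    """Hit rate among the k highest-scoring candidates divided by the overall hit rate.

    scores[i]:   model score for candidate substitution i (higher means more likely)
    observed[i]: 1 if substitution i later appeared in the data, else 0
    """
    if len(scores) != len(observed):
        raise ValueError("scores and observed must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of candidates")
    base_rate = sum(observed) / len(observed)
    if base_rate == 0:
        raise ValueError("no candidate was observed, so enrichment is undefined")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_rate = sum(observed[i] for i in ranked[:k]) / k
    return top_rate / base_rate


# Toy data: 100 candidates, 10 observed in total, 5 of them among the 10 top-scored.
scores = [100 - i for i in range(100)]  # candidate 0 scores highest
observed = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
print(enrichment_at_k(scores, observed, k=10))  # 0.5 / 0.1, a five-fold enrichment
```

Setting k to the full candidate count always returns 1.0, which is a useful sanity check: enrichment only becomes informative when the cutoff isolates the model's highest-ranked candidates.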