1
Ma W, Chaisson M. Genotyping sequence-resolved copy number variation using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes. bioRxiv 2025:2024.08.11.607269. [PMID: 39149335 PMCID: PMC11326217 DOI: 10.1101/2024.08.11.607269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Copy number variant (CNV) genes are important in evolution and disease, yet sequence variation in CNV genes remains a blind spot in large-scale studies. We present ctyper, a method that leverages pangenomes to produce allele-specific copy numbers with locally phased variants from next-generation sequencing (NGS) reads. Benchmarking on 3,351 CNV genes, including HLA, SMN, and CYP2D6, and 212 challenging medically relevant (CMR) genes that are poorly mapped by NGS, ctyper captures 96.5% of phased variants with ≥99.1% correctness of copy number on CNV genes and 94.8% of phased variants on CMR genes. Applying alignment-free algorithms, ctyper requires 1.5 hours per genome on a single CPU. The results improve prediction of gene expression compared to known expression quantitative trait loci (eQTL) variants. Allele-specific expression quantified divergent expression on 7.94% of paralogs and tissue-specific biases on 4.68% of paralogs. We found reduced expression of SMN-2 due to SMN1 conversion, potentially affecting spinal muscular atrophy, and increased expression of translocated duplications of AMY2B. Overall, ctyper enables biobank-scale genotyping of CNV and CMR genes.
2
Zhou Q, Ghezelji M, Hari A, Ford MKB, Holley C, Sahinalp SC, Numanagić I. Geny: a genotyping tool for allelic decomposition of killer cell immunoglobulin-like receptor genes. Front Immunol 2024; 15:1494995. [PMID: 39763645 PMCID: PMC11701374 DOI: 10.3389/fimmu.2024.1494995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 11/29/2024] [Indexed: 01/15/2025]
Abstract
Introduction: Accurate genotyping of Killer cell Immunoglobulin-like Receptor (KIR) genes plays a pivotal role in enhancing our understanding of innate immune responses, disease correlations, and the advancement of personalized medicine. However, due to the high variability of the KIR region and high level of sequence similarity among different KIR genes, the generic genotyping workflows are unable to accurately infer copy numbers and complete genotypes of individual KIR genes from next-generation sequencing data. Thus, specialized genotyping tools are needed to genotype this complex region. Methods: Here, we introduce Geny, a new computational tool for precise genotyping of KIR genes. Geny utilizes available KIR allele databases and proposes a novel combination of expectation-maximization filtering schemes and integer linear programming-based combinatorial optimization models to resolve ambiguous reads, provide accurate copy number estimation, and estimate the correct allele of each copy of genes within the KIR region. Results & Discussion: We evaluated Geny on a large set of simulated short-read datasets covering the known validated KIR region assemblies and a set of Illumina short-read samples sequenced from 40 validated samples from the Human Pangenome Reference Consortium collection and showed that it outperforms the existing state-of-the-art KIR genotyping tools in terms of accuracy, precision, and recall. We envision Geny becoming a valuable resource for understanding immune system response and consequently advancing the field of patient-centric medicine.
Affiliation(s)
- Qinghui Zhou
  - Department of Computer Science, University of Victoria, Victoria, BC, Canada
- Mazyar Ghezelji
  - Department of Computer Science, University of Victoria, Victoria, BC, Canada
- Ananth Hari
  - Department of Electrical Engineering, University of Maryland, College Park, MD, United States
  - National Cancer Institute, NIH, Bethesda, MD, United States
- Connor Holley
  - Department of Computer Science, University of Victoria, Victoria, BC, Canada
- Ibrahim Numanagić
  - Department of Computer Science, University of Victoria, Victoria, BC, Canada
3
Ford MKB, Hari A, Zhou Q, Numanagić I, Sahinalp SC. Biologically-informed killer cell immunoglobulin-like receptor gene annotation tool. Bioinformatics 2024; 40:btae622. [PMID: 39432666 PMCID: PMC11549020 DOI: 10.1093/bioinformatics/btae622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 10/16/2024] [Indexed: 10/23/2024]
Abstract
SUMMARY: Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This article introduces BAKIR (Biologically informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community. AVAILABILITY AND IMPLEMENTATION: BAKIR is available at github.com/algo-cancer/bakir.
Affiliation(s)
- Michael K B Ford
  - National Cancer Institute, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, United States
- Ananth Hari
  - National Cancer Institute, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, United States
  - Department of Electrical Engineering, University of Maryland, 2410 A.V. Williams Building, College Park, MD, 20742, United States
- Qinghui Zhou
  - Faculty of Engineering and Computer Science, University of Victoria, 3800 Finnerty Rd, Victoria, BC, V8P 5C2, Canada
- Ibrahim Numanagić
  - Faculty of Engineering and Computer Science, University of Victoria, 3800 Finnerty Rd, Victoria, BC, V8P 5C2, Canada
- S Cenk Sahinalp
  - National Cancer Institute, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, United States
4
Ford MK, Hari A, Zhou Q, Numanagić I, Sahinalp SC. Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool. bioRxiv 2024:2024.08.13.607835. [PMID: 39372800 PMCID: PMC11451589 DOI: 10.1101/2024.08.13.607835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.
Affiliation(s)
- Ananth Hari
  - National Cancer Institute, NIH, Bethesda, MD, USA
  - University of Maryland, College Park, MD, USA
5
Zhou Q, Ghezelji M, Hari A, Ford MKB, Holley C, Mirabello L, Chanock S, Sahinalp SC, Numanagić I. Geny: A Genotyping Tool for Allelic Decomposition of Killer Cell Immunoglobulin-Like Receptor Genes. bioRxiv 2024:2024.02.27.582413. [PMID: 38529502 PMCID: PMC10962708 DOI: 10.1101/2024.02.27.582413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Accurate genotyping of Killer cell Immunoglobulin-like Receptor (KIR) genes plays a pivotal role in enhancing our understanding of innate immune responses, disease correlations, and the advancement of personalized medicine. However, due to the high variability of the KIR region and high level of sequence similarity among different KIR genes, the currently available genotyping methods are unable to accurately infer copy numbers, genotypes and haplotypes of individual KIR genes from next-generation sequencing data. Here we introduce Geny, a new computational tool for precise genotyping of KIR genes. Geny utilizes available KIR haplotype databases and proposes a novel combination of expectation-maximization filtering schemes and integer linear programming-based combinatorial optimization models to resolve ambiguous reads, provide accurate copy number estimation and estimate the haplotype of each copy for the genes within the KIR region. We evaluated Geny on a large set of simulated short-read datasets covering the known validated KIR region assemblies and a set of Illumina short-read samples sequenced from 25 validated samples from the Human Pangenome Reference Consortium collection and showed that it outperforms the existing genotyping tools in terms of accuracy, precision and recall. We envision Geny becoming a valuable resource for understanding immune system response and consequently advancing the field of patient-centric medicine.
6
Dhande IS, Zhu Y, Joshi AS, Hicks MJ, Braun MC, Doris PA. Polygenic genetic variation affecting antibody formation underlies hypertensive renal injury in the stroke-prone spontaneously hypertensive rat. Am J Physiol Renal Physiol 2023; 325:F317-F327. [PMID: 37439198 PMCID: PMC10511163 DOI: 10.1152/ajprenal.00058.2023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/14/2023] [Revised: 07/07/2023] [Accepted: 07/07/2023] [Indexed: 07/14/2023]
Abstract
During development of the spontaneously hypertensive rat (SHR), several distinct but closely related lines were generated. Most lines are resistant to hypertensive renal disease. However, the SHR-A3 line (stroke-prone SHR) experiences end-organ injury (EOI) and provides a model of injury susceptibility that can be used to uncover genetic causation. In the present study, we generated a congenic line in which three distinct disease loci in SHR-A3 are concurrently replaced with homologous loci from an injury-resistant SHR line (SHR-B2). Verification that all three loci were homozygously replaced in this triple congenic line [SHR-A3(Trip B2)] while the genetic background of SHR-A3 was fully retained was obtained by whole genome sequencing. Congenic genome substitution was without effect on systolic blood pressure [198.9 ± 3.34 mmHg, mean ± SE, SHR-A3(Trip B2) = 194.7 ± 2.55 mmHg]. Measures of renal injury (albuminuria, histological injury scores, and urinary biomarker levels) were reduced in SHR-A3(Trip B2) animals, even though only 4.5 Mbases of the 2.8 Gbases of the SHR-B2 genome (0.16% of the genome) was transferred into the congenic line. The gene content of the three congenic loci and the functional effects of gene polymorphism within suggest a role of immunoglobulin in EOI pathogenesis. To prove the role of antibodies in EOI in SHR-A3, we generated an SHR-A3 line in which expression from the immunoglobulin heavy chain gene was knocked out (SHR-A3-IGHKO). Animals in the SHR-A3-IGHKO line lack B cells and immunoglobulin, but the hypertensive phenotype is not affected. Renal injury, however, was reduced in this line, confirming a pathogenic role for immunoglobulin in hypertensive EOI in this model of heritable risk. NEW & NOTEWORTHY: Here, we used a polygenic animal model of hypertensive renal disease to show that genetic variation affecting antibody formation underlies hypertensive renal disease. We proved the genetic thesis by generating an immunoglobulin knockout in the susceptible animal model.
Affiliation(s)
- Isha S Dhande
  - Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, Texas, United States
- Yaming Zhu
  - Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, Texas, United States
- Aniket S Joshi
  - Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, Texas, United States
- M John Hicks
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas, United States
- Michael C Braun
  - Department of Pediatrics, Baylor College of Medicine, Houston, Texas, United States
  - Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, Texas, United States
- Peter A Doris
  - Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, Texas, United States