1
Engelbrecht E, Rodriguez OL, Watson CT. Addressing Technical Pitfalls in Pursuit of Molecular Factors That Mediate Immunoglobulin Gene Regulation. JOURNAL OF IMMUNOLOGY (BALTIMORE, MD. : 1950) 2024; 213:651-662. [PMID: 39007649 PMCID: PMC11333172 DOI: 10.4049/jimmunol.2400131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 06/13/2024] [Indexed: 07/16/2024]
Abstract
The expressed Ab repertoire is a critical determinant of immune-related phenotypes. Ab-encoding transcripts are distinct from other expressed genes because they are transcribed from somatically rearranged gene segments. Human Abs are composed of two identical H and L chain polypeptides derived from genes in the IGH locus and one of two L chain loci. The combinatorial diversity that results from Ab gene rearrangement and the pairing of different H and L chains contributes to the immense diversity of the baseline Ab repertoire. During rearrangement, Ab gene selection is mediated by factors that influence chromatin architecture, promoter/enhancer activity, and V(D)J recombination. Interindividual variation in the composition of the Ab repertoire associates with germline variation in IGH, implicating polymorphism in Ab gene regulation. Determining how IGH variants directly mediate gene regulation will require integration of these variants with other functional genomic datasets. In this study, we argue that standard approaches using short reads have limited utility for characterizing regulatory regions in IGH at haplotype resolution. Using simulated and chromatin immunoprecipitation sequencing reads, we define features of IGH that limit use of short reads and a single reference genome, namely 1) the highly duplicated nature of the DNA sequence in IGH and 2) structural polymorphisms that are frequent in the population. We demonstrate that personalized diploid references enhance performance of short-read data for characterizing mappable portions of the locus, while also showing that long-read profiling tools will ultimately be needed to fully resolve functional impacts of IGH germline variation on expressed Ab repertoires.
Affiliation(s)
- Eric Engelbrecht
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
2
Engelbrecht E, Rodriguez OL, Shields K, Schultze S, Tieri D, Jana U, Yaari G, Lees WD, Smith ML, Watson CT. Resolving haplotype variation and complex genetic architecture in the human immunoglobulin kappa chain locus in individuals of diverse ancestry. Genes Immun 2024; 25:297-306. [PMID: 38844673 PMCID: PMC11327106 DOI: 10.1038/s41435-024-00279-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 05/21/2024] [Accepted: 05/24/2024] [Indexed: 08/17/2024]
Abstract
Immunoglobulins (IGs), critical components of the human immune system, are composed of heavy and light protein chains encoded at three genomic loci. The IG Kappa (IGK) chain locus consists of two large, inverted segmental duplications. The complexity of the IG loci has hindered use of standard high-throughput methods for characterizing genetic variation within these regions. To overcome these limitations, we use long-read sequencing to create haplotype-resolved IGK assemblies in an ancestrally diverse cohort (n = 36), representing the first comprehensive description of IGK haplotype variation. We identify extensive locus polymorphism, including novel single nucleotide variants (SNVs) and novel structural variants harboring functional IGKV genes. Among 47 functional IGKV genes, we identify 145 alleles, 67 of which were not previously curated. We report inter-population differences in allele frequencies for 10 IGKV genes, including alleles unique to specific populations within this dataset. We identify haplotypes carrying signatures of gene conversion that associate with SNV enrichment in the IGK distal region, and a haplotype with an inversion spanning the proximal and distal regions. These data provide a critical resource of curated genomic reference information from diverse ancestries, laying a foundation for advancing our understanding of population-level genetic variation in the IGK locus.
Affiliation(s)
- Eric Engelbrecht
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Kaitlyn Shields
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Steven Schultze
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- David Tieri
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Uddalok Jana
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
  - Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- William D Lees
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Melissa L Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, USA
3
Peres A, Lees WD, Rodriguez OL, Lee NY, Polak P, Hope R, Kedmi M, Collins AM, Ohlin M, Kleinstein S, Watson C, Yaari G. IGHV allele similarity clustering improves genotype inference from adaptive immune receptor repertoire sequencing data. Nucleic Acids Res 2023; 51:e86. [PMID: 37548401 PMCID: PMC10484671 DOI: 10.1093/nar/gkad603] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/03/2023] [Revised: 06/26/2023] [Accepted: 08/03/2023] [Indexed: 08/08/2023] Open
Abstract
In adaptive immune receptor repertoire analysis, determining the germline variable (V) allele associated with each T- and B-cell receptor sequence is a crucial step. This process is highly impacted by allele annotations. Aligning sequences, assigning them to specific germline alleles, and inferring individual genotypes are challenging when the repertoire is highly mutated, or sequence reads do not cover the whole V region. Here, we propose an alternative naming scheme for the V alleles, as well as a novel method to infer individual genotypes. We demonstrate the strengths of the two by comparing their outcomes to other genotype inference methods. We validate the genotype approach with independent genomic long-read data. The naming scheme is compatible with current annotation tools and pipelines. Analysis results can be converted from the proposed naming scheme to the nomenclature determined by the International Union of Immunological Societies (IUIS). Both the naming scheme and the genotype procedure are implemented in a freely available R package (PIgLET https://bitbucket.org/yaarilab/piglet). To allow researchers to further explore the approach on real data and to adapt it for their uses, we also created an interactive website (https://yaarilab.github.io/IGHV_reference_book).
Affiliation(s)
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel
- William D Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, WC1E 7JE, UK
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- Noah Y Lee
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, 06520, USA
- Pazit Polak
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel
- Ronen Hope
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
- Meirav Kedmi
  - Department of Pathology, Yale School of Medicine, New Haven, CT, 06520, USA
  - Division of Hematology and Bone Marrow Transplantation, Chaim Sheba Medical Center, Tel-Hashomer, 5262000, Israel
  - Sackler School of Medicine, Tel-Aviv University, Tel-Aviv, 69978, Israel
- Andrew M Collins
  - School of Biotechnology and Biomedical Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Lund, 221 00, Sweden
- Steven H Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, 06520, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, 40202, USA
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, 5290002 Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, 5290002 Ramat Gan, Israel
4
Rodriguez OL, Safonova Y, Silver CA, Shields K, Gibson WS, Kos JT, Tieri D, Ke H, Jackson KJL, Boyd SD, Smith ML, Marasco WA, Watson CT. Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire. Nat Commun 2023; 14:4419. [PMID: 37479682 PMCID: PMC10362067 DOI: 10.1038/s41467-023-40070-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 08/29/2022] [Accepted: 07/11/2023] [Indexed: 07/23/2023] Open
Abstract
Variation in the antibody response has been linked to differential outcomes in disease, and suboptimal vaccine and therapeutic responsiveness, the determinants of which have not been fully elucidated. Countering models that presume antibodies are generated largely by stochastic processes, we demonstrate that polymorphisms within the immunoglobulin heavy chain locus (IGH) impact the naive and antigen-experienced antibody repertoire, indicating that genetics predisposes individuals to mount qualitatively and quantitatively different antibody responses. We pair recently developed long-read genomic sequencing methods with antibody repertoire profiling to comprehensively resolve IGH genetic variation, including novel structural variants, single nucleotide variants, and genes and alleles. We show that IGH germline variants determine the presence and frequency of antibody genes in the expressed repertoire, including those enriched in functional elements linked to V(D)J recombination, and overlapping disease-associated variants. These results illuminate the power of leveraging IGH genetics to better understand the regulation, function, and dynamics of the antibody response in disease.
Affiliation(s)
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Yana Safonova
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Catherine A Silver
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Kaitlyn Shields
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- William S Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Justin T Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- David Tieri
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Hanzhong Ke
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Scott D Boyd
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Melissa L Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Wayne A Marasco
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
5
Ford EE, Tieri D, Rodriguez OL, Francoeur NJ, Soto J, Kos JT, Peres A, Gibson WS, Silver CA, Deikus G, Hudson E, Woolley CR, Beckmann N, Charney A, Mitchell TC, Yaari G, Sebra RP, Watson CT, Smith ML. FLAIRR-Seq: A Method for Single-Molecule Resolution of Near Full-Length Antibody H Chain Repertoires. JOURNAL OF IMMUNOLOGY (BALTIMORE, MD. : 1950) 2023; 210:1607-1619. [PMID: 37027017 PMCID: PMC10152037 DOI: 10.4049/jimmunol.2200825] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/09/2022] [Accepted: 03/14/2023] [Indexed: 04/08/2023]
Abstract
Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using short-read sequencing strategies resolves expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5' RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5' RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.
Affiliation(s)
- Easton E. Ford
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- David Tieri
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Oscar L. Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Nancy J. Francoeur
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Juan Soto
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- William S. Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Catherine A. Silver
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Gintaras Deikus
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Elizabeth Hudson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Cassandra R. Woolley
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Noam Beckmann
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Alexander Charney
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Thomas C. Mitchell
  - Department of Microbiology and Immunology, University of Louisville School of Medicine, Louisville, KY
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
  - Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, Israel
- Robert P. Sebra
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
- Melissa L. Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY
6
Pushparaj P, Nicoletto A, Castro Dopico X, Sheward DJ, Kim S, Ekström S, Murrell B, Corcoran M, Karlsson Hedestam GB. Frequent use of IGHV3-30-3 in SARS-CoV-2 neutralizing antibody responses. FRONTIERS IN VIROLOGY 2023; 3:1128253. [PMID: 37041983 PMCID: PMC7614418 DOI: 10.3389/fviro.2023.1128253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]
Abstract
The antibody response to SARS-CoV-2 shows biased immunoglobulin heavy chain variable (IGHV) gene usage, allowing definition of genetic signatures for some classes of neutralizing antibodies. We investigated IGHV gene usage frequencies by sorting spike-specific single memory B cells from individuals infected with SARS-CoV-2 early in the pandemic. From two study participants and 703 spike-specific B cells, the most used genes were IGHV1-69, IGHV3-30-3, and IGHV3-30. Here, we focused on the IGHV3-30 group of genes and an IGHV3-30-3-using ultrapotent neutralizing monoclonal antibody, CAB-F52, which displayed broad neutralizing activity also in its germline-reverted form. IGHV3-30-3 is encoded by a region of the IGH locus that is highly variable at both the allelic and structural levels. Using personalized IG genotyping, we found that 4 of 14 study participants lacked the IGHV3-30-3 gene on both chromosomes, raising the question of whether other, highly similar IGHV genes could substitute for IGHV3-30-3 in persons lacking this gene. In the context of CAB-F52, we found that none of the tested IGHV3-33 alleles, but several IGHV3-30 alleles, could substitute for IGHV3-30-3, suggesting functional redundancy between the highly homologous IGHV3-30 and IGHV3-30-3 genes for this antibody.
Affiliation(s)
- Pradeepa Pushparaj
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Andrea Nicoletto
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Xaquin Castro Dopico
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Daniel J. Sheward
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Sungyong Kim
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Simon Ekström
  - Department of Biomedical Engineering, Lund University, Lund, Sweden
- Ben Murrell
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Gunilla B. Karlsson Hedestam
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
  - Correspondence: Gunilla B. Karlsson Hedestam
7
Hardt U, Corcoran MM, Narang S, Malmström V, Padyukov L, Karlsson Hedestam GB. Analysis of IGH allele content in a sample group of rheumatoid arthritis patients demonstrates unrevealed population heterogeneity. Front Immunol 2023; 14:1073414. [PMID: 36798124 PMCID: PMC9927645 DOI: 10.3389/fimmu.2023.1073414] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/18/2022] [Accepted: 01/09/2023] [Indexed: 02/04/2023] Open
Abstract
Immunoglobulin heavy chain (IGH) germline gene variations influence the B cell receptor repertoire, with resulting biological consequences such as shaping our response to infections and altering disease susceptibilities. However, the lack of information on polymorphism frequencies in the IGH loci at the population level makes association studies challenging. Here, we genotyped a pilot group of 30 individuals with rheumatoid arthritis (RA) to examine IGH allele content and frequencies in this group. Eight novel IGHV alleles and one novel IGHJ allele were identified in the study. Fifteen cases were haplotypable using heterozygous IGHJ6 or IGHD anchors. One variant, IGHV4-34*01_S0742, was found in three out of 30 cases and included a single nucleotide change resulting in a non-canonical recombination signal sequence (RSS) heptamer. This variant allele, shown by haplotype analysis to be non-expressed, was also found in three out of 30 healthy controls and matched a single nucleotide polymorphism (SNP) described in the 1000 Genomes Project (1KGP) collection with frequencies that varied between population groups. Our finding of previously unreported alleles in a relatively small group of individuals with RA illustrates the need for baseline information about IG allelic frequencies in targeted study groups in preparation for future analysis of these genes in disease association studies.
Affiliation(s)
- Uta Hardt
  - Division of Rheumatology, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden and Karolinska University Hospital, Stockholm, Sweden
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Martin M. Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Sanjana Narang
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Vivianne Malmström
  - Division of Rheumatology, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden and Karolinska University Hospital, Stockholm, Sweden
- Leonid Padyukov
  - Division of Rheumatology, Department of Medicine Solna, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden and Karolinska University Hospital, Stockholm, Sweden
8
Pennell M, Rodriguez OL, Watson CT, Greiff V. The evolutionary and functional significance of germline immunoglobulin gene variation. Trends Immunol 2023; 44:7-21. [PMID: 36470826 DOI: 10.1016/j.it.2022.11.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/04/2022]
Abstract
The recombination between immunoglobulin (IG) gene segments determines an individual's naïve antibody repertoire and, consequently, (auto)antigen recognition. Emerging evidence suggests that mammalian IG germline variation impacts humoral immune responses associated with vaccination, infection, and autoimmunity, from the molecular level of epitope specificity up to profound changes in the architecture of antibody repertoires. These links between IG germline variants and immunophenotype raise the question of the evolutionary causes and consequences of diversity within IG loci. We discuss why the extreme diversity in IG loci remains a mystery, why resolving this is important for the design of more effective vaccines and therapeutics, and how recent evidence from multiple lines of inquiry may help us do so.
Affiliation(s)
- Matt Pennell
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA; Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
9
Collins AM, Watson CT, Breden F. Immunoglobulin genes, reproductive isolation and vertebrate speciation. Immunol Cell Biol 2022; 100:497-506. [PMID: 35781330 PMCID: PMC9545137 DOI: 10.1111/imcb.12567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2022] [Revised: 06/19/2022] [Accepted: 06/21/2022] [Indexed: 12/15/2022]
Abstract
Reproductive isolation drives the formation of new species, and many genes contribute to this through Dobzhansky–Muller incompatibilities (DMIs). These incompatibilities occur when gene divergence affects loci encoding interacting products such as receptors and their ligands. We suggest here that the nature of vertebrate immunoglobulin (IG) genes must make them prone to DMIs. The genes of these complex loci form functional genes through the process of recombination, giving rise to a repertoire of heterodimeric receptors of incredible diversity. This repertoire, within individuals and within species, must defend against pathogens but must also avoid pathogenic self‐reactivity. We suggest that this avoidance of autoimmunity is only achieved through a coordination of evolution between heavy‐ and light‐chain genes, and between these genes and the rest of the genome. Without coordinated evolution, the hybrid offspring of two diverging populations will carry a heavy burden of DMIs, resulting in a loss of fitness. Critical incompatibilities could manifest as incompatibilities between a mother and her divergent offspring. During fetal development, biochemical differences between the parents of hybrid offspring could make their offspring a target of the maternal immune system. This hypothesis was conceived in the light of recent insights into the population genetics of IG genes. This has suggested that antibody genes are probably as susceptible to evolutionary forces as other parts of the genome. Further repertoire studies in human and nonhuman species should now help determine whether antibody genes have been part of the evolutionary forces that drive the development of species.
Affiliation(s)
- Andrew M Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Felix Breden
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
10
Jackson KJL, Kos JT, Lees W, Gibson WS, Smith ML, Peres A, Yaari G, Corcoran M, Busse CE, Ohlin M, Watson CT, Collins AM. A BALB/c IGHV Reference Set, Defined by Haplotype Analysis of Long-Read VDJ-C Sequences From F1 (BALB/c x C57BL/6) Mice. Front Immunol 2022; 13:888555. [PMID: 35720344 PMCID: PMC9205180 DOI: 10.3389/fimmu.2022.888555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/03/2022] [Accepted: 04/28/2022] [Indexed: 11/13/2022] Open
Abstract
The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci. Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that current MGP assemblies are unsuitable for the comprehensive documentation of germline IGHVs and more efforts will be needed to establish strain-specific reference sets.
Affiliation(s)
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- William Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- William S. Gibson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Melissa Laird Smith
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Christian E. Busse
  - Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Lund, Sweden
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Andrew M. Collins
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, Australia
11
Open label safety and efficacy pilot to study mitigation of equine recurrent uveitis through topical suppressor of cytokine signaling-1 mimetic peptide. Sci Rep 2022; 12:7177. [PMID: 35505065 PMCID: PMC9065145 DOI: 10.1038/s41598-022-11338-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/11/2021] [Accepted: 04/19/2022] [Indexed: 02/07/2023] Open
Abstract
Equine recurrent uveitis (ERU) is a painful and debilitating autoimmune disease and represents the only spontaneous model of human recurrent uveitis (RU). Despite the efficacy of existing treatments, RU remains a leading cause of visual handicap in horses and humans. Cytokines, which utilize Janus kinase 2 (Jak2) for signaling, drive the inflammatory processes in ERU that promote blindness. Notably, suppressor of cytokine signaling 1 (SOCS1), which naturally limits the activation of Jak2 through binding interactions, is often deficient in autoimmune disease patients. Significantly, we previously showed that topical administration of a SOCS1 peptide mimic (SOCS1-KIR) mitigated induced rodent uveitis. In this pilot study, we test the potential to translate the therapeutic efficacy observed in experimental rodent uveitis to equine patient disease. Through bioinformatics and peptide binding assays we demonstrate putative binding of the SOCS1-KIR peptide to equine Jak2. We also show that topical, or intravitreal injection of SOCS1-KIR was well tolerated within the equine eye through physical and ophthalmic examinations. Finally, we show that topical SOCS1-KIR administration was associated with significant clinical ERU improvement. Together, these results provide a scientific rationale, and supporting experimental evidence for the therapeutic use of a SOCS1 mimetic peptide in RU.
|
12
|
Kaduk M, Corcoran M, Karlsson Hedestam GB. Addressing IGHV Gene Structural Diversity Enhances Immunoglobulin Repertoire Analysis: Lessons From Rhesus Macaque. Front Immunol 2022; 13:818440. [PMID: 35419009 PMCID: PMC8995469 DOI: 10.3389/fimmu.2022.818440] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/19/2021] [Accepted: 03/01/2022] [Indexed: 11/13/2022] Open
Abstract
The accurate germline gene assignment and assessment of somatic hypermutation in antibodies induced by immunization or infection are important in immunological studies. Here, we illustrate issues specific to the construction of comprehensive immunoglobulin (IG) germline gene reference databases for outbred animal species using rhesus macaques, a frequently used non-human primate model, as a model test case. We demonstrate that the genotypic variation found in macaque germline inference studies is reflected in similar levels of gene diversity in genomic assemblies. We show that the high frequency of IG heavy chain V (IGHV) region structural and gene copy number variation between subjects means that individual animals lack genes that are present in other animals. Therefore, gene databases compiled from a single or too few animals will inevitably result in inaccurate gene assignment and erroneous SHM level assessment for those genes it lacks. We demonstrate this by assigning a test macaque IgG library to the KIMDB, a database compiled of germline IGHV sequences from 27 rhesus macaques, and, alternatively, to the IMGT rhesus macaque database, based on IGHV genes inferred primarily from the genomic sequence of the rheMac10 reference assembly, supplemented with 10 genes from the Mmul_051212 assembly. We found that the use of a gene-restricted database led to overestimations of SHM by up to 5% due to misassignments. The principles described in the current study provide a model for the creation of comprehensive immunoglobulin reference databases from outbred species to ensure accurate gene assignment, lineage tracing and SHM calculations.
Affiliation(s)
- Mateusz Kaduk
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
|
13
|
Omer A, Peres A, Rodriguez OL, Watson CT, Lees W, Polak P, Collins AM, Yaari G. T cell receptor beta germline variability is revealed by inference from repertoire data. Genome Med 2022; 14:2. [PMID: 34991709 PMCID: PMC8740489 DOI: 10.1186/s13073-021-01008-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/17/2021] [Accepted: 12/08/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding for the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants. METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence a small fraction of that region. Here, we adapted a B cell pipeline for undocumented alleles, genotype, and haplotype inference for full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences. RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5′ UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire. CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.
Affiliation(s)
- Aviv Omer
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- Pazit Polak
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Andrew M Collins
- School of Biotechnology and Biomedical Sciences, University of New South Wales, Sydney, Australia
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
|
14
|
Kenter AL, Watson CT, Spille JH. Igh Locus Polymorphism May Dictate Topological Chromatin Conformation and V Gene Usage in the Ig Repertoire. Front Immunol 2021; 12:682589. [PMID: 34084176 PMCID: PMC8167033 DOI: 10.3389/fimmu.2021.682589] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/18/2021] [Accepted: 04/26/2021] [Indexed: 01/08/2023] Open
Abstract
Vast repertoires of unique antigen receptors are created in developing B and T lymphocytes. The antigen receptor loci contain many variable (V), diversity (D) and joining (J) gene segments that are arrayed across very large genomic expanses and are joined to form variable-region exons of expressed immunoglobulins and T cell receptors. This process creates the potential for an organism to respond to large numbers of different pathogens. Here, we consider the possibility that genetic polymorphisms with alterations in a vast array of regulatory elements in the immunoglobulin heavy chain (IgH) locus lead to changes in locus topology and impact immune-repertoire formation.
Affiliation(s)
- Amy L. Kenter
- Department of Microbiology and Immunology, University of Illinois College of Medicine, Chicago, IL, United States
- Corey T. Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Jan-Hendrik Spille
- Department of Physics, University of Illinois at Chicago, Chicago, IL, United States
|
15
|
Ohlin M. Poorly Expressed Alleles of Several Human Immunoglobulin Heavy Chain Variable Genes are Common in the Human Population. Front Immunol 2021; 11:603980. [PMID: 33717051 PMCID: PMC7943739 DOI: 10.3389/fimmu.2020.603980] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/08/2020] [Accepted: 12/08/2020] [Indexed: 12/23/2022] Open
Abstract
Extensive diversity has been identified in the human heavy chain immunoglobulin locus, including allelic variation, gene duplication, and insertion/deletion events. Several genes have been suggested to be deleted in many haplotypes. Such findings have commonly been based on inference of the germline repertoire from data sets covering antibody heavy chain encoding transcripts. The inference process operates under conditions that may limit identification of genes transcribed at low levels. The presence of rare transcripts that would indicate the existence of poorly expressed alleles in haplotypes that otherwise appear to have deleted these genes has been assessed in the present study. Alleles IGHV1-2*05, IGHV1-3*02, IGHV4-4*01, and IGHV7-4-1*01 were all identified as being expressed from multiple haplotypes, but only at low levels, haplotypes that by inference often appeared not to express these genes at all. These genes are thus not as commonly deleted as previously thought. An assessment of the 5' untranslated region (up to and including the TATA-box), the signal peptide-encoding part of the gene, and the 3'-heptamer suggests that the alleles have no or minimal sequence difference in these regions in comparison to highly expressed alleles. This suggests that they may be able to participate in immunoglobulin gene rearrangement, transcription and translation. However, all four poorly expressed alleles harbor unusual sequence variants within their coding region that may compromise the functionality of the encoded products, thereby limiting their incorporation into the immunoglobulin repertoire. Transcripts based on IGHV7-4-1*01 that had undergone somatic hypermutation and class switch had mutated the codon that encoded the unusual residue in framework region 3 (cysteine 92; located far from the antigen binding site). This finding further supports the poor compatibility of this unusual residue in a fully functional protein product. Indications of a linkage disequilibrium were identified as IGHV1-2*05 and IGHV4-4*01 co-localized to the same haplotypes. Furthermore, transcripts of two of the poorly expressed alleles (IGHV1-3*02 and IGHV4-4*01) mostly do not encode in-frame, functional products, suggesting that these alleles might be essentially non-functional. It is proposed that the functionality status of immunoglobulin genes should also include assessment of their ability to encode functional protein products.
Affiliation(s)
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden
|
16
|
Rhesus and cynomolgus macaque immunoglobulin heavy-chain genotyping yields comprehensive databases of germline VDJ alleles. Immunity 2021; 54:355-366.e4. [PMID: 33484642 DOI: 10.1016/j.immuni.2020.12.018] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 06/29/2020] [Revised: 10/19/2020] [Accepted: 12/30/2020] [Indexed: 12/20/2022]
Abstract
Definition of the specific germline immunoglobulin (Ig) alleles present in an individual is a critical first step to delineate the ontogeny and evolution of antigen-specific antibody responses. Rhesus and cynomolgus macaques are important animal models for pre-clinical studies, with four main sub-groups being used: Indian- and Chinese-origin rhesus macaques and Mauritian and Indonesian cynomolgus macaques. We applied the Ig gene inference tool IgDiscover and performed extensive Sanger sequencing-based genomic validation to define germline VDJ alleles in these four sub-groups, comprising 45 macaques in total. There was allelic overlap between Chinese- and Indian-origin rhesus macaques and also between the two macaque species, which is consistent with substantial admixture. The island-restricted Mauritian cynomolgus population displayed the lowest number of alleles of the sub-groups, yet maintained high individual allelic diversity. These comprehensive databases of germline IGH alleles for rhesus and cynomolgus macaques provide a resource toward the study of B cell responses in these important pre-clinical models.
|
17
|
Transgenic Animals for the Generation of Human Antibodies. LEARNING MATERIALS IN BIOSCIENCES 2021. [DOI: 10.1007/978-3-030-54630-4_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/07/2023]
|
18
|
Collins AM, Yaari G, Shepherd AJ, Lees W, Watson CT. Germline immunoglobulin genes: Disease susceptibility genes hidden in plain sight? CURRENT OPINION IN SYSTEMS BIOLOGY 2020; 24:100-108. [PMID: 37008538 PMCID: PMC10062056 DOI: 10.1016/j.coisb.2020.10.011] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 02/06/2023]
Abstract
Immunoglobulin genes are rarely considered as disease susceptibility genes despite their obvious and central contributions to immune function. This appears to be a consequence of historical views on antibody repertoire formation that no longer stand, and of difficulties that until recently surrounded the documentation of the suite of antibody genes in any individual. If these important genes are to be accessible to GWAS studies, allelic variation within the human population needs to be better documented, and a curated set of genomic variations associated with antibody genes needs to be formulated. Repertoire studies arising from the COVID-19 pandemic provide an opportunity to meet these needs, and may provide insights into the profound variability that is seen in outcomes to this infection.
|
19
|
Rodriguez OL, Gibson WS, Parks T, Emery M, Powell J, Strahl M, Deikus G, Auckland K, Eichler EE, Marasco WA, Sebra R, Sharp AJ, Smith ML, Bashir A, Watson CT. A Novel Framework for Characterizing Genomic Haplotype Diversity in the Human Immunoglobulin Heavy Chain Locus. Front Immunol 2020; 11:2136. [PMID: 33072076 PMCID: PMC7539625 DOI: 10.3389/fimmu.2020.02136] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 06/10/2020] [Accepted: 08/06/2020] [Indexed: 02/06/2023] Open
Abstract
An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized two times, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false-positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk, and responses to vaccines and therapeutics.
Affiliation(s)
- Oscar L Rodriguez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- William S Gibson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Tom Parks
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Matthew Emery
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- James Powell
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Maya Strahl
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Gintaras Deikus
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Kathryn Auckland
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, United States
- Wayne A Marasco
- Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute, Department of Medicine, Harvard Medical School, Boston, MA, United States
- Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Icahn Institute of Data Science and Genomic Technology, New York, NY, United States
- Andrew J Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Melissa L Smith
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Icahn Institute of Data Science and Genomic Technology, New York, NY, United States
- Ali Bashir
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
|
20
|
Mikocziova I, Gidoni M, Lindeman I, Peres A, Snir O, Yaari G, Sollid LM. Polymorphisms in human immunoglobulin heavy chain variable genes and their upstream regions. Nucleic Acids Res 2020; 48:5499-5510. [PMID: 32365177 PMCID: PMC7261178 DOI: 10.1093/nar/gkaa310] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/09/2020] [Accepted: 04/20/2020] [Indexed: 01/13/2023] Open
Abstract
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, the knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 potential novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, out of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-REGION in the 5′UTR, L-PART1 and L-PART2 sequences, and found that identical V-REGION alleles can differ in upstream sequences. Thus, we have identified a large genetic variation not only in the V-REGION but also in the upstream sequences of IGHV genes. Our findings provide a new perspective for annotating immunoglobulin repertoire sequencing data.
Affiliation(s)
- Ivana Mikocziova
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Moriah Gidoni
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Ida Lindeman
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Omri Snir
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Ludvig M Sollid
- K.G.Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372 Oslo, Norway
|
21
|
Peres A, Gidoni M, Polak P, Yaari G. RAbHIT: R Antibody Haplotype Inference Tool. Bioinformatics 2020; 35:4840-4842. [PMID: 31173062 DOI: 10.1093/bioinformatics/btz481] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/04/2019] [Revised: 05/11/2019] [Accepted: 06/04/2019] [Indexed: 12/11/2022] Open
Abstract
SUMMARY: Antibody haplotype inference (chromosomal phasing) may have clinical implications for the identification of genetic predispositions to diseases. Yet, our knowledge of the genomic loci encoding for the variable regions of the antibody is only partial, mostly due to the challenge of aligning short reads from genome sequencing to these highly repetitive loci. A powerful approach to infer the content of these loci relies on analyzing repertoires of rearranged V(D)J sequences. We present here RAbHIT, an R Haplotype Antibody Inference Tool, that implements a novel algorithm to infer V(D)J haplotypes by adapting a Bayesian framework. RAbHIT offers inference of haplotype and gene deletions. It may be applied to sequences from naïve and non-naïve B-cells, sequenced by different library preparation protocols. AVAILABILITY AND IMPLEMENTATION: RAbHIT is freely available for academic use from comprehensive R archive network (CRAN) (https://cran.r-project.org/web/packages/rabhit/) under CC BY-SA 4.0 license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Moriah Gidoni
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Pazit Polak
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
|
22
|
Lees W, Busse CE, Corcoran M, Ohlin M, Scheepers C, Matsen FA, Yaari G, Watson CT, Collins A, Shepherd AJ. OGRDB: a reference database of inferred immune receptor genes. Nucleic Acids Res 2020; 48:D964-D970. [PMID: 31566225 PMCID: PMC6943078 DOI: 10.1093/nar/gkz822] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 08/11/2019] [Revised: 09/05/2019] [Accepted: 09/16/2019] [Indexed: 12/20/2022] Open
Abstract
High-throughput sequencing of the adaptive immune receptor repertoire (AIRR-seq) is providing unprecedented insights into the immune response to disease and into the development of immune disorders. The accurate interpretation of AIRR-seq data depends on the existence of comprehensive germline gene reference sets. Current sets are known to be incomplete and unrepresentative of the degree of polymorphism and diversity in human and animal populations. A key issue is the complexity of the genomic regions in which they lie, which, because of the presence of multiple repeats, insertions and deletions, have not proved tractable with short-read whole genome sequencing. Recently, tools and methods for inferring such gene sequences from AIRR-seq datasets have become available, and a community approach has been developed for the expert review and publication of such inferences. Here, we present OGRDB, the Open Germline Receptor Database (https://ogrdb.airr-community.org), a public resource for the submission, review and publication of previously unknown receptor germline sequences together with supporting evidence.
Affiliation(s)
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
- Christian E Busse
- Division of B Cell Immunology, German Cancer Research Center, 69120 Heidelberg, Germany
- Martin Corcoran
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Box 280, 171 77 Stockholm, Sweden
- Mats Ohlin
- Department of Immunotechnology, Lund University, Medicon Village, S-223 81 Lund, Sweden
- Cathrine Scheepers
- Center for HIV and STIs, National Institute for Communicable Diseases of the National Health Laboratory Service, Sandringham, Gauteng 2131, South Africa
- Antibody Immunity Research Unit, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Frederick A Matsen
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY 40202, USA
- Andrew Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
- Adrian J Shepherd
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, UK
|
23
|
Inter- and intraspecies comparison of phylogenetic fingerprints and sequence diversity of immunoglobulin variable genes. Immunogenetics 2020; 72:279-294. [PMID: 32367185 DOI: 10.1007/s00251-020-01164-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/24/2020] [Accepted: 04/13/2020] [Indexed: 10/24/2022]
Abstract
Protection and neutralization of a vast array of pathogens is accomplished by the tremendous diversity of the B cell receptor (BCR) repertoire. For jawed vertebrates, this diversity is initiated via the somatic recombination of immunoglobulin (Ig) germline elements. While it is clear that the number of these germline segments differs from species to species, the extent of cross-species sequence diversity remains largely uncharacterized. Here we use extensive computational and statistical methods to investigate the sequence diversity and evolutionary relationship between Ig variable (V), diversity (D), and joining (J) germline segments across nine commonly studied species ranging from zebrafish to human. Metrics such as guanine-cytosine (GC) content showed low redundancy across Ig germline genes within a given species. Other comparisons, including amino acid motifs, evolutionary selection, and sequence diversity, revealed species-specific properties. Additionally, we showed that the germline-encoded diversity differs across antibody (recombined V-D-J) repertoires of various B cell subsets. To facilitate future comparative immunogenomics analysis, we created VDJgermlines, an R package that contains the germline sequences from multiple species. Our study informs strategies for the humanization and engineering of therapeutic antibodies.
|
24
|
Dynamics of heavy chain junctional length biases in antibody repertoires. Commun Biol 2020; 3:207. [PMID: 32358517 PMCID: PMC7195405 DOI: 10.1038/s42003-020-0931-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Accepted: 04/01/2020] [Indexed: 11/21/2022] Open
Abstract
Antibody variable domain sequence diversity is generated by recombination of germline segments. The third complementarity-determining region of the heavy chain (CDR H3) is the region of highest sequence diversity and is formed by the joining of heavy chain VH, DH and JH germline segments combined with random nucleotide trimming and additions between these segments. We show that CDR H3 and junctional segment length distributions are biased in human antibody repertoires as a function of VH, VL and JH germline segment utilization. Most length biases are apparent in the naive and antigen-experienced B cell compartments but not in nonproductive recombination products, indicating B cell selection as a major driver of these biases. Our findings reveal biases in the antibody CDR H3 diversity landscape shaped by VH, VL, and JH germline segment use during naive and antigen-experienced repertoire selection.
Sankar et al. investigate the junctional length biases (determining antibody binding potential) as a function of germline gene usage in antibody repertoires. They show that CDR H3 and junction length are biased by VH, VL, and JH germline segment usage and that these biases are apparent in both naive and antigen-experienced repertoires but not in non-productive repertoires.
|
25
|
Bhardwaj V, Franceschetti M, Rao R, Pevzner PA, Safonova Y. Automated analysis of immunosequencing datasets reveals novel immunoglobulin D genes across diverse species. PLoS Comput Biol 2020; 16:e1007837. [PMID: 32339161 PMCID: PMC7295240 DOI: 10.1371/journal.pcbi.1007837] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/20/2019] [Revised: 06/15/2020] [Accepted: 04/01/2020] [Indexed: 12/30/2022] Open
Abstract
Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes. Antibodies provide specific binding to an enormous range of antigens and represent a key component of the adaptive immune system. Immunosequencing has emerged as a method of choice for generating millions of reads that sample antibody repertoires and provides insights into monitoring immune response to disease and vaccination. Most of the previous immunogenomics studies rely on the reference germline genes in the immunoglobulin locus rather than the germline genes in a specific patient. This approach is deficient since the set of known germline genes is incomplete (particularly for non-European humans and non-human species) and contains alleles that resulted from sequencing and annotation errors. The problem of de novo inference of diversity (D) genes from immunosequencing data remained open until the IgScout algorithm was developed in 2019. We address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction and infer multiple D genes across multiple species that are not present in standard databases.
Affiliation(s)
- Vinnu Bhardwaj
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Massimo Franceschetti
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
- Ramesh Rao
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, California, United States of America
  - Qualcomm Institute, University of California San Diego, San Diego, California, United States of America
- Pavel A. Pevzner
  - Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
- Yana Safonova
  - Computer Science and Engineering Department, University of California San Diego, San Diego, California, United States of America
  - Center for Information Theory and Applications, University of California San Diego, San Diego, California, United States of America
|
26
|
Ford M, Haghshenas E, Watson CT, Sahinalp SC. Genotyping and Copy Number Analysis of Immunoglobin Heavy Chain Variable Genes Using Long Reads. iScience 2020; 23:100883. [PMID: 32109676 PMCID: PMC7044747 DOI: 10.1016/j.isci.2020.100883] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/15/2019] [Revised: 11/08/2019] [Accepted: 01/29/2020] [Indexed: 11/22/2022] Open
Abstract
One of the remaining challenges to describing an individual's genetic variation lies in the highly heterogeneous and complex genomic regions that impede the use of classical reference-guided mapping and assembly approaches. One such region is the Immunoglobulin heavy chain locus (IGH), which is critical for the development of antibodies and the adaptive immune system. We describe ImmunoTyper, the first PacBio-based genotyping and copy number calling tool specifically designed for IGH V genes (IGHV). We demonstrate that ImmunoTyper's multi-stage clustering and combinatorial optimization approach represents the most comprehensive IGHV genotyping approach published to date, through validation using a gold-standard IGH reference sequence. This preliminary work establishes the feasibility of fine-grained genotype and copy number analysis using error-prone long reads in complex multi-gene loci and opens the door for in-depth investigation into IGHV heterogeneity using accessible and increasingly common whole-genome sequence.
Affiliation(s)
- Michael Ford
  - School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
- Ehsan Haghshenas
  - School of Computing Science, Simon Fraser University, Burnaby V5A 1S6, Canada
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville 40292, USA
- S Cenk Sahinalp
  - Cancer Data Science Laboratory, National Cancer Institute, Bethesda, MD 20892, USA
|
27
|
Omer A, Shemesh O, Peres A, Polak P, Shepherd AJ, Watson C, Boyd SD, Collins AM, Lees W, Yaari G. VDJbase: an adaptive immune receptor genotype and haplotype database. Nucleic Acids Res 2020; 48:D1051-D1056. [PMID: 31602484 PMCID: PMC6943044 DOI: 10.1093/nar/gkz872] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/11/2019] [Revised: 09/19/2019] [Accepted: 10/01/2019] [Indexed: 12/14/2022] Open
Abstract
VDJbase is a publicly available database that offers easy searching of data describing the complete sets of gene sequences (genotypes and haplotypes) inferred from adaptive immune receptor repertoire sequencing datasets. VDJbase is designed to act as a resource that will allow the scientific community to explore the genetic variability of the immunoglobulin (Ig) and T cell receptor (TR) gene loci. It can also assist in the investigation of Ig- and TR-related genetic predispositions to diseases. Our database includes web-based query and online tools to assist in visualization and analysis of the genotype and haplotype data. It enables users to detect those alleles and genes that are significantly over-represented in a particular population, in terms of genotype, haplotype and gene expression. The database website can be freely accessed at https://www.vdjbase.org/, and no login is required. The data and code use Creative Commons licenses and are freely downloadable from https://bitbucket.org/account/user/yaarilab/projects/GPHP.
Affiliation(s)
- Aviv Omer
  - Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Or Shemesh
  - Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Ayelet Peres
  - Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Pazit Polak
  - Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
- Adrian J Shepherd
  - Institute of Structural and Molecular Biology, Birkbeck, University of London, London, UK
- Corey T Watson
  - University of Louisville School of Medicine, Biochemistry and Molecular Genetics, Louisville, KY 40292, USA
- Scott D Boyd
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Andrew M Collins
  - School of Biotechnology and Biomolecular Sciences, University of NSW, Kensington, Sydney, NSW 2052, Australia
- William Lees
  - Institute of Structural and Molecular Biology, Birkbeck, University of London, London, UK
- Gur Yaari
  - Bioengineering, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel
|
28
|
Ralph DK, Matsen FA. Per-sample immunoglobulin germline inference from B cell receptor deep sequencing data. PLoS Comput Biol 2019; 15:e1007133. [PMID: 31329576 PMCID: PMC6675132 DOI: 10.1371/journal.pcbi.1007133] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 12/20/2017] [Revised: 08/01/2019] [Accepted: 05/28/2019] [Indexed: 11/26/2022] Open
Abstract
The collection of immunoglobulin genes in an individual's germline, which gives rise to B cell receptors via recombination, is known to vary significantly across individuals. In humans, for example, each individual has only a fraction of the several hundred known V alleles. Furthermore, the currently-accepted set of known V alleles is both incomplete (particularly for non-European samples), and contains a significant number of spurious alleles. The resulting uncertainty as to which immunoglobulin alleles are present in any given sample results in inaccurate B cell receptor sequence annotations, and in particular inaccurate inferred naive ancestors. In this paper we first show that the currently widespread practice of aligning each sequence to its closest match in the full set of IMGT alleles results in a very large number of spurious alleles that are not in the sample's true set of germline V alleles. We then describe a new method for inferring each individual's germline gene set from deep sequencing data, and show that it improves upon existing methods by making a detailed comparison on a variety of simulated and real data samples. This new method has been integrated into the partis annotation and clonal family inference package, available at https://github.com/psathyrella/partis, and is run by default without affecting overall run time.
Affiliation(s)
- Duncan K. Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Frederick A. Matsen
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
|
29
|
Safonova Y, Pevzner PA. De novo Inference of Diversity Genes and Analysis of Non-canonical V(DD)J Recombination in Immunoglobulins. Front Immunol 2019; 10:987. [PMID: 31134072 PMCID: PMC6516046 DOI: 10.3389/fimmu.2019.00987] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/17/2019] [Accepted: 04/16/2019] [Indexed: 12/03/2022] Open
Abstract
The V(D)J recombination forms the immunoglobulin genes by joining the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics aims at finding alleles of germline genes across various patients. Although recent studies described algorithms for de novo inference of V and J genes from immunosequencing data, they stopped short of solving a more difficult problem of reconstructing D genes that form the highly divergent CDR3 regions and provide the most important contribution to the antigen binding. We present the IgScout algorithm for de novo D gene reconstruction and apply it to reveal new alleles of human D genes and previously unknown D genes in camel, an important model organism in immunology. We further analyze non-canonical V(DD)J recombination that results in unusually long CDR3s with tandem fused IGHD genes and thus expands the diversity of the antibody repertoires. We demonstrate that tandem CDR3s represent a consistent and functional feature of all analyzed immunosequencing datasets, reveal ultra-long CDR3s, and shed light on the mechanism responsible for their formation.
Affiliation(s)
- Yana Safonova
  - Center for Information Theory and Applications, University of California, San Diego, San Diego, CA, United States
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, United States
|
30
|
Nielsen SCA, Boyd SD. Human adaptive immune receptor repertoire analysis: Past, present, and future. Immunol Rev 2019; 284:9-23. [PMID: 29944765 DOI: 10.1111/imr.12667] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Indexed: 12/30/2022]
Abstract
The genes encoding adaptive immune antigen receptors, namely the immunoglobulins expressed in membrane-bound or secreted forms by B cells, and the cell surface T cell receptors, are unique in human biology because they are generated by combinatorial rearrangement of the genomic DNA. The diversity of receptors so generated in populations of lymphocytes enables the human immune system to recognize antigens expressed by pathogens, but also underlies the pathological specificity of autoimmune diseases and the mistargeted immunity in allergies. Several recent technological developments, foremost among them the invention of high-throughput DNA sequencing instruments, have enabled much deeper and thorough evaluation of clones of human B cells and T cells and the antigen receptors they express during physiological and pathogenic immune responses. The evolutionary struggles between host adaptive immune responses and populations of pathogens are now open to greater scrutiny, elucidation of the underlying reasons for successful or failed immunity, and potential predictive modeling, than ever before. Here we give an overview of the foundations, recent progress, and future prospects in this dynamic area of research.
Affiliation(s)
- Scott D Boyd
  - Department of Pathology, Stanford University, Stanford, CA, USA
|
31
|
Ohlin M, Scheepers C, Corcoran M, Lees WD, Busse CE, Bagnara D, Thörnqvist L, Bürckert JP, Jackson KJL, Ralph D, Schramm CA, Marthandan N, Breden F, Scott J, Matsen IV FA, Greiff V, Yaari G, Kleinstein SH, Christley S, Sherkow JS, Kossida S, Lefranc MP, van Zelm MC, Watson CT, Collins AM. Inferred Allelic Variants of Immunoglobulin Receptor Genes: A System for Their Evaluation, Documentation, and Naming. Front Immunol 2019; 10:435. [PMID: 30936866 PMCID: PMC6431624 DOI: 10.3389/fimmu.2019.00435] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 12/10/2018] [Accepted: 02/19/2019] [Indexed: 11/13/2022] Open
Abstract
Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. 
Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC, focusing, to begin with, on human IGHV genes, with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.
Affiliation(s)
- Mats Ohlin
  - Department of Immunotechnology, Lund University, Lund, Sweden
- Cathrine Scheepers
  - Center for HIV and STIs, National Institute for Communicable Diseases, Johannesburg, South Africa
  - Faculty of Health Sciences, School of Pathology, University of the Witwatersrand, Johannesburg, South Africa
- Martin Corcoran
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
- William D. Lees
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- Christian E. Busse
  - Division of B Cell Immunology, German Cancer Research Center, Heidelberg, Germany
- Davide Bagnara
  - Department of Experimental Medicine, University of Genoa, Genoa, Italy
- Duncan Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, WA, United States
- Chaim A. Schramm
  - Vaccine Research Center, National Institutes of Health, Washington, DC, United States
- Nishanth Marthandan
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Felix Breden
  - Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada
- Jamie Scott
  - Department of Molecular Biology and Biochemistry, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Victor Greiff
  - Department of Immunology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel
- Scott Christley
  - Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Jacob S. Sherkow
  - Innovation Center for Law and Technology, New York Law School, New York, NY, United States
- Sofia Kossida
  - IMGT, The International ImMunoGenetics information system (IMGT), Laboratoire d'ImmunoGénétique Moléculaire (LIGM), CNRS, Institut de Génétique Humaine, Université de Montpellier, Montpellier, France
- Marie-Paule Lefranc
  - IMGT, The International ImMunoGenetics information system (IMGT), Laboratoire d'ImmunoGénétique Moléculaire (LIGM), CNRS, Institut de Génétique Humaine, Université de Montpellier, Montpellier, France
- Menno C. van Zelm
  - Department of Immunology and Pathology, Central Clinical School, The Alfred Hospital, Monash University, Melbourne, VIC, Australia
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, United States
- Andrew M. Collins
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
|
32
|
Gadala-Maria D, Gidoni M, Marquez S, Vander Heiden JA, Kos JT, Watson CT, O'Connor KC, Yaari G, Kleinstein SH. Identification of Subject-Specific Immunoglobulin Alleles From Expressed Repertoire Sequencing Data. Front Immunol 2019; 10:129. [PMID: 30814994 PMCID: PMC6381938 DOI: 10.3389/fimmu.2019.00129] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 08/31/2018] [Accepted: 01/16/2019] [Indexed: 01/10/2023] Open
Abstract
The adaptive immune receptor repertoire (AIRR) contains information on an individual's immune past, present and potential in the form of the evolving sequences that encode the B cell receptor (BCR) repertoire. AIRR sequencing (AIRR-seq) studies rely on databases of known BCR germline variable (V), diversity (D), and joining (J) genes to detect somatic mutations in AIRR-seq data via comparison to the best-aligning database alleles. However, it has been shown that these databases are far from complete, leading to systematic misidentification of mutated positions in subsets of sample sequences. We previously presented TIgGER, a computational method to identify subject-specific V gene genotypes, including the presence of novel V gene alleles, directly from AIRR-seq data. However, the original algorithm was unable to detect alleles that differed by more than 5 single nucleotide polymorphisms (SNPs) from a database allele. Here we present and apply an improved version of the TIgGER algorithm which can detect alleles that differ by any number of SNPs from the nearest database allele, and can construct subject-specific genotypes with minimal prior information. TIgGER predictions are validated both computationally (using a leave-one-out strategy) and experimentally (using genomic sequencing), resulting in the addition of three new immunoglobulin heavy chain V (IGHV) gene alleles to the IMGT repertoire. Finally, we develop a Bayesian strategy to provide a confidence estimate associated with genotype calls. Altogether, these methods allow for much higher accuracy in germline allele assignment, an essential step in AIRR-seq studies.
Affiliation(s)
- Daniel Gadala-Maria
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Moriah Gidoni
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Susanna Marquez
  - Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, United States
- Jason A. Vander Heiden
  - Department of Neurology, Yale School of Medicine, Yale University, New Haven, CT, United States
- Justin T. Kos
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Corey T. Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, United States
- Kevin C. O'Connor
  - Department of Neurology, Yale School of Medicine, Yale University, New Haven, CT, United States
  - Department of Immunobiology, Yale School of Medicine, Yale University, New Haven, CT, United States
- Gur Yaari
  - Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Steven H. Kleinstein
  - Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
  - Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, United States
  - Department of Immunobiology, Yale School of Medicine, Yale University, New Haven, CT, United States
|
33
|
Gidoni M, Snir O, Peres A, Polak P, Lindeman I, Mikocziova I, Sarna VK, Lundin KEA, Clouser C, Vigneault F, Collins AM, Sollid LM, Yaari G. Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping. Nat Commun 2019; 10:628. [PMID: 30733445 PMCID: PMC6367474 DOI: 10.1038/s41467-019-08489-3] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 05/17/2018] [Accepted: 01/10/2019] [Indexed: 12/11/2022] Open
Abstract
Analysis of antibody repertoires by high-throughput sequencing is of major importance in understanding adaptive immune responses. Our knowledge of variations in the genomic loci encoding immunoglobulin genes is incomplete, resulting in conflicting VDJ gene assignments and biased genotype and haplotype inference. Haplotypes can be inferred using IGHJ6 heterozygosity, observed in one third of the people. Here, we propose a robust novel method for determining VDJ haplotypes by adapting a Bayesian framework. Our method extends haplotype inference to IGHD- and IGHV-based analysis, enabling inference of deletions and copy number variations in the entire population. To test this method, we generated a multi-individual data set of naive B-cell repertoires, and found allele usage bias, as well as a mosaic, tiled pattern of deleted IGHD and IGHV genes. The inferred haplotypes may have clinical implications for genetic disease predispositions. Our findings expand the knowledge that can be extracted from antibody repertoire sequencing data.
Affiliation(s)
- Moriah Gidoni
  - Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
- Omri Snir
  - KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Ayelet Peres
  - Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
- Pazit Polak
  - Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
- Ida Lindeman
  - KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Ivana Mikocziova
  - KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Vikas Kumar Sarna
  - KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Knut E A Lundin
  - KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Andrew M Collins
  - School of Biotechnology and Biomolecular Sciences, University of NSW, Kensington, Sydney, NSW, 2052, Australia
- Ludvig M Sollid
  - KG Jebsen Centre for Coeliac Disease Research and Department of Immunology, University of Oslo and Oslo University Hospital, 0372, Oslo, Norway
- Gur Yaari
  - Faculty of Engineering, Bar Ilan University, 5290002, Ramat Gan, Israel
|
34
|
Kaur H, Sain N, Mohanty D, Salunke DM. Deciphering evolution of immune recognition in antibodies. BMC STRUCTURAL BIOLOGY 2018; 18:19. [PMID: 30563492 PMCID: PMC6299584 DOI: 10.1186/s12900-018-0096-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/17/2018] [Accepted: 11/14/2018] [Indexed: 11/29/2022]
Abstract
Background: Antibody, the primary effector molecule of the immune system, evolves after initial encounter with the antigen from a precursor form to a mature one to effectively deal with the antigen. Antibodies of a lineage diverge through antigen-directed isolated pathways of maturation to exhibit distinct recognition potential. In the context of evolution in immune recognition, diversity of antigen cannot be ignored. While there are reports on antibody lineage, structural perspective with respect to diverse recognition potential in a lineage has never been studied. Hence, it is crucial to evaluate how maturation leads to topological tailoring within a lineage enabling them to interact with significantly distinct antigens.
Results: A data-driven approach was undertaken for the study. Global experimental mouse and human antibody-antigen complex structures from PDB were compiled into a coherent database of germline-linked antibodies bound with distinct antigens. Structural analysis of all lineages showed variations in CDRs of both H and L chains. Observations of conformational adaptation made from analysis of static structures were further evaluated by characterizing dynamics of interaction in two lineages, mouse VH1–84 and human VH5–51. Sequence and structure analysis of the lineages explained that somatic mutations altered the geometries of individual antibodies with common structural constraints in some CDRs. Additionally, conformational landscape obtained from molecular dynamics simulations revealed that incoming pathogen led to further conformational divergence in the paratope (as observed across datasets) even while maintaining similar overall backbone topology. MM-GB/SA analysis showed binding energies to be in physiological range. Results of the study are coherent with experimental observations.
Conclusions: The findings of this study highlight basic structural principles shaping the molecular evolution of a lineage for significantly diverse antigens. Antibodies of a lineage follow different developmental pathways while preserving the imprint of the germline. From the study, it can be generalized that structural diversification of the paratope is an outcome of natural selection of a conformation from an available ensemble, which is further optimized for antigen interaction. The study establishes that starting from a common lineage, antibodies can mature to recognize a wide range of antigens. This hypothesis can be further tested and validated experimentally.
Affiliation(s)
- Harmeet Kaur
  - Regional Centre for Biotechnology, Biotech Science Cluster, Faridabad, Haryana, 121001, India
  - Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Neetu Sain
  - National Institute of Immunology, New Delhi, Delhi, 110067, India
- Debasisa Mohanty
  - National Institute of Immunology, New Delhi, Delhi, 110067, India
- Dinakar M Salunke
  - Regional Centre for Biotechnology, Biotech Science Cluster, Faridabad, Haryana, 121001, India
  - International Centre for Genetic Engineering and Biotechnology, New Delhi, Delhi, 110067, India
|
35
|
Breden F, Watson CT. Using High-Throughput Sequencing to Characterize the Development of the Antibody Repertoire During Infections: A Case Study of HIV-1. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2018; 1053:245-263. [PMID: 29549643 DOI: 10.1007/978-3-319-72077-7_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
High throughput sequencing (HTS) approaches have only recently been applied to describing the antibody/B-cell repertoire in fine detail, but these data sets have already become critical to the design of vaccines and therapeutics, and monitoring of cancer immunotherapy. As a case study, we describe the potential and present limitations of HTS studies of the Ab repertoire during infection with HIV-1. Most of the present studies restrict their analyses to lineages of specific bnAbs. We discuss future initiatives to expand this type of analysis to more complete repertoires and to improve comparing and sharing of these Ab repertoire data across studies and institutions.
Affiliation(s)
- Felix Breden
- Department of Biological Sciences, Simon Fraser University, Burnaby, BC, Canada.
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
|
36
|
Abstract
Probabilistic modeling is fundamental to the statistical analysis of complex data. In addition to forming a coherent description of the data-generating process, probabilistic models enable parameter inference about given datasets. This procedure is well developed in the Bayesian perspective, in which one infers probability distributions describing to what extent various possible parameters agree with the data. In this paper, we motivate and review probabilistic modeling for adaptive immune receptor repertoire data, then describe progress and prospects for future work, from germline haplotyping to adaptive immune system deployment across tissues. The relevant quantities in immune sequence analysis include not only continuous parameters such as gene use frequency but also discrete objects such as B-cell clusters and lineages. Throughout this review, we unravel the many opportunities for probabilistic modeling in adaptive immune receptor analysis, including settings for which the Bayesian approach holds substantial promise (especially if one is optimistic about new computational methods). From our perspective, the greatest prospects for progress in probabilistic modeling for repertoires concern ancestral sequence estimation for B-cell receptor lineages, including uncertainty from germline genotype, rearrangement, and lineage development.
Affiliation(s)
- Branden Olson
- Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: M1-B514, Seattle, WA 98109-1024; phone: +1 206 667 7318
- Frederick A. Matsen
- Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: M1-B514, Seattle, WA 98109-1024; phone: +1 206 667 7318
|
37
|
Persson H, Kirik U, Thörnqvist L, Greiff L, Levander F, Ohlin M. In Vitro Evolution of Antibodies Inspired by In Vivo Evolution. Front Immunol 2018; 9:1391. [PMID: 29977238 PMCID: PMC6021498 DOI: 10.3389/fimmu.2018.01391] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/26/2018] [Accepted: 06/05/2018] [Indexed: 01/16/2023] Open
Abstract
In vitro generation of antibodies often requires variable domain sequence evolution to adapt the protein in terms of affinity, specificity, or developability. Such antibodies, including those that are of interest for clinical development, may have their origins in a diversity of immunoglobulin germline genes. Others and we have previously shown that antibodies of different origins tend to evolve along different, preferred trajectories. Apart from substitutions within the complementarity-determining regions, evolution may also, in a germline gene-origin-defined manner, be focused to residues in the framework regions, and even to residues within the protein core, in many instances at a substantial distance from the antibody’s antigen-binding site. Examples of such germline origin-defined patterns of evolution are described. We propose that germline gene-preferred substitution patterns offer attractive alternatives that should be considered in efforts to evolve antibodies intended for therapeutic use with respect to appropriate affinity, specificity, and product developability. We also hypothesize that such germline gene-origin-defined in vitro evolution holds potential to result in products with limited immunogenicity, as similarly evolved antibodies will be parts of conventional, in vivo-generated antibody responses and thus are likely to have been seen by the immune system in the past.
Affiliation(s)
- Helena Persson
- Drug Discovery and Development Platform, Science for Life Laboratory, Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Royal Institute of Technology, Stockholm, Sweden
- Ufuk Kirik
- Department of Immunotechnology, Lund University, Lund, Sweden
- Lennart Greiff
- Department of Clinical Sciences, Lund University, Lund, Sweden; Department of Otorhinolaryngology, Head & Neck Surgery, Skåne University Hospital, Lund, Sweden
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden; Human Antibody Therapeutics, Drug Discovery and Development Platform, Science for Life Laboratory, Lund University, Lund, Sweden
|
38
|
Miho E, Yermanos A, Weber CR, Berger CT, Reddy ST, Greiff V. Computational Strategies for Dissecting the High-Dimensional Complexity of Adaptive Immune Repertoires. Front Immunol 2018; 9:224. [PMID: 29515569 PMCID: PMC5826328 DOI: 10.3389/fimmu.2018.00224] [Citation(s) in RCA: 127] [Impact Index Per Article: 18.1] [Received: 11/22/2017] [Accepted: 01/26/2018] [Indexed: 12/21/2022] Open
Abstract
The adaptive immune system recognizes antigens via an immense array of antigen-binding antibodies and T-cell receptors, the immune repertoire. The interrogation of immune repertoires is of high relevance for understanding the adaptive immune response in disease and infection (e.g., autoimmunity, cancer, HIV). Adaptive immune receptor repertoire sequencing (AIRR-seq) has driven the quantitative and molecular-level profiling of immune repertoires, thereby revealing the high-dimensional complexity of the immune receptor sequence landscape. Several methods for the computational and statistical analysis of large-scale AIRR-seq data have been developed to resolve immune repertoire complexity and to understand the dynamics of adaptive immunity. Here, we review the current research on (i) diversity, (ii) clustering and network, (iii) phylogenetic, and (iv) machine learning methods applied to dissect, quantify, and compare the architecture, evolution, and specificity of immune repertoires. We summarize outstanding questions in computational immunology and propose future directions for systems immunology toward coupling AIRR-seq with the computational discovery of immunotherapeutics, vaccines, and immunodiagnostics.
Affiliation(s)
- Enkelejda Miho
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- aiNET GmbH, ETH Zürich, Basel, Switzerland
- Alexander Yermanos
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Cédric R. Weber
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Christoph T. Berger
- Department of Biomedicine, University Hospital Basel, Basel, Switzerland
- Department of Internal Medicine, Clinical Immunology, University Hospital Basel, Basel, Switzerland
- Sai T. Reddy
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Victor Greiff
- Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Department of Immunology, University of Oslo, Oslo, Norway
|
39
|
On being the right size: antibody repertoire formation in the mouse and human. Immunogenetics 2017; 70:143-158. [DOI: 10.1007/s00251-017-1049-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/26/2017] [Accepted: 12/04/2017] [Indexed: 01/01/2023]
|
40
|
Steele EJ. Reverse Transcriptase Mechanism of Somatic Hypermutation: 60 Years of Clonal Selection Theory. Front Immunol 2017; 8:1611. [PMID: 29218047 PMCID: PMC5704389 DOI: 10.3389/fimmu.2017.01611] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/23/2017] [Accepted: 11/07/2017] [Indexed: 01/24/2023] Open
Abstract
The evidence for the reverse transcriptase mechanism of somatic hypermutation is substantial and multifactorial. In this 60th anniversary year of the publication of Sir MacFarlane Burnet's Clonal Selection Theory, the evidence is briefly reviewed and updated.
Affiliation(s)
- Edward J. Steele
- CYO’Connor ERADE Village Foundation Inc., Piara Waters, WA, Australia
|
41
|
Fisher CR, Sutton HJ, Kaczmarski JA, McNamara HA, Clifton B, Mitchell J, Cai Y, Dups JN, D'Arcy NJ, Singh M, Chuah A, Peat TS, Jackson CJ, Cockburn IA. T-dependent B cell responses to Plasmodium induce antibodies that form a high-avidity multivalent complex with the circumsporozoite protein. PLoS Pathog 2017; 13:e1006469. [PMID: 28759640 PMCID: PMC5552345 DOI: 10.1371/journal.ppat.1006469] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 02/08/2017] [Revised: 08/10/2017] [Accepted: 06/13/2017] [Indexed: 11/18/2022] Open
Abstract
The repeat region of the Plasmodium falciparum circumsporozoite protein (CSP) is a major vaccine antigen because it can be targeted by parasite-neutralizing antibodies; however, little is known about this interaction. We used isothermal titration calorimetry, X-ray crystallography and mutagenesis-validated modeling to analyze the binding of a murine neutralizing antibody to Plasmodium falciparum CSP. Strikingly, we found that the repeat region of CSP is bound by multiple antibodies. This repeating pattern allows multiple weak interactions of single Fab domains to accumulate and yield a complex with a dissociation constant in the low nM range. Because the CSP protein can potentially cross-link multiple B cell receptors (BCRs), we hypothesized that the B cell response might be T cell independent. However, while there was a modest response in mice deficient in T cell help, the bulk of the response was T cell dependent. By sequencing the BCRs of CSP-repeat-specific B cells in inbred mice, we found that these cells underwent somatic hypermutation and affinity maturation indicative of a T-dependent response. Last, we found that the BCR repertoire of responding B cells was limited, suggesting that the structural simplicity of the repeat may limit the breadth of the immune response.
Vaccines aim to protect by inducing the immune system to make molecules called antibodies that can recognize molecules on the surface of invading pathogens. In the case of malaria, our most advanced vaccine candidates aim to promote the production of antibodies that recognize the circumsporozoite protein (CSP) molecule on the surface of the invasive parasite stage called the sporozoite. In this report we use X-ray crystallography to determine the structure of CSP-binding antibodies at the atomic level. We use other techniques such as isothermal titration calorimetry and structural modeling to examine how this antibody interacts with the CSP molecule. Strikingly, we found that each CSP molecule could bind 6 antibodies. This finding has implications for the immune response and may explain why high titers of antibody are needed for protection. Moreover, because the structure of the CSP repeat is quite simple, we determined that the number of different kinds of antibodies that could bind this molecule is quite small. However, a high-avidity interaction between those antibodies and CSP can result from a process called affinity maturation that allows the body to learn how to make improved antibodies specific for pathogen molecules. These data show that while it is challenging for the immune system to recognize and neutralize CSP, it should be possible to generate viable vaccines targeting this molecule.
Affiliation(s)
- Camilla R. Fisher
- Research School of Chemistry, The Australian National University, Canberra, Australian Capital Territory, Australia
- Henry J. Sutton
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Joe A. Kaczmarski
- Research School of Chemistry, The Australian National University, Canberra, Australian Capital Territory, Australia
- Hayley A. McNamara
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Ben Clifton
- Research School of Chemistry, The Australian National University, Canberra, Australian Capital Territory, Australia
- Joshua Mitchell
- Research School of Chemistry, The Australian National University, Canberra, Australian Capital Territory, Australia
- Yeping Cai
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Johanna N. Dups
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Nicholas J. D'Arcy
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Mandeep Singh
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Aaron Chuah
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Thomas S. Peat
- CSIRO Biomedical Manufacturing Program, Parkville, Victoria, Australia
- Colin J. Jackson
- Research School of Chemistry, The Australian National University, Canberra, Australian Capital Territory, Australia
- Ian A. Cockburn
- John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
|
42
|
Kirik U, Greiff L, Levander F, Ohlin M. Parallel antibody germline gene and haplotype analyses support the validity of immunoglobulin germline gene inference and discovery. Mol Immunol 2017; 87:12-22. [DOI: 10.1016/j.molimm.2017.03.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 12/16/2016] [Revised: 03/07/2017] [Accepted: 03/08/2017] [Indexed: 12/18/2022]
|
43
|
Watson CT, Glanville J, Marasco WA. The Individual and Population Genetics of Antibody Immunity. Trends Immunol 2017; 38:459-470. [PMID: 28539189 PMCID: PMC5656258 DOI: 10.1016/j.it.2017.04.003] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 03/03/2017] [Revised: 04/06/2017] [Accepted: 04/10/2017] [Indexed: 12/12/2022]
Abstract
Antibodies (Abs) produced by immunoglobulin (IG) genes are the most diverse proteins expressed in humans. While part of this diversity is generated by recombination during B-cell development and mutations during affinity maturation, the germ-line IG loci are also diverse across human populations and ethnicities. Recently, proof-of-concept studies have demonstrated genotype–phenotype correlations between specific IG germ-line variants and the quality of Ab responses during vaccination and disease. However, the functional consequences of IG genetic variation in Ab function and immunological outcomes remain underexplored. In this opinion article, we outline interconnections between IG genomic diversity and Ab-expressed repertoires and structure. We further propose a strategy for integrating IG genotyping with functional Ab profiling data as a means to better predict and optimize humoral responses in genetically diverse human populations, with immediate implications for personalized medicine. Genetic variation in human populations affects how individuals are able to mount functional antibody responses. Different alleles can encode convergent binding motifs that result in successful Ab responses against specific infections and vaccinations. Given the complexity of the IG loci and the diversity of the antibody repertoire, links between IG polymorphism and antibody repertoire variability have not been thoroughly explored. We present a strategy to mine genotype–repertoire–disease associations.
Affiliation(s)
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA.
- Jacob Glanville
- Institute for Immunity, Transplantation and Infection, and Computational and Systems Immunology, Stanford University School of Medicine, Stanford, CA, USA.
- Wayne A Marasco
- Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
|
44
|
Watson CT, Matsen FA, Jackson KJL, Bashir A, Smith ML, Glanville J, Breden F, Kleinstein SH, Collins AM, Busse CE. Comment on “A Database of Human Immune Receptor Alleles Recovered from Population Sequencing Data”. The Journal of Immunology 2017; 198:3371-3373. [DOI: 10.4049/jimmunol.1700306] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 01/02/2023]
|
45
|
Yu Y, Ceredig R, Seoighe C. A Database of Human Immune Receptor Alleles Recovered from Population Sequencing Data. The Journal of Immunology 2017; 198:2202-2210. [DOI: 10.4049/jimmunol.1601710] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 10/05/2016] [Accepted: 01/03/2017] [Indexed: 01/05/2023]
|
46
|
Luo S, Yu JA, Song YS. Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads. PLoS Comput Biol 2016; 12:e1005117. [PMID: 27632220 PMCID: PMC5025152 DOI: 10.1371/journal.pcbi.1005117] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/26/2016] [Accepted: 08/23/2016] [Indexed: 11/28/2022] Open
Abstract
The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a complex genomic region that varies in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here we develop a method that takes short reads from high-throughput sequencing and outputs a genetic profile of the IGHV locus with the read coverage depth and a putative nucleotide sequence for each operationally defined gene cluster. Our operationally defined gene clusters aim to address a major challenge in studying the IGHV locus: the high sequence similarity between gene segments in different genomic locations. Tests on simulated data demonstrate that our approach can accurately determine the presence or absence of a gene cluster from reads as short as 70 bp. More detailed resolution on the copy number of gene clusters can be obtained from read coverage depth using longer reads (e.g., ≥ 100 bp). Detail at the nucleotide resolution of single copy genes (genes present in one copy per haplotype) can be determined with 250 bp reads. For IGHV genes with more than one copy, accurate nucleotide-resolution reconstruction is currently beyond the means of our approach. When applied to a family of European ancestry, our pipeline outputs genotypes that are consistent with the family pedigree, confirms existing multigene variants and suggests new copy number variants. 
This study paves the way for analyzing population-level patterns of variation in IGHV gene clusters in larger diverse datasets and for quantitatively handling regions of copy number variation in other structurally varying and complex loci.
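As a rough illustration of the depth-to-copy-number idea mentioned in the abstract above (and not the authors' pipeline), the sketch below scales a gene cluster's mean read depth by the depth of a baseline region assumed to be present in two copies. The function name `estimate_copy_number`, the diploid baseline assumption, and the example depths are all hypothetical.

```python
def estimate_copy_number(cluster_depth, baseline_depth, baseline_copies=2):
    """Estimate the total copy number of a gene cluster from read depth.

    Illustrative assumption: coverage scales linearly with copy number, and
    `baseline_depth` comes from a region present in `baseline_copies` copies
    (two for a single-copy diploid region).
    """
    if baseline_depth <= 0:
        raise ValueError("baseline_depth must be positive")
    if cluster_depth < 0:
        raise ValueError("cluster_depth must be non-negative")
    # Depth ratio relative to the baseline, scaled to absolute copies
    # and rounded to the nearest integer.
    return round(baseline_copies * cluster_depth / baseline_depth)


# Hypothetical depths: a cluster at 45x against a 30x diploid baseline
# suggests three copies in total across both haplotypes.
copies = estimate_copy_number(45.0, 30.0)
```

A real genotyper would also need to account for mapping ambiguity between similar gene segments, which is the difficulty the abstract's operationally defined gene clusters are meant to address.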
Affiliation(s)
- Shishi Luo
- Computer Science Division, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
- Jane A. Yu
- Computer Science Division, University of California, Berkeley, Berkeley, California, United States of America
- Yun S. Song
- Computer Science Division, University of California, Berkeley, Berkeley, California, United States of America
- Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
- Departments of Mathematics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
47
|
Reed JH, Jackson J, Christ D, Goodnow CC. Clonal redemption of autoantibodies by somatic hypermutation away from self-reactivity during human immunization. J Exp Med 2016; 213:1255-65. [PMID: 27298445 PMCID: PMC4925023 DOI: 10.1084/jem.20151978] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 12/19/2015] [Accepted: 05/02/2016] [Indexed: 11/23/2022] Open
Abstract
Clonal anergy is an enigmatic self-tolerance mechanism because no apparent purpose is served by retaining functionally silenced B cells bearing autoantibodies. Human autoantibodies with IGHV4-34*01 heavy chains bind to poly-N-acetyllactosamine carbohydrates (I/i antigen) on erythrocytes and B lymphocytes, cause cold agglutinin disease, and are carried by 5% of naive B cells that are anergic. We analyzed the specificity of three IGHV4-34*01 IgG antibodies isolated from healthy donors immunized against foreign rhesus D alloantigen or vaccinia virus. Each IgG was expressed and analyzed either in a hypermutated immune state or after reverting each antibody to its unmutated preimmune ancestor. In each case, the preimmune ancestor IgG bound intensely to normal human B cells bearing I/i antigen. Self-reactivity was removed by a single somatic mutation that paradoxically decreased binding to the foreign immunogen, whereas other mutations conferred increased foreign binding. These data demonstrate the existence of a mechanism for mutation away from self-reactivity in humans. Because 2.5% of switched memory B cells use IGHV4-34*01 and >43% of these have mutations that remove I/i binding, clonal redemption of anergic cells appears efficient during physiological human antibody responses.
Affiliation(s)
- Joanne H Reed
- Department of Immunology, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, NSW 2010, Australia
- Jennifer Jackson
- Department of Immunology, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
- Daniel Christ
- Department of Immunology, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, NSW 2010, Australia
- Christopher C Goodnow
- Department of Immunology, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, NSW 2010, Australia
|
48
|
Collins AM, Wang Y, Roskin KM, Marquis CP, Jackson KJL. The mouse antibody heavy chain repertoire is germline-focused and highly variable between inbred strains. Philos Trans R Soc Lond B Biol Sci 2016; 370:rstb.2014.0236. [PMID: 26194750 DOI: 10.1098/rstb.2014.0236] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Indexed: 01/25/2023] Open
Abstract
The human and mouse antibody repertoires are formed by identical processes, but like all small animals, mice only have sufficient lymphocytes to express a small part of the potential antibody repertoire. In this study, we determined how the heavy chain repertoires of two mouse strains are generated. Analysis of IgM- and IgG-associated VDJ rearrangements generated by high-throughput sequencing confirmed the presence of 99 functional immunoglobulin heavy chain variable (IGHV) genes in the C57BL/6 genome, and inferred the presence of 164 IGHV genes in the BALB/c genome. Remarkably, only five IGHV sequences were common to both strains. Compared with humans, little N nucleotide addition was seen in the junctions of mouse VDJ genes. Germline human IgG-associated IGHV genes are rare, but many murine IgG-associated IGHV genes were unmutated. Together these results suggest that the expressed mouse repertoire is more germline-focused than the human repertoire. The apparently divergent germline repertoires of the mouse strains are discussed with reference to reports that inbred mouse strains carry blocks of genes derived from each of the three subspecies of the house mouse. We hypothesize that the germline genes of BALB/c and C57BL/6 mice may originally have evolved to generate distinct germline-focused antibody repertoires in the different mouse subspecies.
Affiliation(s)
- Andrew M Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Yan Wang
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Krishna M Roskin
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305-5324, USA
- Christopher P Marquis
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Katherine J L Jackson
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia; Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305-5324, USA
|
49
|
Boyd SD, Crowe JE. Deep sequencing and human antibody repertoire analysis. Curr Opin Immunol 2016; 40:103-9. [PMID: 27065089 DOI: 10.1016/j.coi.2016.03.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 03/16/2016] [Revised: 03/22/2016] [Accepted: 03/23/2016] [Indexed: 01/16/2023]
Abstract
In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field.
Affiliation(s)
- Scott D Boyd
- Department of Pathology, Stanford University, Stanford, CA 94305, United States.
- James E Crowe
- Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN 37232-0417, United States.
|
50
|
Cortina-Ceballos B, Godoy-Lozano EE, Sámano-Sánchez H, Aguilar-Salgado A, Velasco-Herrera MDC, Vargas-Chávez C, Velázquez-Ramírez D, Romero G, Moreno J, Téllez-Sosa J, Martínez-Barnetche J. Reconstructing and mining the B cell repertoire with ImmunediveRsity. MAbs 2016; 7:516-24. [PMID: 25875140 PMCID: PMC4622655 DOI: 10.1080/19420862.2015.1026502] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022] Open
Abstract
The B cell antigen receptor repertoire is highly diverse and constantly modified by clonal selection. High-throughput DNA sequencing (HTS) of the lymphocyte repertoire (Rep-Seq) represents a promising technology to explore such diversity ex vivo and assist in the identification of antigen-specific antibodies based on molecular signatures of clonal selection. Therefore, integrative tools for repertoire reconstruction and analysis from antibody sequences are needed. We developed ImmunediveRsity, a stand-alone pipeline primarily based in R programming for the integral analysis of B cell repertoire data generated by HTS. The pipeline integrates GNU software and in-house scripts to perform quality filtering, sequencing noise correction and repertoire reconstruction based on V, D and J segment assignment, clonal origin and unique heavy chain identification. Post-analysis scripts generate a wealth of repertoire metrics that, in conjunction with a rich graphical output, facilitate sample comparison and repertoire mining. Its performance was tested with raw and curated human and mouse 454-Roche sequencing benchmarks, providing good approximations of repertoire structure. Furthermore, ImmunediveRsity was used to mine the B cell repertoire of mice immunized with a model antigen, allowing the identification of previously validated antigen-specific antibodies, and revealing different and unexpected clonal diversity patterns in the post-immunization IgM and IgG compartments. Although ImmunediveRsity is similar to other recently developed tools, it offers significant advantages that facilitate repertoire analysis and repertoire mining. ImmunediveRsity is open source and free for academic purposes and it runs on 64 bit GNU/Linux and MacOS. Available at: https://bitbucket.org/ImmunediveRsity/immunediversity/
Affiliation(s)
- Bernardo Cortina-Ceballos
- Centro de Investigación Sobre Enfermedades Infecciosas; Instituto Nacional de Salud Pública (CISEI-INSP); Cuernavaca, Morelos, México
|