1
Prabakaran P, Gupta A, Rao SP, Rajpal D, Wendt M, Qiu Y, Chowdhury PS. Unveiling inverted D genes and D-D fusions in human antibody repertoires unlocks novel antibody diversity. Commun Biol 2025; 8:133. [PMID: 39875530 PMCID: PMC11775173 DOI: 10.1038/s42003-024-07441-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 12/23/2024] [Indexed: 01/30/2025]
Abstract
Antibodies, essential components of adaptive immunity, derive their remarkable diversity primarily from V(D)J gene rearrangements, particularly within the heavy chain complementarity-determining region 3 (CDR-H3) where D genes play a major role. Traditionally, D genes were thought to recombine only in the forward direction, despite having identical recombination signal sequences (12 base pair spacers) at both ends. This observation led us to question whether these symmetrical sequences might enable bidirectional recombination. We identified 25 unique inverted D genes (InvDs) in both naive and memory B cells from antibody repertoires of 13 healthy donors. These InvDs utilize all three reading frames during translation, producing distinct amino acid profiles enriched in histidine, proline, and lysine in CDR-H3s of antibodies with potential functional diversity. Notably, our analysis revealed a broader range of D-D fusions, including D-D, D-InvD, InvD-D, and InvD-InvD configurations, opening new perspectives for antibody engineering and therapeutic development.
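The abstract's point that inverted D genes are read in all three reading frames can be made concrete at the sequence level: an inverted D segment is the reverse complement of the germline D, and each of its three frames yields a different peptide. The Python sketch below shows only that; the nine-nucleotide sequence is a hypothetical toy, not a real IGHD allele, and the sketch ignores exonuclease trimming and N-nucleotide addition, which shape real CDR-H3s.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame` (0, 1, or 2).

    Incomplete trailing codons are dropped; '*' marks a stop codon and
    'X' marks any codon containing a non-ACGT character.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def three_frame(seq: str) -> list[str]:
    """Translations of seq in reading frames 0, 1, and 2."""
    return [translate(seq, frame) for frame in range(3)]


if __name__ == "__main__":
    toy_d = "ATGAAACCC"  # hypothetical toy sequence, not a germline IGHD gene
    print("forward :", three_frame(toy_d))
    print("inverted:", three_frame(reverse_complement(toy_d)))
```

Running it prints `['MKP', '*N', 'ET']` for the forward orientation and `['GFH', 'GF', 'VS']` for the inverted one, illustrating how inversion plus frame choice changes the amino acids a single D segment can contribute.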
Affiliation(s)
- Ponraj Prabakaran
- Large Molecules Research, Sanofi, Cambridge, MA, USA.
- PMJ Technology Solutions, Frederick, MD, USA.
- Abhinav Gupta
- Large Molecules Research, Sanofi, Cambridge, MA, USA
- Sambasiva P Rao
- Large Molecules Research, Sanofi, Cambridge, MA, USA
- Takeda Pharmaceuticals, Cambridge, MA, USA
- Deepak Rajpal
- Translational Science, Sanofi, Cambridge, MA, USA
- Takeda Pharmaceuticals, Cambridge, MA, USA
- Maria Wendt
- Large Molecules Research, Sanofi, Cambridge, MA, USA
- Yu Qiu
- Large Molecules Research, Sanofi, Cambridge, MA, USA.
- Partha S Chowdhury
- Large Molecules Research, Sanofi, Cambridge, MA, USA.
- Johnson & Johnson R&D Center, Spring House, PA, USA.
2
Pursell T, Reers A, Mikelov A, Kotagiri P, Ellison JA, Hutson CL, Boyd SD, Frank HK. Genetically and Functionally Distinct Immunoglobulin Heavy Chain Locus Duplication in Bats. bioRxiv 2024:2024.08.09.606892. [PMID: 39211187 PMCID: PMC11360916 DOI: 10.1101/2024.08.09.606892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
The genetic locus encoding immunoglobulin heavy chains (IgH) is critical for vertebrate humoral immune responses and diverse antibody repertoires. Immunoglobulin and T cell receptor loci of most bat species have not been annotated, despite the recurrent role of bats as viral reservoirs and sources of zoonotic pathogens. We investigated the genetic structure and function of IgH loci across the largest bat family, Vespertilionidae, focusing on big brown bats (Eptesicus fuscus). We discovered that E. fuscus and ten other species within Vespertilionidae have two complete, functional, and distinct immunoglobulin heavy chain loci on separate chromosomes. This locus organization was previously unknown in mammals but is reminiscent of more limited duplicated loci in teleost fish. Single cell transcriptomic data validate functional rearrangement and expression of immunoglobulin heavy chains of both loci in the expressed repertoire of Eptesicus fuscus, with maintenance of allelic exclusion, bias of usage toward the smaller and more compact IgH locus, and evidence of differential selection of antigen-experienced B cells and plasma cells varying by IgH locus use. This represents a unique mechanism for mammalian humoral immunity and may contribute to bat resistance to viral pathogenesis.
3
Omer A, Peres A, Rodriguez OL, Watson CT, Lees W, Polak P, Collins AM, Yaari G. T cell receptor beta germline variability is revealed by inference from repertoire data. Genome Med 2022; 14:2. [PMID: 34991709 PMCID: PMC8740489 DOI: 10.1186/s13073-021-01008-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/17/2021] [Accepted: 12/08/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND: T and B cell receptor (TCR, BCR) repertoires constitute the foundation of adaptive immunity. Adaptive immune receptor repertoire sequencing (AIRR-seq) is a common approach to study immune system dynamics. Understanding the genetic factors influencing the composition and dynamics of these repertoires is of major scientific and clinical importance. The chromosomal loci encoding the variable regions of TCRs and BCRs are challenging to decipher due to repetitive elements and undocumented structural variants.
METHODS: To confront this challenge, AIRR-seq-based methods have recently been developed for B cells, enabling genotype and haplotype inference and discovery of undocumented alleles. However, this approach relies on complete coverage of the receptors' variable regions, whereas most T cell studies sequence a small fraction of that region. Here, we adapted a B cell pipeline for undocumented alleles, genotype, and haplotype inference for full and partial AIRR-seq TCR data sets. The pipeline also deals with gene assignment ambiguities, which is especially important in the analysis of data sets of partial sequences.
RESULTS: From the full and partial AIRR-seq TCR data sets, we identified 39 undocumented polymorphisms in T cell receptor Beta V (TRBV) and 31 undocumented 5′ UTR sequences. A subset of these inferences was also observed using independent genomic approaches. We found that a single nucleotide polymorphism differentiating between the two documented T cell receptor Beta D2 (TRBD2) alleles is strongly associated with dramatic changes in the expressed repertoire.
CONCLUSIONS: We reveal a rich picture of germline variability and demonstrate how a single nucleotide polymorphism dramatically affects the composition of the whole repertoire. Our findings provide a basis for annotation of TCR repertoires for future basic and clinical studies.
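Genotype inference from repertoire data starts from the observation that a person carries at most a few alleles of each gene, so alleles seen in only a sliver of that gene's assigned sequences are suspect. The Python sketch below uses a fixed usage-fraction cutoff; this is a deliberate simplification and not the inference model of the pipeline described above. It also assumes each sequence has a single unambiguous allele call, whereas the cited pipeline explicitly handles ambiguous assignments. The allele names in the usage example are hypothetical inputs.

```python
from collections import Counter, defaultdict


def infer_genotype(allele_calls: list[str], min_fraction: float = 0.1) -> dict[str, list[str]]:
    """Call the alleles present for each gene from unambiguous allele assignments.

    allele_calls: one IMGT-style allele name (e.g. "GENE*01") per sequence.
    An allele is called present if it accounts for at least `min_fraction`
    of the sequences assigned to its gene. The fixed threshold is a
    simplification for illustration only.
    """
    counts_by_gene = defaultdict(Counter)
    for allele in allele_calls:
        gene = allele.split("*", 1)[0]  # "TRBV7-2*01" -> "TRBV7-2"
        counts_by_gene[gene][allele] += 1

    genotype = {}
    for gene, counts in counts_by_gene.items():
        total = sum(counts.values())
        genotype[gene] = sorted(a for a, n in counts.items() if n / total >= min_fraction)
    return genotype


if __name__ == "__main__":
    # Hypothetical allele calls for illustration.
    calls = ["TRBV7-2*01"] * 9 + ["TRBV7-2*02"] + ["TRBV20-1*01"] * 5
    print(infer_genotype(calls, min_fraction=0.2))
```

With a 0.2 cutoff the rare `TRBV7-2*02` call (1 of 10 sequences) is excluded; lowering the cutoff to 0.1 keeps it, which shows how sensitive a naive threshold is to sequencing depth.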
Affiliation(s)
- Aviv Omer
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Ayelet Peres
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Oscar L Rodriguez
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- William Lees
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- Pazit Polak
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel
- Andrew M Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Gur Yaari
- Faculty of Engineering, Bar Ilan University, Ramat Gan, 5290002, Israel.
- Bar Ilan Institute of Nanotechnology and Advanced Materials, Bar Ilan University, Ramat Gan, 5290002, Israel.
4
On being the right size: antibody repertoire formation in the mouse and human. Immunogenetics 2017; 70:143-158. [DOI: 10.1007/s00251-017-1049-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 09/26/2017] [Accepted: 12/04/2017] [Indexed: 01/01/2023]
5
Ralph DK, Matsen FA. Likelihood-Based Inference of B Cell Clonal Families. PLoS Comput Biol 2016; 12:e1005086. [PMID: 27749910 PMCID: PMC5066976 DOI: 10.1371/journal.pcbi.1005086] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 10/08/2015] [Accepted: 07/27/2016] [Indexed: 11/18/2022]
Abstract
The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated by a random recombination process called “rearrangement” forming progenitor B cells, then a Darwinian process of lineage diversification and selection called “affinity maturation.” The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of various lineages, each of which may be quite numerous, or may consist of only a single member. As a step to understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e. to cluster sampled sequences according to which came from the same rearrangement events. We call this clustering problem “clonal family inference.” In this paper we describe and validate a likelihood-based framework for clonal family inference based on a multi-hidden Markov Model (multi-HMM) framework for B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters than previous methods when applied to two real data sets.

Antibodies must recognize a great diversity of antigens to protect us from infectious disease. The binding properties of antibodies are determined by the DNA sequences of their corresponding B cell receptors (BCRs). These BCR sequences are created in naive form by VDJ recombination, which randomly selects and trims the ends of V, D, and J genes, then joins the resulting segments together with additional random nucleotides. If they pass initial screening and bind an antigen, these sequences then undergo an evolutionary process of reproduction, mutation, and selection, revising the BCR to improve binding to its cognate antigen. It has recently become possible to determine the BCR sequences resulting from this process in high throughput. Although these sequences implicitly contain a wealth of information about both antigen exposure and the process by which we learn to resist pathogens, this information can only be extracted using computer algorithms. In this paper we describe a likelihood-based statistical method to determine, given a collection of BCR sequences, which of them are derived from the same recombination events. It is based on a hidden Markov model (HMM) of VDJ rearrangement which is able to calculate likelihoods for many sequences at once.
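The agglomerative, maximum-likelihood clustering described above can be summarized with a merge score. The following is a hedged paraphrase rather than the paper's own notation, and it assumes that the likelihood of a partition $\mathcal{P}$ factorizes over its clusters, with $P(C)$ the probability (computed by the multi-HMM) that all sequences in cluster $C$ arose from a single rearrangement event:

$$
\log P(\mathcal{P}) = \sum_{C \in \mathcal{P}} \log P(C),
\qquad
\Delta(A, B) = \log P(A \cup B) - \log P(A) - \log P(B).
$$

Replacing clusters $A$ and $B$ by $A \cup B$ changes $\log P(\mathcal{P})$ by exactly $\Delta(A, B)$, so a greedy agglomerative pass merges the pair with the largest $\Delta$ at each step. One simple stopping rule, assumed here for illustration, is to stop once every remaining $\Delta(A, B)$ is negative.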
MESH Headings
- B-Lymphocytes/immunology
- Clone Cells/immunology
- Computer Simulation
- Gene Rearrangement, B-Lymphocyte/genetics
- Gene Rearrangement, B-Lymphocyte/immunology
- High-Throughput Nucleotide Sequencing/methods
- Models, Genetic
- Models, Immunological
- Models, Statistical
- Receptors, Antigen, B-Cell/genetics
- Receptors, Antigen, B-Cell/immunology
- Sequence Analysis, DNA
Affiliation(s)
- Duncan K. Ralph
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Frederick A. Matsen
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
6
Collins AM, Wang Y, Roskin KM, Marquis CP, Jackson KJL. The mouse antibody heavy chain repertoire is germline-focused and highly variable between inbred strains. Philos Trans R Soc Lond B Biol Sci 2016; 370:rstb.2014.0236. [PMID: 26194750 DOI: 10.1098/rstb.2014.0236] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Indexed: 01/25/2023]
Abstract
The human and mouse antibody repertoires are formed by identical processes, but like all small animals, mice only have sufficient lymphocytes to express a small part of the potential antibody repertoire. In this study, we determined how the heavy chain repertoires of two mouse strains are generated. Analysis of IgM- and IgG-associated VDJ rearrangements generated by high-throughput sequencing confirmed the presence of 99 functional immunoglobulin heavy chain variable (IGHV) genes in the C57BL/6 genome, and inferred the presence of 164 IGHV genes in the BALB/c genome. Remarkably, only five IGHV sequences were common to both strains. Compared with humans, little N nucleotide addition was seen in the junctions of mouse VDJ genes. Germline human IgG-associated IGHV genes are rare, but many murine IgG-associated IGHV genes were unmutated. Together these results suggest that the expressed mouse repertoire is more germline-focused than the human repertoire. The apparently divergent germline repertoires of the mouse strains are discussed with reference to reports that inbred mouse strains carry blocks of genes derived from each of the three subspecies of the house mouse. We hypothesize that the germline genes of BALB/c and C57BL/6 mice may originally have evolved to generate distinct germline-focused antibody repertoires in the different mouse subspecies.
Affiliation(s)
- Andrew M Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Yan Wang
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Krishna M Roskin
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305-5324, USA
- Christopher P Marquis
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Katherine J L Jackson
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, 2052 NSW, Australia
- Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305-5324, USA
7
Ralph DK, Matsen FA. Consistency of VDJ Rearrangement and Substitution Parameters Enables Accurate B Cell Receptor Sequence Annotation. PLoS Comput Biol 2016; 12:e1004409. [PMID: 26751373 PMCID: PMC4709141 DOI: 10.1371/journal.pcbi.1004409] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 03/04/2015] [Accepted: 06/20/2015] [Indexed: 11/18/2022]
Abstract
VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that indeed large modern data sets suggest a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM "factorization" strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
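The contrast above between simple parametric distributions and "parameter-rich per-allele categorical distributions" comes down to estimating a separate empirical histogram for each allele instead of fitting one smooth curve. The Python sketch below illustrates that idea for an integer-valued event such as the number of bases trimmed from an allele's 3' end. It is not partis code; the allele labels and counts are hypothetical, and the add-a-pseudocount smoothing is an assumption chosen for simplicity.

```python
from collections import Counter, defaultdict


def per_allele_categorical(observations, pseudocount=1.0):
    """Estimate one categorical distribution per allele.

    observations: iterable of (allele, value) pairs, where value is a
    non-negative integer (hypothetically, a 3' deletion length).
    Returns {allele: {value: probability}} over the support 0..max observed
    value for that allele, adding `pseudocount` to every category.
    """
    counts = defaultdict(Counter)
    for allele, value in observations:
        counts[allele][value] += 1

    dists = {}
    for allele, c in counts.items():
        support = range(max(c) + 1)  # 0 .. largest value seen for this allele
        total = sum(c[k] for k in support) + pseudocount * len(support)
        dists[allele] = {k: (c[k] + pseudocount) / total for k in support}
    return dists


if __name__ == "__main__":
    # Hypothetical observations: two alleles with differently shaped deletion profiles.
    obs = [("alleleA", 0), ("alleleA", 0), ("alleleA", 2), ("alleleB", 1)]
    for allele, dist in per_allele_categorical(obs).items():
        print(allele, {k: round(p, 3) for k, p in dist.items()})
```

Each allele gets its own free probability for every deletion length, which is what lets such a model capture irregular, codon-frame-driven patterns that a single parametric shape would smooth over; the price is that it needs large data sets to estimate reliably.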
Affiliation(s)
- Duncan K. Ralph
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Frederick A. Matsen
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
8
Jackson KJL, Kidd MJ, Wang Y, Collins AM. The shape of the lymphocyte receptor repertoire: lessons from the B cell receptor. Front Immunol 2013; 4:263. [PMID: 24032032 PMCID: PMC3759170 DOI: 10.3389/fimmu.2013.00263] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 05/31/2013] [Accepted: 08/19/2013] [Indexed: 11/13/2022]
Abstract
Both the B cell receptor (BCR) and the T cell receptor (TCR) repertoires are generated through essentially identical processes of V(D)J recombination, exonuclease trimming of germline genes, and the random addition of non-template encoded nucleotides. The naïve TCR repertoire is constrained by thymic selection, and TCR repertoire studies have therefore focused strongly on the diversity of MHC-binding complementarity determining region (CDR) CDR3. The process of somatic point mutations has given B cell studies a major focus on variable (IGHV, IGLV, and IGKV) genes. This in turn has influenced how both the naïve and memory BCR repertoires have been studied. Diversity (D) genes are also more easily identified in BCR VDJ rearrangements than in TCR VDJ rearrangements, and this has allowed the processes and elements that contribute to the incredible diversity of the immunoglobulin heavy chain CDR3 to be analyzed in detail. This diversity can be contrasted with that of the light chain where a small number of polypeptide sequences dominate the repertoire. Biases in the use of different germline genes, in gene processing, and in the addition of non-template encoded nucleotides appear to be intrinsic to the recombination process, imparting "shape" to the repertoire of rearranged genes as a result of differences spanning many orders of magnitude in the probabilities that different BCRs will be generated. This may function to increase the precursor frequency of naïve B cells with important specificities, and the likely emergence of such B cell lineages upon antigen exposure is discussed with reference to public and private T cell clonotypes.
Affiliation(s)
- Katherine J. L. Jackson
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Marie J. Kidd
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Yan Wang
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Andrew M. Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
9
Clustering-based identification of clonally-related immunoglobulin gene sequence sets. Immunome Res 2010; 6 Suppl 1:S4. [PMID: 20875155 PMCID: PMC2946782 DOI: 10.1186/1745-7580-6-s1-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
Abstract
Background: Clonal expansion of B lymphocytes coupled with somatic mutation and antigen selection allows the mammalian humoral immune system to generate highly specific immunoglobulins (IG) or antibodies against invading bacteria, viruses and toxins. The availability of high-throughput DNA sequencing methods is providing new avenues for studying this clonal expansion and identifying the factors guiding the generation of antibodies. The identification of groups of rearranged immunoglobulin gene sequences descended from the same rearrangement (clonally-related sets) in very large sets of sequences is facilitated by the availability of immunoglobulin gene sequence alignment and partitioning software that can accurately predict component germline genes, but has required painstaking visual inspection and analysis of sequences.
Results: We have developed and implemented an algorithm for identifying sets of clonally-related sequences in large human immunoglobulin heavy chain gene variable region sequence sets. The program processes sequences that have been partitioned using iHMMune-align, and uses pairwise comparisons of CDR3 sequences and similarity in IGHV and IGHJ germline gene assignments to construct a distance matrix. Agglomerative hierarchical clustering is then used to identify likely groups of clonally-related sequences. The program is available for download from http://www.cse.unsw.edu.au/~ihmmune/ClonalRelate/ClonalRelate.zip.
Conclusions: The method was evaluated on several benchmark datasets and provided a more accurate and considerably faster identification of clonally-related immunoglobulin gene sequences than visual inspection by domain experts.
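The two-stage procedure in this abstract (a distance matrix built from pairwise CDR3 comparisons and IGHV/IGHJ assignment agreement, followed by agglomerative hierarchical clustering) can be sketched in Python as below. The distance function, the choice of single linkage, and the cutoff are assumptions for illustration, not the program's actual scoring; gene assignments are assumed to come from an upstream aligner such as iHMMune-align, and the sequences in the usage example are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rearrangement:
    seq_id: str
    ighv: str  # assigned IGHV gene
    ighj: str  # assigned IGHJ gene
    cdr3: str  # CDR3 sequence


def distance(a: Rearrangement, b: Rearrangement) -> float:
    """Hypothetical distance: 1.0 unless IGHV, IGHJ and CDR3 length all match;
    otherwise the fraction of mismatched CDR3 positions."""
    if a.ighv != b.ighv or a.ighj != b.ighj or len(a.cdr3) != len(b.cdr3):
        return 1.0
    if not a.cdr3:
        return 0.0
    return sum(x != y for x, y in zip(a.cdr3, b.cdr3)) / len(a.cdr3)


def cluster(seqs: list[Rearrangement], threshold: float) -> list[set[str]]:
    """Single-linkage agglomerative clustering on the pairwise distance matrix,
    stopping once the closest pair of clusters is farther apart than threshold."""
    n = len(seqs)
    dist = [[distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    clusters = [{i} for i in range(n)]
    while len(clusters) > 1:
        # Closest pair of clusters (single linkage = minimum pairwise distance).
        best_d, best_p, best_q = min(
            (min(dist[i][j] for i in cp for j in cq), p, q)
            for (p, cp), (q, cq) in combinations(enumerate(clusters), 2)
        )
        if best_d > threshold:
            break
        clusters[best_p] |= clusters[best_q]
        del clusters[best_q]  # best_q > best_p, so best_p's index is unaffected
    return [{seqs[i].seq_id for i in c} for c in clusters]


if __name__ == "__main__":
    seqs = [
        Rearrangement("s1", "IGHV1-2", "IGHJ4", "ARDYW"),
        Rearrangement("s2", "IGHV1-2", "IGHJ4", "ARDFW"),
        Rearrangement("s3", "IGHV3-23", "IGHJ6", "AKGGYYGMDV"),
    ]
    print(cluster(seqs, threshold=0.25))
```

Here `s1` and `s2` share gene assignments and differ at one of five CDR3 positions (distance 0.2), so they merge at a 0.25 cutoff, while `s3` stays a singleton. The naive search is cubic in the number of sequences; large repertoires would need a proper hierarchical clustering library.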
10
Collins AM, Wang Y, Singh V, Yu P, Jackson KJ, Sewell WA. The reported germline repertoire of human immunoglobulin kappa chain genes is relatively complete and accurate. Immunogenetics 2008; 60:669-76. [PMID: 18712520 DOI: 10.1007/s00251-008-0325-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 05/19/2008] [Accepted: 07/16/2008] [Indexed: 01/31/2023]
Abstract
We describe a bioinformatic analysis of germline and rearranged immunoglobulin kappa chain (IGK) gene sequences, performed in order to assess the completeness and reliability of the reported IGK repertoire. In contrast to the reported heavy-chain gene repertoire, which includes many dubious sequences, only five IGK variable gene (IGKV) alleles appear to have been reported in error. There was, however, insufficient evidence to justify removing these IGKV genes from the germline repertoire. Bioinformatic analysis of apparent mismatches between reported germline genes and 1,863 expressed IGK sequences suggested the existence of two unreported IGKV polymorphisms. Genomic screening of 12 individuals led to the confirmation of both of these polymorphisms, IGKV1-16*02 and IGKV2-30*02. We also show that in contrast to the heavy chain, the IGK repertoire is dominated by sequences that use just a handful of kappa variable (IGKV) and junction (IGKJ) gene pairs. There is also little modification of IGKV and IGKJ genes by the processes of exonuclease removal and N nucleotide addition. The expressed IGK repertoire therefore lacks diversity and the junction region is particularly constrained. Remarkably, the analysis of a dataset of 435 relatively unmutated rearranged kappa genes showed that ten amino acid sequences account for almost 10% of the rearrangements, with identical sequences being derived from as many as seven independent sources. Such dominant sequences are likely to have important roles in the operation of the humoral immune response.
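The finding that ten amino acid sequences account for almost 10% of rearrangements is a "top-k share" statistic: the fraction of all sequences covered by the k most frequent distinct sequences. A minimal Python sketch of that calculation follows; it is not the authors' analysis code, and the input strings in the usage note are hypothetical placeholders.

```python
from collections import Counter


def top_k_share(sequences: list[str], k: int = 10) -> float:
    """Fraction of sequences accounted for by the k most frequent distinct sequences."""
    if not sequences:
        return 0.0
    counts = Counter(sequences)
    return sum(n for _, n in counts.most_common(k)) / len(sequences)
```

For example, with the hypothetical list `["CDR3_A"] * 3 + ["CDR3_B"] * 2 + ["CDR3_C"]`, `top_k_share(seqs, k=1)` returns 0.5 because the single most common sequence covers 3 of 6 entries. A high top-k share for small k is one simple signature of the constrained, low-diversity junctions the abstract describes for IGK.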
Affiliation(s)
- Andrew M Collins
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, Australia.