1
Wang K, Cai L, Wang H, Shan S, Hu X, Zhang J. Protocol for fast clonal family inference and analysis from large-scale B cell receptor repertoire sequencing data. STAR Protoc 2024; 5:102969. [PMID: 38502687 PMCID: PMC10963638 DOI: 10.1016/j.xpro.2024.102969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/26/2024] [Accepted: 03/03/2024] [Indexed: 03/21/2024] [Open access]
Abstract
The expeditious identification and comprehensive analysis of clonal families from extensive B cell receptor (BCR) repertoire sequencing data are imperative for elucidating the intricacies of B cell immune responses. Here, we introduce a computational pipeline designed to swiftly deduce clonal families from bulk BCR heavy-chain sequencing data, accompanied by a suite of functional modules tailored to streamline post-clustering analysis. The outlined methodology encompasses guidelines for software installation, meticulous data preparation, and the systematic inference and analysis of clonal families. For complete details on the use and execution of this protocol, please refer to Wang et al.1.
Affiliation(s)
- Kaixuan Wang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Linru Cai
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Hao Wang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China; Georgia Tech Shenzhen Institute (GTSI), Tianjin University, Shenzhen, Guangdong, China
- Shiwen Shan
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Xihao Hu
  - GV20 Therapeutics, Cambridge, MA, USA
- Jian Zhang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
2
Balashova D, van Schaik BDC, Stratigopoulou M, Guikema JEJ, Caniels TG, Claireaux M, van Gils MJ, Musters A, Anang DC, de Vries N, Greiff V, van Kampen AHC. Systematic evaluation of B-cell clonal family inference approaches. BMC Immunol 2024; 25:13. [PMID: 38331731 PMCID: PMC11370117 DOI: 10.1186/s12865-024-00600-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 01/18/2024] [Indexed: 02/10/2024] [Open access]
Abstract
The reconstruction of clonal families (CFs) in B-cell receptor (BCR) repertoire analysis is a crucial step to understand the adaptive immune system and how it responds to antigens. The BCR repertoire of an individual is formed throughout life and is diverse due to several factors such as gene recombination and somatic hypermutation. The use of Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using next generation sequencing enabled the generation of full BCR repertoires that also include rare CFs. The reconstruction of CFs from AIRR-seq data is challenging and several approaches have been developed to solve this problem. Currently, most methods use the heavy chain (HC) only, as it is more variable than the light chain (LC). CF reconstruction options include the definition of appropriate sequence similarity measures, the use of shared mutations among sequences, and the possibility of reconstruction without preliminary clustering based on V- and J-gene annotation. In this study, we aimed to systematically evaluate different approaches for CF reconstruction and to determine their impact on various outcome measures such as the number of CFs derived, the size of the CFs, and the accuracy of the reconstruction. The methods were compared to each other and to a method that groups sequences based on identical junction sequences and another method that only determines subclones. We found that after accounting for data set variability, in particular sequencing depth and mutation load, the reconstruction approach has an impact on part of the outcome measures, including the number of CFs. Simulations indicate that unique junctions and subclones should not be used as substitutes for CF and that more complex methods do not outperform simpler methods. Also, we conclude that different approaches differ in their ability to correctly reconstruct CFs when not considering the LC and to identify shared CFs. 
The results showed the effect of different approaches on the reconstruction of CFs and highlighted the importance of choosing an appropriate method.
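To make the kind of approach compared in this abstract concrete, the following is a minimal sketch of one common baseline: sequences are first binned by V gene, J gene and junction length, then joined by single linkage on junction Hamming distance. It is not the code of any method evaluated in the study. The function names, the tuple layout of `records`, and the 0.15 distance threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_clonal_families(records, threshold=0.15):
    """Group records into candidate clonal families (illustrative only).

    records: iterable of (sequence_id, v_gene, j_gene, junction) tuples.
    threshold: assumed maximum normalized Hamming distance for linking.
    """
    # Step 1: preliminary binning on V gene, J gene and junction length.
    bins = defaultdict(list)
    for seq_id, v_gene, j_gene, junction in records:
        bins[(v_gene, j_gene, len(junction))].append((seq_id, junction))

    families = []
    for members in bins.values():
        # Step 2: single-linkage clustering inside the bin via union-find.
        parent = list(range(len(members)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path compression
                i = parent[i]
            return i

        for i, j in combinations(range(len(members)), 2):
            if hamming_fraction(members[i][1], members[j][1]) <= threshold:
                parent[find(i)] = find(j)

        clusters = defaultdict(list)
        for i, (seq_id, _) in enumerate(members):
            clusters[find(i)].append(seq_id)
        families.extend(sorted(ids) for ids in clusters.values())
    return sorted(families)


if __name__ == "__main__":
    # Made-up example junctions, not real repertoire data.
    records = [
        ("s1", "IGHV1-69", "IGHJ4", "TGTGCGAGAGATTACTGG"),
        ("s2", "IGHV1-69", "IGHJ4", "TGTGCGAGAGATTATTGG"),
        ("s3", "IGHV1-69", "IGHJ4", "TGTGC" + "C" * 10 + "TGG"),
        ("s4", "IGHV3-23", "IGHJ6", "TGTGCGAGAGATTACTGG"),
    ]
    for family in group_clonal_families(records):
        print(family)
```

Setting `threshold=0` reduces the sketch to grouping by identical junctions within a V/J bin, which is the simplification the abstract cautions against using as a substitute for clonal families.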
Affiliation(s)
- Daria Balashova
  - Amsterdam UMC location University of Amsterdam, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, The Netherlands
  - Amsterdam Infection and Immunity, Inflammatory Diseases, Amsterdam, The Netherlands
- Barbera D C van Schaik
  - Amsterdam UMC location University of Amsterdam, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, The Netherlands
  - Amsterdam Infection and Immunity, Inflammatory Diseases, Amsterdam, The Netherlands
- Maria Stratigopoulou
  - Cancer Center Amsterdam, Amsterdam, The Netherlands
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
- Jeroen E J Guikema
  - Cancer Center Amsterdam, Amsterdam, The Netherlands
  - Amsterdam UMC location University of Amsterdam, Pathology, Lymphoma and Myeloma Center Amsterdam, Meibergdreef 9, Amsterdam, Netherlands
- Tom G Caniels
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Infection and Immunity, Infectious Diseases, Amsterdam, The Netherlands
- Mathieu Claireaux
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Infection and Immunity, Infectious Diseases, Amsterdam, The Netherlands
- Marit J van Gils
  - Amsterdam UMC location University of Amsterdam, Medical Microbiology and Infection Prevention, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Infection and Immunity, Infectious Diseases, Amsterdam, The Netherlands
- Anne Musters
  - Amsterdam UMC location University of Amsterdam, Experimental Immunology, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
- Dornatien C Anang
  - Amsterdam UMC location University of Amsterdam, Experimental Immunology, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
- Niek de Vries
  - Amsterdam UMC location University of Amsterdam, Experimental Immunology, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Rheumatology & Immunology Center, Amsterdam, The Netherlands
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
- Antoine H C van Kampen
  - Amsterdam UMC location University of Amsterdam, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, The Netherlands
  - Amsterdam Infection and Immunity, Inflammatory Diseases, Amsterdam, The Netherlands
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
3
Wang K, Hu X, Zhang J. Fast clonal family inference from large-scale B cell repertoire sequencing data. Cell Rep Methods 2023; 3:100601. [PMID: 37788671 PMCID: PMC10626204 DOI: 10.1016/j.crmeth.2023.100601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/31/2023] [Accepted: 09/08/2023] [Indexed: 10/05/2023]
Abstract
Advances in high-throughput sequencing technologies have facilitated the large-scale characterization of B cell receptor (BCR) repertoires. However, the vast amount and high diversity of the BCR sequences pose challenges for efficient and biologically meaningful analysis. Here, we introduce fastBCR, an efficient computational approach for inferring B cell clonal families from massive BCR heavy chain sequences. We demonstrate that fastBCR substantially reduces the running time while ensuring high accuracy on simulated datasets with diverse numbers of B cell lineages and varying mutation rates. We apply fastBCR to real BCR sequencing data from peripheral blood samples of COVID-19 patients, showing that the inferred clonal families display disease-associated features, as well as corresponding antigen-binding specificity and affinity. Overall, our results demonstrate the advantages of fastBCR for analyzing BCR repertoire data, which will facilitate the identification of disease-associated antibodies and improve our understanding of the B cell immune response.
Affiliation(s)
- Kaixuan Wang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
- Xihao Hu
  - GV20 Therapeutics, Cambridge, MA, USA
- Jian Zhang
  - Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin, China
4
Jeusset L, Abdollahi N, Verny T, Armand M, De Septenville A, Davi F, Bernardes JS. ViCloD, an interactive web tool for visualizing B cell repertoires and analyzing intraclonal diversities: application to human B-cell tumors. NAR Genom Bioinform 2023; 5:lqad064. [PMID: 37388820 PMCID: PMC10304752 DOI: 10.1093/nargab/lqad064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/28/2023] [Revised: 05/25/2023] [Accepted: 06/26/2023] [Indexed: 07/01/2023] [Open access]
Abstract
High throughput sequencing of adaptive immune receptor repertoire (AIRR-seq) has provided numerous human immunoglobulin (IG) sequences allowing specific B cell receptor (BCR) studies such as the antigen-driven evolution of antibodies (soluble forms of the membrane-bound IG part of the BCR). AIRR-seq data allows researchers to examine intraclonal differences caused primarily by somatic hypermutations in IG genes and affinity maturation. Exploring this essential adaptive immunity process could help elucidate the generation of antibodies with high affinity or broadly neutralizing activities. Retracing their evolutionary history could also clarify how vaccines or pathogen exposition drive the humoral immune response, and unravel the clonal architecture of B cell tumors. Computational methods are necessary for large-scale analysis of AIRR-seq properties. However, there is no efficient and interactive tool for analyzing intraclonal diversity, permitting users to explore adaptive immune receptor repertoires in biological and clinical applications. Here we present ViCloD, a web server for large-scale visual analysis of repertoire clonality and intraclonal diversity. ViCloD uses preprocessed data in the format defined by the Adaptive Immune Receptor Repertoire (AIRR) Community. Then, it performs clonal grouping and evolutionary analyses, producing a collection of useful plots for clonal lineage inspection. The web server presents diverse functionalities, including repertoire navigation, clonal abundance analysis, and intraclonal evolutionary tree reconstruction. Users can download the analyzed data in different table formats and save the generated plots as images. ViCloD is a simple, versatile, and user-friendly tool that can help researchers and clinicians to analyze B cell intraclonal diversity. Moreover, its pipeline is optimized to process hundreds of thousands of sequences within a few minutes, allowing an efficient investigation of large and complex repertoires.
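As a rough illustration of the clonal abundance analysis mentioned in this abstract, the sketch below tallies sequences per clone from a tab-separated table in the AIRR Community Rearrangement format. It is not ViCloD's pipeline. It assumes the `clone_id` column has already been filled in by some upstream clonal grouping step, and it treats a missing `duplicate_count` as 1; the function name `clone_abundance` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def clone_abundance(tsv_handle):
    """Return (clone_id, count, fraction) tuples, most abundant first.

    tsv_handle: a text file object holding an AIRR-style TSV with a
    `clone_id` column and, optionally, a `duplicate_count` column.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        clone_id = row.get("clone_id")
        if not clone_id:
            continue  # skip rows that were never assigned to a clone
        # Assumption: an absent or empty duplicate_count means one copy.
        counts[clone_id] += int(row.get("duplicate_count") or "1")
    total = sum(counts.values())
    return [(cid, n, n / total) for cid, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = (
        "sequence_id\tclone_id\tduplicate_count\n"
        "s1\tc1\t3\n"
        "s2\tc1\t2\n"
        "s3\tc2\t\n"
        "s4\t\t7\n"
    )
    for cid, n, frac in clone_abundance(io.StringIO(sample)):
        print(f"{cid}\t{n}\t{frac:.3f}")
```

Reading from a file handle rather than a path keeps the function usable with both files on disk and in-memory strings.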
Affiliation(s)
- Lucile Jeusset
  - Sorbonne Université, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
  - Sorbonne Université, AP-HP, Hôpital Pitié-Salpêtrière, Department of Biological Hematology, Paris, France
- Nika Abdollahi
  - Sorbonne Université, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
  - IMGT, the international ImMunoGeneTics Information System, CNRS, Institute of Human Genetics, Montpellier University, France
- Thibaud Verny
  - Sorbonne Université, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
  - Ecole des Mines ParisTech, Paris, France
- Marine Armand
  - Sorbonne Université, AP-HP, Hôpital Pitié-Salpêtrière, Department of Biological Hematology, Paris, France
- Frédéric Davi
  - Sorbonne Université, AP-HP, Hôpital Pitié-Salpêtrière, Department of Biological Hematology, Paris, France
- Juliana Silva Bernardes
  - Sorbonne Université, CNRS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
5
Ralph DK, Matsen FA. Inference of B cell clonal families using heavy/light chain pairing information. PLoS Comput Biol 2022; 18:e1010723. [PMID: 36441808 PMCID: PMC9731466 DOI: 10.1371/journal.pcbi.1010723] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/21/2022] [Revised: 12/08/2022] [Accepted: 11/09/2022] [Indexed: 11/29/2022] [Open access]
Abstract
Next generation sequencing of B cell receptor (BCR) repertoires has become a ubiquitous tool for understanding the antibody-mediated immune response: it is now common to have large volumes of sequence data coding for both the heavy and light chain subunits of the BCR. However, until the recent development of high throughput methods of preserving heavy/light chain pairing information, these samples contained no explicit information on which heavy chain sequence pairs with which light chain sequence. One of the first steps in analyzing such BCR repertoire samples is grouping sequences into clonally related families, where each stems from a single rearrangement event. Many methods of accomplishing this have been developed, however, none so far has taken full advantage of the newly-available pairing information. This information can dramatically improve clustering performance, especially for the light chain. The light chain has traditionally been challenging for clonal family inference because of its low diversity and consequent abundance of non-clonal families with indistinguishable naive rearrangements. Here we present a method of incorporating this pairing information into the clustering process in order to arrive at a more accurate partition of the data into clonally related families. We also demonstrate two methods of fixing imperfect pairing information, which may allow for simplified sample preparation and increased sequencing depth. Finally, we describe several other improvements to the partis software package.
Affiliation(s)
- Duncan K. Ralph
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Frederick A. Matsen
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
  - Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
  - Department of Statistics, University of Washington, Seattle, Washington, United States of America
  - Howard Hughes Medical Institute, Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America