1
Olsen VK, Whitlock JR, Roudi Y. The quality and complexity of pairwise maximum entropy models for large cortical populations. PLoS Comput Biol 2024; 20:e1012074. [PMID: 38696532 DOI: 10.1371/journal.pcbi.1012074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 05/14/2024] [Accepted: 04/10/2024] [Indexed: 05/04/2024] Open
Abstract
We investigate the ability of the pairwise maximum entropy (PME) model to describe the spiking activity of large populations of neurons recorded from the visual, auditory, motor, and somatosensory cortices. To quantify this performance, we use (1) Kullback-Leibler (KL) divergences, (2) the extent to which the pairwise model predicts third-order correlations, and (3) its ability to predict the probability that multiple neurons are simultaneously active. We compare these with the performance of a model with independent neurons and study the relationship between the different performance measures, while varying the population size, the mean firing rate of the chosen population, and the bin size used for binarizing the data. We confirm the previously reported excellent performance of the PME model for small population sizes N < 20. However, we also find that larger mean firing rates and bin sizes generally decrease performance. The performance for larger populations was generally not as good. For large populations, pairwise models may be good at predicting third-order correlations and the probability of multiple neurons being active, but they are still significantly worse than for small populations in terms of their improvement over the independent model in KL divergence. We show that these results are independent of the cortical area and of whether approximate methods or Boltzmann learning are used for inferring the pairwise couplings. We compare the scaling of the inferred couplings with N and find it to be well explained by the Sherrington-Kirkpatrick (SK) model, whose strong coupling regime shows a complex phase with many metastable states. We find that, up to the maximum population size studied here, the fitted PME model remains outside its complex phase. However, the standard deviation of the couplings compared to their mean increases, and the model gets closer to the boundary of the complex phase as the population size grows.
Affiliation(s)
- Valdemar Kargård Olsen
  - Kavli Institute for Systems Neuroscience, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Jonathan R Whitlock
  - Kavli Institute for Systems Neuroscience, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Yasser Roudi
  - Kavli Institute for Systems Neuroscience, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
  - Department of Mathematics, King's College London, London, United Kingdom
2
Sutherland CA, Prigozhin DM, Monroe JG, Krasileva KV. High allelic diversity in Arabidopsis NLRs is associated with distinct genomic features. EMBO Rep 2024; 25:2306-2322. [PMID: 38528170 PMCID: PMC11093987 DOI: 10.1038/s44319-024-00122-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/23/2023] [Revised: 03/07/2024] [Accepted: 03/08/2024] [Indexed: 03/27/2024] Open
Abstract
Plants rely on Nucleotide-binding, Leucine-rich repeat Receptors (NLRs) for pathogen recognition. Highly variable NLRs (hvNLRs) show remarkable intraspecies diversity, while their low-variability paralogs (non-hvNLRs) are conserved between ecotypes. At a population level, hvNLRs provide new pathogen-recognition specificities, but the association between allelic diversity and genomic and epigenomic features has not been established. Our investigation of NLRs in Arabidopsis Col-0 has revealed that hvNLRs show higher expression, less gene body cytosine methylation, and closer proximity to transposable elements than non-hvNLRs. hvNLRs show elevated synonymous and nonsynonymous nucleotide diversity and are in chromatin states associated with an increased probability of mutation. Diversifying selection maintains variability at a subset of codons of hvNLRs, while purifying selection maintains conservation at non-hvNLRs. How these features are established and maintained, and whether they contribute to the observed diversity of hvNLRs is key to understanding the evolution of plant innate immune receptors.
Affiliation(s)
- Chandler A Sutherland
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Daniil M Prigozhin
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- J Grey Monroe
  - Department of Plant Sciences, University of California Davis, Davis, CA, 95616, USA
- Ksenia V Krasileva
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
3
Sesta L, Pagnani A, Fernandez-de-Cossio-Diaz J, Uguzzoni G. Inference of annealed protein fitness landscapes with AnnealDCA. PLoS Comput Biol 2024; 20:e1011812. [PMID: 38377054 PMCID: PMC10878520 DOI: 10.1371/journal.pcbi.1011812] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/06/2023] [Accepted: 01/08/2024] [Indexed: 02/22/2024] Open
Abstract
The design of proteins with specific tasks is a major challenge in molecular biology with important diagnostic and therapeutic applications. High-throughput screening methods have been developed to systematically evaluate protein activity, but only a small fraction of possible protein variants can be tested using these techniques. Computational models that explore the sequence space in silico to identify the fittest molecules for a given function are needed to overcome this limitation. In this article, we propose AnnealDCA, a machine-learning framework to learn the protein fitness landscape from sequencing data derived from a broad range of experiments that use selection and sequencing to quantify protein activity. We demonstrate the effectiveness of our method by applying it to antibody Rep-Seq data of immunized mice and screening experiments, assessing the quality of the fitness landscape reconstructions. Our method can be applied to several experimental cases where a population of protein variants undergoes various rounds of selection and sequencing, without relying on the computation of variant enrichment ratios, and thus can be used even in cases of disjoint sequence samples.
Affiliation(s)
- Luca Sesta
  - Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
- Andrea Pagnani
  - Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
  - Italian Institute for Genomic Medicine, Torino, Italy
  - INFN, Sezione di Torino, Torino, Italy
4
Truong PL, Yin Y, Lee D, Ko SH. Advancement in COVID-19 detection using nanomaterial-based biosensors. EXPLORATION (BEIJING, CHINA) 2023; 3:20210232. [PMID: 37323622 PMCID: PMC10191025 DOI: 10.1002/exp.20210232] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/06/2021] [Accepted: 05/11/2022] [Indexed: 06/17/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has exemplified how viral growth and transmission are a significant threat to global biosecurity. The early detection and treatment of viral infections is the top priority to prevent fresh waves and control the pandemic. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been identified through several conventional molecular methodologies that are time-consuming and require highly skilled labor, apparatus, and biochemical reagents but have a low detection accuracy. These bottlenecks prevent conventional methods from resolving the COVID-19 emergency. However, interdisciplinary advances in nanomaterials and biotechnology, such as nanomaterials-based biosensors, have opened new avenues for rapid and ultrasensitive detection of pathogens in the field of healthcare. Many updated nanomaterials-based biosensors, namely electrochemical, field-effect transistor, plasmonic, and colorimetric biosensors, employ nucleic acid and antigen-antibody interactions for SARS-CoV-2 detection in a highly efficient, reliable, sensitive, and rapid manner. This systematic review summarizes the mechanisms and characteristics of nanomaterials-based biosensors for SARS-CoV-2 detection. Moreover, continuing challenges and emerging trends in biosensor development are also discussed.
Affiliation(s)
- Phuoc Loc Truong
  - Laser and Thermal Engineering Lab, Department of Mechanical Engineering, Gachon University, Seongnam, Korea
- Yiming Yin
  - New Materials Institute, Department of Mechanical, Materials and Manufacturing Engineering, University of Nottingham Ningbo China, Ningbo, China
  - Applied Nano and Thermal Science Lab, Department of Mechanical Engineering, Seoul National University, Gwanak‐gu, Seoul, Korea
- Daeho Lee
  - Laser and Thermal Engineering Lab, Department of Mechanical Engineering, Gachon University, Seongnam, Korea
- Seung Hwan Ko
  - Applied Nano and Thermal Science Lab, Department of Mechanical Engineering, Seoul National University, Gwanak‐gu, Seoul, Korea
  - Institute of Advanced Machinery and Design (SNU‐IAMD)/Institute of Engineering Research, Seoul National University, Gwanak‐gu, Seoul, Korea
5
Pennell M, Rodriguez OL, Watson CT, Greiff V. The evolutionary and functional significance of germline immunoglobulin gene variation. Trends Immunol 2023; 44:7-21. [PMID: 36470826 DOI: 10.1016/j.it.2022.11.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/04/2022]
Abstract
The recombination between immunoglobulin (IG) gene segments determines an individual's naïve antibody repertoire and, consequently, (auto)antigen recognition. Emerging evidence suggests that mammalian IG germline variation impacts humoral immune responses associated with vaccination, infection, and autoimmunity, from the molecular level of epitope specificity up to profound changes in the architecture of antibody repertoires. These links between IG germline variants and immunophenotype raise the question of the evolutionary causes and consequences of diversity within IG loci. We discuss why the extreme diversity in IG loci remains a mystery, why resolving this is important for the design of more effective vaccines and therapeutics, and how recent evidence from multiple lines of inquiry may help us do so.
Affiliation(s)
- Matt Pennell
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY, USA
- Victor Greiff
  - Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
6
Fernandez-de-Cossio-Diaz J, Uguzzoni G, Pagnani A. Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan. Mol Biol Evol 2021; 38:318-328. [PMID: 32770229 PMCID: PMC7783173 DOI: 10.1093/molbev/msaa204] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/28/2022] Open
Abstract
The recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.
Affiliation(s)
- Jorge Fernandez-de-Cossio-Diaz
  - Systems Biology Department, Center of Molecular Immunology, Havana, Cuba
  - Laboratory of Physics of the Ecole Normale Superieure, CNRS UMR 8023 & PSL Research, Paris, France
- Andrea Pagnani
  - Politecnico di Torino, Torino, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, Candiolo, TO, Italy
  - INFN, Sezione di Torino, Torino, Italy
7
Shin JE, Riesselman AJ, Kollasch AW, McMahon C, Simon E, Sander C, Manglik A, Kruse AC, Marks DS. Protein design and variant prediction using autoregressive generative models. Nat Commun 2021; 12:2403. [PMID: 33893299 PMCID: PMC8065141 DOI: 10.1038/s41467-021-22732-w] [Citation(s) in RCA: 164] [Impact Index Per Article: 41.0] [Received: 02/28/2021] [Accepted: 03/26/2021] [Indexed: 12/11/2022] Open
Abstract
The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regions. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.
Affiliation(s)
- Jung-Eun Shin
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Adam J Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - insitro, South San Francisco, CA, USA
- Aaron W Kollasch
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Conor McMahon
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
  - Vertex Pharmaceuticals, Boston, MA, USA
- Elana Simon
  - Harvard College, Cambridge, MA, USA
  - Reverie Labs, Cambridge, MA, USA
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Aashish Manglik
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
  - Department of Anesthesia and Perioperative Care, University of California San Francisco, San Francisco, CA, USA
- Andrew C Kruse
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
8
Pertseva M, Gao B, Neumeier D, Yermanos A, Reddy ST. Applications of Machine and Deep Learning in Adaptive Immunity. Annu Rev Chem Biomol Eng 2021; 12:39-62. [PMID: 33852352 DOI: 10.1146/annurev-chembioeng-101420-125021] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/09/2022]
Abstract
Adaptive immunity is mediated by lymphocyte B and T cells, which respectively express a vast and diverse repertoire of B cell and T cell receptors and, in conjunction with peptide antigen presentation through major histocompatibility complexes (MHCs), can recognize and respond to pathogens and diseased cells. In recent years, advances in deep sequencing have led to a massive increase in the amount of adaptive immune receptor repertoire data; additionally, proteomics techniques have led to a wealth of data on peptide-MHC presentation. These large-scale data sets are now making it possible to train machine and deep learning models, which can be used to identify complex and high-dimensional patterns in immune repertoires. This article introduces adaptive immune repertoires and machine and deep learning related to biological sequence data and then summarizes the many applications in this field, which span from predicting the immunological status of a host to the antigen specificity of individual receptors and the engineering of immunotherapeutics.
Affiliation(s)
- Margarita Pertseva
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - Life Science Zurich Graduate School, ETH Zurich and University of Zurich, 8006 Zurich, Switzerland
- Beichen Gao
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Daniel Neumeier
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Alexander Yermanos
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - Department of Pathology and Immunology, University of Geneva, 1205 Geneva, Switzerland
  - Department of Biology, Institute of Microbiology and Immunology, ETH Zurich, 8093 Zurich, Switzerland
- Sai T Reddy
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
9
Karolak A, Branciamore S, McCune JS, Lee PP, Rodin AS, Rockne RC. Concepts and Applications of Information Theory to Immuno-Oncology. Trends Cancer 2021; 7:335-346. [PMID: 33618998 PMCID: PMC8156485 DOI: 10.1016/j.trecan.2020.12.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/24/2020] [Revised: 12/16/2020] [Accepted: 12/18/2020] [Indexed: 01/27/2023]
Abstract
Recent successes of immune-modulating therapies for cancer have stimulated research on information flow within the immune system and, in turn, clinical applications of concepts from information theory. Through information theory, one can describe and formalize, in a mathematically rigorous fashion, the function of interconnected components of the immune system in health and disease. Specifically, using concepts including entropy, mutual information, and channel capacity, one can quantify the storage, transmission, encoding, and flow of information within and between cellular components of the immune system on multiple temporal and spatial scales. To understand, at the quantitative level, immune signaling function and dysfunction in cancer, we present a methodology-oriented review of information-theoretic treatment of biochemical signal transduction and transmission coupled with mathematical modeling.
Affiliation(s)
- Aleksandra Karolak
  - Department of Hematologic Malignancies Translational Science, Beckman Research Institute of City of Hope, Duarte, CA, USA
  - Division of Mathematical Oncology, Department of Computational and Quantitative Medicine, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Sergio Branciamore
  - Department of Computational and Quantitative Medicine, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Jeannine S McCune
  - Department of Hematologic Malignancies Translational Science, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Peter P Lee
  - Department of Immuno-Oncology, Beckman Research Institute of City of Hope, CA, USA
- Andrei S Rodin
  - Department of Computational and Quantitative Medicine, Beckman Research Institute of City of Hope, Duarte, CA, USA
- Russell C Rockne
  - Division of Mathematical Oncology, Department of Computational and Quantitative Medicine, Beckman Research Institute of City of Hope, Duarte, CA, USA
10
Ralph DK, Matsen FA. Using B cell receptor lineage structures to predict affinity. PLoS Comput Biol 2020; 16:e1008391. [PMID: 33175831 PMCID: PMC7682889 DOI: 10.1371/journal.pcbi.1008391] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/28/2020] [Revised: 11/23/2020] [Accepted: 08/30/2020] [Indexed: 11/18/2022] Open
Abstract
We are frequently faced with a large collection of antibodies, and want to select those with the highest affinity for their cognate antigen. When developing a first-line therapeutic for a novel pathogen, for instance, we might look for such antibodies in patients that have recovered. There exist effective experimental methods of accomplishing this, such as cell sorting and baiting; however, they are time-consuming and expensive. Next-generation sequencing of B cell receptor (BCR) repertoires offers an additional source of sequences that could be tapped if we had a reliable method of selecting those coding for the best antibodies. In this paper we introduce a method that uses evolutionary information from the family of related sequences that share a naive ancestor to predict the affinity of each resulting antibody for its antigen. When combined with information on the identity of the antigen, this method should provide a source of effective new antibodies. We also introduce a method for a related task: given an antibody of interest and its inferred ancestral lineage, which branches in the tree are likely to harbor key affinity-increasing mutations? We evaluate the performance of these methods on a wide variety of simulated samples, as well as two real data samples. These methods are implemented as part of continuing development of the partis BCR inference package, available at https://github.com/psathyrella/partis. Comments: Please post comments or questions on this paper as new issues at https://git.io/Jvxkn.
Affiliation(s)
- Duncan K. Ralph
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
11
Prechl J. Network Organization of Antibody Interactions in Sequence and Structure Space: the RADARS Model. Antibodies (Basel) 2020; 9:antib9020013. [PMID: 32384800 PMCID: PMC7345901 DOI: 10.3390/antib9020013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/16/2020] [Revised: 04/09/2020] [Accepted: 04/15/2020] [Indexed: 02/06/2023] Open
Abstract
Adaptive immunity in vertebrates is a complex self-organizing network of molecular interactions. While deep sequencing of the immune-receptor repertoire may reveal clonal relationships, functional interpretation of such data is hampered by the inherent limitations of converting sequence to structure to function. In this paper, a novel model of antibody interaction space and network, termed RAdial ADjustment of System Resolution (RADARS), is proposed. The model is based on the radial growth of interaction affinity of antibodies towards an infinity of directions in structure space, each direction corresponding to particular shapes of antigen epitopes. Levels of interaction affinity appear as free energy shells of the system, where hierarchical B-cell development and differentiation take place. Equilibrium in this immunological thermodynamic system can be described by a power law distribution of antibody-free energies with an ideal network degree exponent of phi square, representing a scale-free fractal network of antibody interactions. Plasma cells are network hubs, memory B cells are nodes with intermediate degrees, and B1 cells function as nodes with minimal degree. Overall, the RADARS model implies that a finite number of antibody structures can interact with an infinite number of antigens by immunologically controlled adjustment of interaction energy distribution. Understanding quantitative network properties of the system should help the organization of sequence-derived predicted structural data.
Affiliation(s)
- József Prechl
  - Diagnosticum Zrt., 126. Attila u., 1047 Budapest, Hungary
12
Facco E, Pagnani A, Russo ET, Laio A. The intrinsic dimension of protein sequence evolution. PLoS Comput Biol 2019; 15:e1006767. [PMID: 30958823 PMCID: PMC6472826 DOI: 10.1371/journal.pcbi.1006767] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/27/2018] [Revised: 04/18/2019] [Accepted: 12/25/2018] [Indexed: 01/22/2023] Open
Abstract
It is well known that, in order to preserve its structure and function, a protein cannot change its sequence at random, but only by mutations occurring preferentially at specific locations. We here investigate quantitatively the amount of variability that is allowed in protein sequence evolution, by computing the intrinsic dimension (ID) of the sequences belonging to a selection of protein families. The ID is a measure of the number of independent directions that evolution can take starting from a given sequence. We find that the ID is practically constant for sequences belonging to the same family, and moreover it is very similar in different families, with values ranging between 6 and 12. These values are significantly smaller than the raw number of amino acids, confirming the importance of correlations between mutations in different sites. However, we demonstrate that correlations are not sufficient to explain the small value of the ID we observe in protein families. Indeed, we show that the ID of a set of protein sequences generated by maximum entropy models, an approach in which correlations are accounted for, is typically significantly larger than the value observed in natural protein families. We further prove that a critical factor to reproduce the natural ID is to take into consideration the phylogeny of sequences.
Affiliation(s)
- Andrea Pagnani
  - DISAT, Politecnico di Torino, Torino, Italy
  - IIGM, Italian Institute for Genomic Medicine, Torino, Italy
  - INFN, Sezione di Torino, Torino, Italy
- Alessandro Laio
  - SISSA, Trieste, Italy
  - ICTP, International Centre for Theoretical Physics, Trieste, Italy
13
Adams RM, Kinney JB, Walczak AM, Mora T. Epistasis in a Fitness Landscape Defined by Antibody-Antigen Binding Free Energy. Cell Syst 2019; 8:86-93.e3. [PMID: 30611676 PMCID: PMC6487650 DOI: 10.1016/j.cels.2018.12.004] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/30/2018] [Revised: 10/12/2018] [Accepted: 12/07/2018] [Indexed: 12/16/2022]
Abstract
Epistasis is the phenomenon by which the effect of a mutation depends on its genetic background. While it is usually defined in terms of organismal fitness, for single proteins, it must reflect physical interactions among residues. Here, we systematically extract the specific contribution pairwise epistasis makes to the physical affinity of antibody-antigen binding relevant to affinity maturation, a process of accelerated Darwinian evolution. We find that, among competing definitions of affinity, the binding free energy is the most appropriate to describe epistasis. We show that epistasis is pervasive, accounting for 25%-35% of variability, of which a large fraction is beneficial. This work suggests that epistasis both constrains, through negative epistasis, and enlarges, through positive epistasis, the set of possible evolutionary paths that can produce high-affinity sequences during repeated rounds of mutation and selection.
Affiliation(s)
- Rhys M Adams
  - CNRS, Laboratoire de Physique Théorique, UPMC (Sorbonne University), and École Normale Supérieure (PSL), 24 rue Lhomond, Paris 75005, France
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Rd., Cold Spring Harbor, NY 11724, USA
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, 1 Bungtown Rd., Cold Spring Harbor, NY 11724, USA
- Aleksandra M Walczak
  - CNRS, Laboratoire de Physique Théorique, UPMC (Sorbonne University), and École Normale Supérieure (PSL), 24 rue Lhomond, Paris 75005, France
- Thierry Mora
  - CNRS, Laboratoire de Physique Statistique, UPMC (Sorbonne University), Paris-Diderot University, and École Normale Supérieure (PSL), 24, rue Lhomond, Paris 75005, France
14
Clavero-Álvarez A, Di Mambro T, Perez-Gaviro S, Magnani M, Bruscolini P. Humanization of Antibodies using a Statistical Inference Approach. Sci Rep 2018; 8:14820. [PMID: 30287940 PMCID: PMC6172228 DOI: 10.1038/s41598-018-32986-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/02/2018] [Accepted: 09/19/2018] [Indexed: 02/08/2023] Open
Abstract
Antibody humanization is a key step in the preclinical phase of the development of therapeutic antibodies, originally developed and tested in non-human models (most typically, in mouse). The standard technique of Complementarity-Determining Regions (CDR) grafting into human Framework Regions of germline sequences has some important drawbacks, in that the resulting sequences often need further back-mutations to ensure functionality and/or stability. Here we propose a new method to characterize the statistical distribution of the sequences of the variable regions of human antibodies, that takes into account phenotypical correlations between pairs of residues, both within and between chains. We define a "humanness score" of a sequence, comparing its performance in distinguishing human from murine sequences, with that of some alternative scores in the literature. We also compare the score with the experimental immunogenicity of clinically used antibodies. Finally, we use the humanness score as an optimization function and perform a search in the sequence space, starting from different murine sequences and keeping the CDR regions unchanged. Our results show that our humanness score outperforms other methods in sequence classification, and the optimization protocol is able to generate humanized sequences that are recognized as human by standard homology modelling tools.
Affiliation(s)
- Tomas Di Mambro
- Department of Biomolecular Sciences, University of Urbino "Carlo Bo", Urbino, Italy
- Sergio Perez-Gaviro
- Departamento de Física Teórica, Universidad de Zaragoza, Zaragoza, 50009, Spain; Centro Universitario de la Defensa, Zaragoza, 50090, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, 50018, Spain
- Mauro Magnani
- Department of Biomolecular Sciences, University of Urbino "Carlo Bo", Urbino, Italy
- Pierpaolo Bruscolini
- Departamento de Física Teórica, Universidad de Zaragoza, Zaragoza, 50009, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, 50018, Spain
15
Abstract
Probabilistic modeling is fundamental to the statistical analysis of complex data. In addition to forming a coherent description of the data-generating process, probabilistic models enable parameter inference about given datasets. This procedure is well developed in the Bayesian perspective, in which one infers probability distributions describing to what extent various possible parameters agree with the data. In this paper, we motivate and review probabilistic modeling for adaptive immune receptor repertoire data, then describe progress and prospects for future work, from germline haplotyping to adaptive immune system deployment across tissues. The relevant quantities in immune sequence analysis include not only continuous parameters such as gene use frequency but also discrete objects such as B-cell clusters and lineages. Throughout this review, we unravel the many opportunities for probabilistic modeling in adaptive immune receptor analysis, including settings for which the Bayesian approach holds substantial promise (especially if one is optimistic about new computational methods). From our perspective, the greatest prospects for progress in probabilistic modeling for repertoires concern ancestral sequence estimation for B-cell receptor lineages, including uncertainty from germline genotype, rearrangement, and lineage development.
Affiliation(s)
- Branden Olson
- Computational Biology Program Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: M1-B514 Seattle, WA 98109-1024 phone: +1 206 667 7318
- Frederick A. Matsen
- Computational Biology Program Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: M1-B514 Seattle, WA 98109-1024 phone: +1 206 667 7318
16
Dib L, Salamin N, Gfeller D. Polymorphic sites preferentially avoid co-evolving residues in MHC class I proteins. PLoS Comput Biol 2018; 14:e1006188. [PMID: 29782520 PMCID: PMC5983860 DOI: 10.1371/journal.pcbi.1006188] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/13/2017] [Revised: 06/01/2018] [Accepted: 05/09/2018] [Indexed: 01/11/2023]
Abstract
Major histocompatibility complex class I (MHC-I) molecules are critical to adaptive immune defence mechanisms in vertebrate species and are encoded by highly polymorphic genes. Polymorphic sites are located close to the ligand-binding groove and entail MHC-I alleles with distinct binding specificities. Some efforts have been made to investigate the relationship between polymorphism and protein stability. However, less is known about the relationship between polymorphism and MHC-I co-evolutionary constraints. Using Direct Coupling Analysis (DCA), we found that co-evolution analysis accurately pinpoints structural contacts, although the protein family is restricted to vertebrates and comprises fewer than five hundred species, and that the co-evolutionary signal is mainly driven by inter-species changes, not intra-species polymorphism. Moreover, we show that polymorphic sites in humans preferentially avoid co-evolving residues, as well as residues involved in protein stability. These results suggest that sites displaying high polymorphism may have been selected during vertebrates’ evolution to avoid co-evolutionary constraints and thereby maximize their mutability.
Amino acid co-evolution represents cases of simultaneous substitution of amino acids at distinct positions in protein sequences. In the MHC-I protein family, such co-evolution could result from either amino acid changes across species or changes within species due to the high polymorphism of MHC-I molecules. Here we show that signals captured by global methods such as Direct Coupling Analysis (DCA) to estimate co-evolution primarily result from changes across species. Moreover, our results indicate that polymorphic sites in MHC-I molecules tend to be decoupled from co-evolving ones. This could suggest that they have been selected to maximize their mutability, which is known to be functionally important to entail MHC-I molecules with a wide repertoire of binding specificities for antigen presentation.
Affiliation(s)
- Linda Dib
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
- Nicolas Salamin
- Swiss Institute of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Lausanne, Switzerland
17
Abstract
The immune system protects our bodies from foreign molecules, or antigens, and antibodies play an important role in this defense. Antibodies evolve over time upon antigen encounter by somatically mutating their gene sequences. The end result is a series of antibodies that display higher affinity and specificity for particular antigens. This process is called affinity maturation. Recent improvements in computer hardware and modeling algorithms now enable the rational design of protein structures and functions, and several works on computer-aided antibody design have been published. In this chapter, we briefly describe computational methods for antibody affinity maturation, focusing on methods for sampling antibody conformations and for scoring designed antibody variants. We also discuss lessons learned from the successful computer-aided design of antibodies.
Affiliation(s)
- Daisuke Kuroda
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan
- Kouhei Tsumoto
- Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan.
- Medical Proteomics Laboratory, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.
18
Wang L, Whittemore K, Johnston SA, Stafford P. Entropy is a Simple Measure of the Antibody Profile and is an Indicator of Health Status: A Proof of Concept. Sci Rep 2017; 7:18060. [PMID: 29273777 PMCID: PMC5741721 DOI: 10.1038/s41598-017-18469-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/19/2017] [Accepted: 12/12/2017] [Indexed: 01/30/2023]
Abstract
We have previously shown that the diversity of antibodies in an individual can be displayed on chips on which 130,000 peptides chosen from random sequence space have been synthesized. This immunosignature technology is unbiased in displaying antibody diversity relative to natural sequence space, and has been shown to have diagnostic and prognostic potential for a wide variety of diseases and vaccines. Here we show that a global measure such as Shannon's entropy can be calculated for each immunosignature. The immune entropy was measured across a diverse set of 800 people and in 5 individuals over 3 months. The immune entropy is affected by some population characteristics and varies widely across individuals. We find that people with infections or breast cancer generally have higher entropy values than non-diseased individuals. We propose that the immune entropy, as measured from immunosignatures, may be a simple method to monitor health in individuals and populations.
Affiliation(s)
- Lu Wang
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, Tempe, AZ, 85287, United States
- Kurt Whittemore
- Centro Nacional de Investigaciones Oncologicas, Madrid, 28029, Spain
- Stephen Albert Johnston
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, Tempe, AZ, 85287, United States
- Phillip Stafford
- Center for Innovations in Medicine, Biodesign Institute, Arizona State University, Tempe, AZ, 85287, United States
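The Shannon entropy named in the abstract of entry 18 (Wang et al.) can be illustrated with a minimal sketch. The abstract does not say how peptide intensities are preprocessed, so the normalization below (dividing by the total so the values sum to 1) and the function name `shannon_entropy` are assumptions for illustration, not the authors' pipeline.

```python
import math

def shannon_entropy(intensities):
    """Shannon entropy in bits of a list of non-negative intensities.

    Assumption: intensities are turned into probabilities by dividing
    by their sum; zero entries contribute nothing to the entropy.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    probs = [x / total for x in intensities if x > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical toy profiles with four peptides each.
uniform = shannon_entropy([1.0, 1.0, 1.0, 1.0])   # signal spread evenly
skewed = shannon_entropy([97.0, 1.0, 1.0, 1.0])   # signal on one peptide
```

Under this normalization an evenly spread profile reaches the maximum entropy for its size (2 bits for four peptides), while a profile dominated by a single peptide scores lower.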
19
Barrat-Charlaix P, Figliuzzi M, Weigt M. Improving landscape inference by integrating heterogeneous data in the inverse Ising problem. Sci Rep 2016; 6:37812. [PMID: 27886273 PMCID: PMC5122905 DOI: 10.1038/srep37812] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/26/2016] [Accepted: 11/01/2016] [Indexed: 11/10/2022]
Abstract
The inverse Ising problem and its generalizations to Potts and continuous spin models have recently attracted much attention thanks to their successful applications in the statistical modeling of biological data. In the standard setting, the parameters of an Ising model (couplings and fields) are inferred using a sample of equilibrium configurations drawn from the Boltzmann distribution. However, in the context of biological applications, quantitative information for a limited number of microscopic spin configurations has recently become available. In this paper, we extend the usual setting of the inverse Ising model by developing an integrative approach combining the equilibrium sample with (possibly noisy) measurements of the energy performed for a number of arbitrary configurations. Using simulated data, we show that our integrative approach outperforms standard inference based only on the equilibrium sample or the energy measurements, including error correction of noisy energy measurements. As a biological proof-of-concept application, we show that mutational fitness landscapes in proteins can be better described when combining evolutionary sequence data with complementary structural information about mutant sequences.
Affiliation(s)
- Pierre Barrat-Charlaix
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie computationnelle et quantitative - Institut de Biologie Paris Seine, 75005 Paris, France
- Matteo Figliuzzi
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie computationnelle et quantitative - Institut de Biologie Paris Seine, 75005 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, Institut de Calcul et de la Simulation, 75005 Paris, France
- Martin Weigt
- Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie computationnelle et quantitative - Institut de Biologie Paris Seine, 75005 Paris, France