1. Case M, Smith M, Vinh J, Thurber G. Machine learning to predict continuous protein properties from binary cell sorting data and map unseen sequence space. Proc Natl Acad Sci U S A 2024; 121:e2311726121. [PMID: 38451939 PMCID: PMC10945751 DOI: 10.1073/pnas.2311726121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 12/27/2023] [Indexed: 03/09/2024]
Abstract
Proteins are a diverse class of biomolecules responsible for wide-ranging cellular functions, from catalyzing reactions to recognizing pathogens. The ability to evolve proteins rapidly and inexpensively toward improved properties is a common objective for protein engineers. Powerful high-throughput methods like fluorescent activated cell sorting and next-generation sequencing have dramatically improved directed evolution experiments. However, it is unclear how to best leverage these data to characterize protein fitness landscapes more completely and identify lead candidates. In this work, we develop a simple yet powerful framework to improve protein optimization by predicting continuous protein properties from simple directed evolution experiments using interpretable, linear machine learning models. Importantly, we find that these models, which use data from simple but imprecise experimental estimates of protein fitness, have predictive capabilities that approach more precise but expensive data. Evaluated across five diverse protein engineering tasks, continuous properties are consistently predicted from readily available deep sequencing data, demonstrating that protein fitness space can be reasonably well modeled by linear relationships among sequence mutations. To prospectively test the utility of this approach, we generated a library of stapled peptides and applied the framework to predict affinity and specificity from simple cell sorting data. We then coupled integer linear programming, a method to optimize protein fitness from linear weights, with mutation scores from machine learning to identify variants in unseen sequence space that have improved and co-optimal properties. This approach represents a versatile tool for improved analysis and identification of protein variants across many domains of protein engineering.
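The last step of the abstract, picking variants in unseen sequence space from additive mutation scores, can be illustrated with a toy example. In an integer linear program one would maximize the summed weights of the chosen residues subject to exactly one residue per position, plus any design constraints. The sketch below is not the authors' implementation: it swaps the ILP solver for exhaustive enumeration (feasible only for very small search spaces), and the weights, the mutation budget `max_mutations`, and the function name `best_variant` are hypothetical.

```python
from itertools import product


def best_variant(wild_type, weights, max_mutations):
    """Return (sequence, score) maximizing an additive score.

    weights[p][aa] is a hypothetical per-mutation weight, e.g. a coefficient
    from a linear model; the wild-type residue scores 0 unless given a weight.
    Exhaustive enumeration stands in for an ILP solver here.
    """
    options = []
    for pos, wt in enumerate(wild_type):
        choices = {wt: 0.0}                      # keeping the wild type costs nothing
        choices.update(weights.get(pos, {}))     # add the scored substitutions
        options.append(sorted(choices.items()))

    best_seq, best_score = wild_type, 0.0
    for combo in product(*options):              # one residue per position
        seq = "".join(aa for aa, _ in combo)
        n_mut = sum(a != b for a, b in zip(seq, wild_type))
        if n_mut > max_mutations:                # assumed design constraint
            continue
        score = sum(w for _, w in combo)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score


if __name__ == "__main__":
    # Made-up weights for a four-residue peptide.
    toy_weights = {0: {"G": 1.5, "S": -0.5}, 1: {"W": 0.7}, 3: {"K": 1.0}}
    print(best_variant("ACDE", toy_weights, max_mutations=2))
```

With purely additive weights and no constraints the optimum is simply the best residue at each position; a solver becomes useful once constraints (such as a mutation budget) or several properties to be co-optimized are added.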
Affiliation(s)
- Marshall Case
  - Chemical Engineering, University of Michigan, Ann Arbor, MI 48109
- Matthew Smith
  - Chemical Engineering, University of Michigan, Ann Arbor, MI 48109
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109
- Jordan Vinh
  - Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109
- Greg Thurber
  - Chemical Engineering, University of Michigan, Ann Arbor, MI 48109
  - Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109
2. Smith MD, Case MA, Makowski EK, Tessier PM. Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data. Bioinformatics 2023; 39:btad446. [PMID: 37478351 PMCID: PMC10477941 DOI: 10.1093/bioinformatics/btad446] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 06/21/2023] [Accepted: 07/20/2023] [Indexed: 07/23/2023]
Abstract
Motivation: Deep sequencing of antibody and related protein libraries after phage or yeast-surface display sorting is widely used to identify variants with increased affinity, specificity, and/or improvements in key biophysical properties. Conventional approaches for identifying optimal variants typically use the frequencies of observation in enriched libraries or the corresponding enrichment ratios. However, these approaches disregard the vast majority of deep sequencing data and often fail to identify the best variants in the libraries.
Results: Here, we present a method, Position-Specific Enrichment Ratio Matrix (PSERM) scoring, that uses entire deep sequencing datasets from pre- and post-selections to score each observed protein variant. The PSERM scores are the sum of the site-specific enrichment ratios observed at each mutated position. We find that PSERM scores are much more reproducible and correlate more strongly with experimentally measured properties than frequencies or enrichment ratios, including for multiple antibody properties (affinity and non-specific binding) for a clinical-stage antibody (emibetuzumab). We expect that this method will be broadly applicable to diverse protein engineering campaigns.
Availability and implementation: All deep sequencing datasets and code to perform the analyses presented within are available via https://github.com/Tessier-Lab-UMich/PSERM_paper.
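The scoring rule stated in the abstract (a per-variant sum of site-specific enrichment ratios) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code, which is available at the GitHub link above. The log2 transform, the pseudocount, the function names, and the choice to sum only over positions that differ from a supplied wild-type sequence are all assumptions made here for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(seqs, pseudocount=1.0):
    """Per-position amino acid frequencies for equal-length sequences."""
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    freqs = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        freqs.append({aa: (counts.get(aa, 0) + pseudocount) / total
                      for aa in AMINO_ACIDS})
    return freqs


def enrichment_matrix(pre, post, pseudocount=1.0):
    """Assumed form: log2(post-selection / pre-selection frequency) per site."""
    f_pre = position_frequencies(pre, pseudocount)
    f_post = position_frequencies(post, pseudocount)
    return [{aa: math.log2(f_post[p][aa] / f_pre[p][aa]) for aa in AMINO_ACIDS}
            for p in range(len(f_pre))]


def pserm_score(variant, wild_type, matrix):
    """Sum the site-specific enrichment values at each mutated position."""
    return sum(matrix[p][aa]
               for p, (aa, wt) in enumerate(zip(variant, wild_type))
               if aa != wt)


if __name__ == "__main__":
    # Made-up toy libraries of three-residue sequences.
    pre = ["ACD", "ACE", "AGD", "AGE"]
    post = ["ACD", "ACD", "ACE", "ACD"]
    matrix = enrichment_matrix(pre, post)
    for variant in ("ACD", "AGE"):
        print(variant, pserm_score(variant, "AGD", matrix))
```

Because the matrix is built from every read in both libraries, a variant can be scored even when its own full-length sequence is rare, which is the contrast with frequency-based ranking that the abstract draws.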
Affiliation(s)
- Matthew D Smith
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109-2200, United States
- Marshall A Case
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
- Emily K Makowski
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI 48109-2200, United States
- Peter M Tessier
  - Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Biointerfaces Institute, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Protein Folding Disease Initiative, University of Michigan, Ann Arbor, MI 48109-2200, United States
  - Michigan Alzheimer’s Disease Center, University of Michigan, Ann Arbor, MI 48109-2200, United States
3. Smith MD, Case MA, Makowski EK, Tessier PM. Position-Specific Enrichment Ratio Matrix scores predict antibody variant properties from deep sequencing data. bioRxiv 2023:2023.07.10.548448. [PMID: 37503142 PMCID: PMC10369870 DOI: 10.1101/2023.07.10.548448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Motivation: Deep sequencing of antibody and related protein libraries after phage or yeast-surface display sorting is widely used to identify variants with increased affinity, specificity and/or improvements in key biophysical properties. Conventional approaches for identifying optimal variants typically use the frequencies of observation in enriched libraries or the corresponding enrichment ratios. However, these approaches disregard the vast majority of deep sequencing data and often fail to identify the best variants in the libraries.
Results: Here, we present a method, Position-Specific Enrichment Ratio Matrix (PSERM) scoring, that uses entire deep sequencing datasets from pre- and post-selections to score each observed protein variant. The PSERM scores are the sum of the site-specific enrichment ratios observed at each mutated position. We find that PSERM scores are much more reproducible and correlate more strongly with experimentally measured properties than frequencies or enrichment ratios, including for multiple antibody properties (affinity and non-specific binding) for a clinical-stage antibody (emibetuzumab). We expect that this method will be broadly applicable to diverse protein engineering campaigns.
Availability: All deep sequencing datasets and code to do the analyses presented within are available via GitHub.
Contact: Peter Tessier, ptessier@umich.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
4. França RKADO, Silva JM, Rodrigues LS, Sokolowskei D, Brigido MM, Maranhão AQ. New Anti-Flavivirus Fusion Loop Human Antibodies with Zika Virus-Neutralizing Potential. Int J Mol Sci 2022; 23:ijms23147805. [PMID: 35887153 PMCID: PMC9321016 DOI: 10.3390/ijms23147805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/01/2022] [Revised: 07/05/2022] [Accepted: 07/11/2022] [Indexed: 02/04/2023]
Abstract
Zika virus infections exhibit recurrent outbreaks and can be responsible for disease complications such as congenital Zika virus syndrome. Effective therapeutic interventions are still a challenge. Antibodies can provide significant protection, although the antibody response may fail due to antibody-dependent enhancement reactions. The choice of the target antigen is a crucial part of the process to generate effective neutralizing antibodies. Human anti-Zika virus antibodies were selected by phage display technology. The antibodies were selected against a mimetic peptide based on the fusion loop region in the protein E of Zika virus, which is highly conserved among different flaviviruses. Four rounds of selection were performed using the synthetic peptide in two strategies: the first was using the acidic elution of bound phages, and the second was by applying a competing procedure. After panning, the selected VH and VL domains were determined by combining NGS and bioinformatic approaches. Three different human monoclonal antibodies were expressed as scFvs and further characterized. All showed a binding capacity to Zika (ZIKV) and showed cross-recognition with yellow fever (YFV) and dengue (DENV) viruses. Two of these antibodies, AZ1p and AZ6m, could neutralize the ZIKV infection in vitro. Due to the conservation of the fusion loop region, these new antibodies can potentially be used in therapeutic intervention against Zika virus and other flavivirus illnesses.
Affiliation(s)
- Renato Kaylan Alves de Oliveira França
  - Molecular Immunology Laboratory, Department of Cellular Biology, Institute of Biological Sciences, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Pathology, University of Brasilia, Brasilia 70910-900, Brazil
- Jacyelle Medeiros Silva
  - Molecular Immunology Laboratory, Department of Cellular Biology, Institute of Biological Sciences, University of Brasilia, Brasilia 70910-900, Brazil
- Lucas Silva Rodrigues
  - Molecular Immunology Laboratory, Department of Cellular Biology, Institute of Biological Sciences, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Pathology, University of Brasilia, Brasilia 70910-900, Brazil
- Dimitri Sokolowskei
  - Molecular Immunology Laboratory, Department of Cellular Biology, Institute of Biological Sciences, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Biology, University of Brasilia, Brasilia 70910-900, Brazil
- Marcelo Macedo Brigido
  - Molecular Immunology Laboratory, Department of Cellular Biology, Institute of Biological Sciences, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Pathology, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Biology, University of Brasilia, Brasilia 70910-900, Brazil
  - III-Immunology Investigation Institute–CNPq-MCT, São Paulo 05403-000, Brazil
  - Correspondence:
- Andrea Queiroz Maranhão
  - Molecular Immunology Laboratory, Department of Cellular Biology, Institute of Biological Sciences, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Pathology, University of Brasilia, Brasilia 70910-900, Brazil
  - Graduation Program in Molecular Biology, University of Brasilia, Brasilia 70910-900, Brazil
  - III-Immunology Investigation Institute–CNPq-MCT, São Paulo 05403-000, Brazil