1
Nammi B, Jayasinghe-Arachchige VM, Madugula SS, Artiles M, Radler CN, Pham T, Liu J, Wang S. CasGen: A Regularized Generative Model for CRISPR Cas Protein Design with Classification and Margin-Based Optimization. bioRxiv 2025:2025.02.28.640911. [PMID: 40060553 PMCID: PMC11888460 DOI: 10.1101/2025.02.28.640911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2025]
Abstract
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated protein (Cas) systems have revolutionized genome editing by providing high precision and versatility. However, most genome editing applications rely on a limited number of well-characterized Cas9 and Cas12 variants, constraining the potential for broader genome engineering applications. In this study, we extensively explored Cas9 and Cas12 proteins and developed CasGen, a novel transformer-based deep generative model with margin-based latent space regularization to enhance the quality of newly generated Cas9 and Cas12 proteins. Specifically, CasGen employs strategies that combine classification to filter out non-Cas sequences, Bayesian optimization of the latent space to guide functionally relevant designs, and thorough structural validation using AlphaFold-based analyses to ensure robust protein generation. We collected a comprehensive dataset with 3,021 Cas9, 597 Cas12, and 597 non-Cas protein sequences from reputable biological databases such as InterPro and PDB. To validate the generated proteins, we performed sequence alignment using the BLAST tool to ensure novelty and filter out sequences highly similar to existing Cas proteins. Structural prediction using AlphaFold2 and AlphaFold3 confirmed that the generated proteins exhibit high structural similarity to known Cas9 and Cas12 variants, with TM-scores between 0.70 and 0.85 and root-mean-square deviation (RMSD) values below 2.00 Å. Sequence identity analysis further demonstrated that the generated Cas9 orthologs exhibited 28% to 55% identity with known variants, while Cas12a variants showed up to 48% identity. Our results demonstrate that the proposed Cas generative model has significant potential to expand the genome editing toolkit by designing diverse Cas proteins that retain functional integrity. The developed deep generative approach offers a promising avenue for synthetic biology and therapeutic applications, enabling the development of more precise and versatile Cas-based genome editing tools.
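As a rough illustration of two of the similarity measures quoted in this abstract, the Python sketch below computes percent identity over a pre-aligned sequence pair and RMSD over already-superposed coordinates. It is not the authors' pipeline: the abstract reports BLAST for alignment and AlphaFold-based structure comparison, and the function names and toy inputs here are assumptions made purely for illustration. Note also that percent identity has several conventions; this sketch divides by the number of ungapped aligned columns.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumes both strings come from the same pairwise alignment, with '-'
    marking gaps (an assumed input convention for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired, already-superposed atoms.

    No superposition is performed here; the inputs are assumed to be
    in the same frame, with atom i of coords_a paired to atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy inputs, not real Cas sequences or structures.
    print(percent_identity("MKV-LA", "MKIQLA"))
    print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]))
```

In practice a structure comparison would first superpose the two models (for example with a Kabsch-style fit) before computing RMSD; that step is deliberately left out of this sketch.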
Affiliation(s)
- Bharani Nammi
  - Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
- Vindi M. Jayasinghe-Arachchige
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Sita Sirisha Madugula
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Maria Artiles
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Charlene Norgan Radler
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Tyler Pham
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Jin Liu
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Shouyi Wang
  - Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
2
Nayfach S, Bhatnagar A, Novichkov A, Estevam GO, Kim N, Hill E, Ruffolo JA, Silverstein R, Gallagher J, Kleinstiver B, Meeske AJ, Cameron P, Madani A. Engineering of CRISPR-Cas PAM recognition using deep learning of vast evolutionary data. bioRxiv 2025:2025.01.06.631536. [PMID: 39829748 PMCID: PMC11741284 DOI: 10.1101/2025.01.06.631536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
CRISPR-Cas enzymes must recognize a protospacer-adjacent motif (PAM) to edit a genomic site, significantly limiting the range of targetable sequences in a genome. Machine learning-based protein engineering provides a powerful solution to efficiently generate Cas protein variants tailored to recognize specific PAMs. Here, we present Protein2PAM, an evolution-informed deep learning model trained on a dataset of over 45,000 CRISPR-Cas PAMs. Protein2PAM rapidly and accurately predicts PAM specificity directly from Cas proteins across Type I, II, and V CRISPR-Cas systems. Using in silico deep mutational scanning, we demonstrate that the model can identify residues critical for PAM recognition in Cas9 without utilizing structural information. As a proof of concept for protein engineering, we employ Protein2PAM to computationally evolve Nme1Cas9, generating variants with broadened PAM recognition and up to a 50-fold increase in PAM cleavage rates compared to the wild-type under in vitro conditions. This work represents the first successful application of machine learning to achieve customization of Cas enzymes for alternate PAM recognition, paving the way for personalized genome editing.
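The in silico deep mutational scanning mentioned in this abstract can be sketched in general terms as follows: every single-residue substitution of a sequence is scored by a model, and positions whose substitutions shift the score most are flagged as candidates for functional importance. The Python below is a minimal sketch under stated assumptions. `toy_score` is a hypothetical placeholder and does not represent Protein2PAM or its interface, and the per-position summary (mean absolute score change) is one simple choice among many.

```python
from typing import Callable, Dict, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(sequence: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, new_residue, mutant_sequence) for every single substitution."""
    for pos, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield pos, aa, sequence[:pos] + aa + sequence[pos + 1:]


def scan(sequence: str,
         score_fn: Callable[[str], float]) -> Dict[Tuple[int, str], float]:
    """Score change of each single mutant relative to the unmutated sequence."""
    baseline = score_fn(sequence)
    return {(pos, aa): score_fn(mutant) - baseline
            for pos, aa, mutant in single_mutants(sequence)}


def position_sensitivity(effects: Dict[Tuple[int, str], float],
                         length: int) -> List[float]:
    """Mean absolute score change per position (assumes standard residues only)."""
    totals = [0.0] * length
    for (pos, _aa), delta in effects.items():
        totals[pos] += abs(delta)
    return [t / (len(AMINO_ACIDS) - 1) for t in totals]


def toy_score(sequence: str) -> float:
    """Hypothetical stand-in for a trained predictor: counts arginine residues."""
    return float(sequence.count("R"))


if __name__ == "__main__":
    effects = scan("ARK", toy_score)
    print(position_sensitivity(effects, 3))
```

With a real predictor in place of `toy_score`, the cost is one model call per mutant (19 per position), so batching the mutants is usually the first optimization worth making.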
Affiliation(s)
- Nahye Kim
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
- Rachel Silverstein
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
  - Biological and Biomedical Sciences Program, Harvard University, Boston, MA, USA
- Benjamin Kleinstiver
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
- Alexander J. Meeske
  - Profluent Bio, Berkeley, CA, USA
  - Department of Microbiology, University of Washington, Seattle, WA, USA
3
Li Q, Hu Z, Wang Y, Li L, Fan Y, King I, Jia G, Wang S, Song L, Li Y. Progress and opportunities of foundation models in bioinformatics. Brief Bioinform 2024; 25:bbae548. [PMID: 39461902 PMCID: PMC11512649 DOI: 10.1093/bib/bbae548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/20/2024] [Accepted: 10/12/2024] [Indexed: 10/29/2024] Open
Abstract
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly through foundation models (FMs), which address longstanding challenges in bioinformatics such as limited annotated data and data noise. These AI techniques have demonstrated remarkable efficacy across various downstream validation tasks, effectively representing diverse biological entities and heralding a new era in computational biology. The primary goal of this survey is to conduct a general investigation and summary of FMs in bioinformatics, tracing their evolutionary trajectory, current research landscape, and methodological frameworks. Our primary focus is on elucidating the application of FMs to specific biological problems, offering insights to guide the research community in choosing appropriate FMs for tasks like sequence analysis, structure prediction, and function annotation. Each section delves into the intricacies of the targeted challenges, contrasting the architectures and advancements of FMs with conventional methods and showcasing their utility across different biological domains. Further, this review scrutinizes the hurdles and constraints encountered by FMs in biology, including issues of data noise, model interpretability, and potential biases. This analysis provides a theoretical groundwork for understanding the circumstances under which certain FMs may exhibit suboptimal performance. Lastly, we outline prospective pathways and methodologies for the future development of FMs in biological research, facilitating ongoing innovation in the field. This comprehensive examination not only serves as an academic reference but also as a roadmap for forthcoming explorations and applications of FMs in biology.
Affiliation(s)
- Qing Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Zhihang Hu
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Yixuan Wang
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Lei Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Yimin Fan
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Irwin King
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
- Gengjie Jia
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, 518120, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China
  - Shenzhen Institute of Advanced Technology, Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen, Guangdong, 518055, China
- Le Song
  - BioMap, Zhongguancun Life Science Park, Haidian District, Beijing, 100085, China
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
4
Pedrazzoli E, Demozzi M, Visentin E, Ciciani M, Bonuzzi I, Pezzè L, Lucchetta L, Maule G, Amistadi S, Esposito F, Lupo M, Miccio A, Auricchio A, Casini A, Segata N, Cereseto A. CoCas9 is a compact nuclease from the human microbiome for efficient and precise genome editing. Nat Commun 2024; 15:3478. [PMID: 38658578 PMCID: PMC11043407 DOI: 10.1038/s41467-024-47800-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 04/11/2024] [Indexed: 04/26/2024] Open
Abstract
Expansion of the CRISPR-Cas toolbox is needed to accelerate the development of therapies for genetic diseases. Here, through the interrogation of a massively expanded repository of metagenome-assembled genomes, mostly from human microbiomes, we uncover a large variety (n = 17,173) of type II CRISPR-Cas loci. Among these we identify CoCas9, a strongly active and high-fidelity nuclease with reduced molecular size (1004 amino acids) isolated from an uncultivated Collinsella species. CoCas9 is efficiently co-delivered with its sgRNA through adeno-associated viral (AAV) vectors, obtaining efficient in vivo editing in the mouse retina. With this study we uncover a collection of previously uncharacterized Cas9 nucleases, including CoCas9, which enriches the genome editing toolbox.
Affiliation(s)
- Eleonora Pedrazzoli
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Michele Demozzi
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Elisabetta Visentin
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Matteo Ciciani
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Ilaria Bonuzzi
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Lorenzo Lucchetta
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Giulia Maule
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Simone Amistadi
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
  - Université de Paris, Imagine Institute, Laboratory of chromatin and gene regulation during development, INSERM, UMR 1163, Paris, France
- Federica Esposito
  - Telethon Institute of Genetics and Medicine (TIGEM), 80078, Pozzuoli (NA), Italy
- Mariangela Lupo
  - Telethon Institute of Genetics and Medicine (TIGEM), 80078, Pozzuoli (NA), Italy
- Annarita Miccio
  - Université de Paris, Imagine Institute, Laboratory of chromatin and gene regulation during development, INSERM, UMR 1163, Paris, France
- Alberto Auricchio
  - Telethon Institute of Genetics and Medicine (TIGEM), 80078, Pozzuoli (NA), Italy
  - Medical Genetics, Department of Advanced Biomedical Sciences, University of Naples "Federico II", 80131, Naples, Italy
- Nicola Segata
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
- Anna Cereseto
  - Department of Computational, Cellular and Integrative Biology (CIBIO), University of Trento, 38123, Trento, Italy
5
Ruta GV, Ciciani M, Kheir E, Gentile MD, Amistadi S, Casini A, Cereseto A. Eukaryotic-driven directed evolution of Cas9 nucleases. Genome Biol 2024; 25:79. [PMID: 38528620 PMCID: PMC10962177 DOI: 10.1186/s13059-024-03215-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 03/13/2024] [Indexed: 03/27/2024] Open
Abstract
BACKGROUND: Further advancement of genome editing highly depends on the development of tools with higher compatibility with eukaryotes. A multitude of described Cas9s have great potential but require optimization for genome editing purposes. Among these, the Cas9 from Campylobacter jejuni, CjCas9, has a favorable small size, facilitating delivery in mammalian cells. Nonetheless, its full exploitation is limited by its poor editing activity. RESULTS: Here, we develop a Eukaryotic Platform to Improve Cas Activity (EPICA) to steer weakly active Cas9 nucleases into highly active enzymes by directed evolution. The EPICA platform is obtained by coupling Cas nuclease activity with yeast auxotrophic selection followed by mammalian cell selection through a sensitive reporter system. EPICA is validated with CjCas9, generating an enhanced variant, UltraCjCas9, following directed evolution rounds. UltraCjCas9 is up to 12-fold more active in mammalian endogenous genomic loci, while preserving high genome-wide specificity. CONCLUSIONS: We report a eukaryotic pipeline allowing enhancement of Cas9 systems, setting the ground to unlock the multitude of RNA-guided nucleases existing in nature.
Affiliation(s)
- Giulia Vittoria Ruta
  - Laboratory of Molecular Virology, Department CIBIO, University of Trento, Trento, Italy
- Matteo Ciciani
  - Laboratory of Molecular Virology, Department CIBIO, University of Trento, Trento, Italy
  - Laboratory of Computational Metagenomics, Department CIBIO, University of Trento, Trento, Italy
- Eyemen Kheir
  - Laboratory of Molecular Virology, Department CIBIO, University of Trento, Trento, Italy
- Simone Amistadi
  - Laboratory of Molecular Virology, Department CIBIO, University of Trento, Trento, Italy
  - Present address: Laboratory of Chromatin and Gene Regulation During Development, Université de Paris, Imagine Institute, INSERM UMR 1163, Paris, France
- Anna Cereseto
  - Laboratory of Molecular Virology, Department CIBIO, University of Trento, Trento, Italy
6
Yin Y, Wen J, Wen M, Fu X, Ke G, Zhang XB. The design strategies for CRISPR-based biosensing: Target recognition, signal conversion, and signal amplification. Biosens Bioelectron 2024; 246:115839. [PMID: 38042054 DOI: 10.1016/j.bios.2023.115839] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/07/2023] [Revised: 10/27/2023] [Accepted: 11/11/2023] [Indexed: 12/04/2023]
Abstract
Rapid, sensitive and selective biosensing is highly important for analyzing biological targets and dynamic physiological processes in cells and living organisms. As an emerging tool, the clustered regularly interspaced short palindromic repeats (CRISPR) system features excellent complementarity-dependent cleavage and efficient trans-cleavage ability. These merits enable the CRISPR system to improve the specificity, sensitivity, and speed of molecular detection. Herein, the structures and functions of several CRISPR proteins for biosensing are summarized in depth. Moreover, the strategies of target recognition, signal conversion, and signal amplification for CRISPR-based biosensing are highlighted from the perspective of biosensor design principles. The state-of-the-art applications and recent advances of the CRISPR system are then outlined, with emphasis on fluorescent, electrochemical, and colorimetric detection and on applications in point-of-care testing (POCT). Finally, the current challenges and future prospects of this frontier research area are discussed.
Affiliation(s)
- Yao Yin
  - State Key Laboratory for Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Jialin Wen
  - State Key Laboratory for Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Mei Wen
  - State Key Laboratory for Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Xiaoyi Fu
  - Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang, 310022, China
- Guoliang Ke
  - State Key Laboratory for Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Xiao-Bing Zhang
  - State Key Laboratory for Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
7
Pedrazzoli E, Bianchi A, Umbach A, Amistadi S, Brusson M, Frati G, Ciciani M, Badowska KA, Arosio D, Miccio A, Cereseto A, Casini A. An optimized SpCas9 high-fidelity variant for direct protein delivery. Mol Ther 2023; 31:2257-2265. [PMID: 36905119 PMCID: PMC10362380 DOI: 10.1016/j.ymthe.2023.03.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Revised: 02/08/2023] [Accepted: 03/07/2023] [Indexed: 03/12/2023] Open
Abstract
Electroporation of the Cas9 ribonucleoprotein (RNP) complex offers the advantage of preventing off-target cleavages and potential immune responses produced by long-term expression of the nuclease. Nevertheless, the majority of engineered high-fidelity Streptococcus pyogenes Cas9 (SpCas9) variants are less active than the wild-type enzyme and are not compatible with RNP delivery. Building on our previous studies on evoCas9, we developed a high-fidelity SpCas9 variant suitable for RNP delivery. The editing efficacy and precision of the recombinant high-fidelity Cas9 (rCas9HF), characterized by the K526D substitution, were compared with those of the R691A mutant (HiFi Cas9), which is currently the only available high-fidelity Cas9 that can be used as an RNP. The comparative analysis was extended to gene substitution experiments where the two high-fidelity variants were used in combination with a DNA donor template, generating different ratios of non-homologous end joining (NHEJ) versus homology-directed repair (HDR) for precise editing. The analyses revealed heterogeneous efficacy and precision, indicating different targeting capabilities between the two variants throughout the genome. The development of rCas9HF, characterized by an editing profile distinct from that of the currently used HiFi Cas9 in RNP electroporation, expands the genome editing solutions available for applications requiring high precision and efficiency.
Affiliation(s)
- Eleonora Pedrazzoli
  - Department CIBIO, Laboratory of Molecular Virology, University of Trento, Via Sommarive 9, 38123 Trento, Italy
- Andrea Bianchi
  - Department CIBIO, Laboratory of Molecular Virology, University of Trento, Via Sommarive 9, 38123 Trento, Italy
- Alessandro Umbach
  - Department CIBIO, Laboratory of Molecular Virology, University of Trento, Via Sommarive 9, 38123 Trento, Italy
- Simone Amistadi
  - Department CIBIO, Laboratory of Molecular Virology, University of Trento, Via Sommarive 9, 38123 Trento, Italy
- Mégane Brusson
  - Imagine Institute, Laboratory of Chromatin and Gene Regulation During Development, Université de Paris, INSERM UMR 1163, Paris, France
- Giacomo Frati
  - Imagine Institute, Laboratory of Chromatin and Gene Regulation During Development, Université de Paris, INSERM UMR 1163, Paris, France
- Matteo Ciciani
  - Department CIBIO, Laboratory of Molecular Virology, University of Trento, Via Sommarive 9, 38123 Trento, Italy
- Daniele Arosio
  - Biophysics Institute, National Research Council of Italy, 38123 Trento, Italy
- Annarita Miccio
  - Imagine Institute, Laboratory of Chromatin and Gene Regulation During Development, Université de Paris, INSERM UMR 1163, Paris, France
- Anna Cereseto
  - Department CIBIO, Laboratory of Molecular Virology, University of Trento, Via Sommarive 9, 38123 Trento, Italy
8