6. Madugula SS, Pujar P, Bharani N, Wang S, Jayasinghe-Arachchige VM, Pham T, Mashburn D, Artilis M, Liu J. Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum. bioRxiv 2024:2024.01.22.576286. [PMID: 38328240 PMCID: PMC10849529 DOI: 10.1101/2024.01.22.576286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations like large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In the current study, we aim to elucidate the unique protein attributes associated with Cas9 and Cas12 families and identify the features that distinguish each family from the other. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,495 features) encoding various physicochemical, topological, constitutional, and coevolutionary information of Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent datasets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 95% and 97% on their respective independent datasets, while the multiclass classifier achieved a high F1 score of 0.97. We observed that Quasi-sequence-order descriptors like Schneider-lag descriptors and Composition descriptors like charge, volume, and polarizability are essential for the Cas12 family. More interestingly, we discovered that Amino Acid Composition descriptors, especially the Tripeptide Composition (TPC) descriptors, are important for the Cas9 family.
Four of the identified important descriptors of Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all the Cas9 proteins and were located within different catalytically important domains of the Cas9 protein structure. Among these four tripeptides, DHI and HHA are well known to be involved in the DNA cleavage activity of the Cas9 protein. We therefore propose that the other two tripeptides, PWN and PYY, may also be essential for the Cas9 family. Our identified important descriptors enhance the understanding of the catalytic mechanisms of Cas9 and Cas12 proteins and provide valuable insights into the design of novel Cas systems to achieve enhanced gene-editing properties.
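The Tripeptide Composition (TPC) descriptors mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common convention that each of the 20^3 = 8000 possible tripeptides is scored as its count divided by the number of overlapping three-residue windows in the sequence; the abstract does not state which toolkit or normalization the authors used, so the function name `tripeptide_composition` and the handling of non-standard residues are hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence):
    """Return a dict mapping each of the 8000 tripeptides to its frequency.

    Assumption: frequency = count / number of valid overlapping windows.
    Windows containing a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    features = {}
    for triple in product(AMINO_ACIDS, repeat=3):
        key = "".join(triple)
        features[key] = counts[key] / total if total else 0.0
    return features


if __name__ == "__main__":
    # Toy sequence built from three of the tripeptides named in the abstract;
    # it is not a real Cas9 fragment.
    tpc = tripeptide_composition("PWNPYYDHI")
    for name in ("PWN", "PYY", "HHA", "DHI"):
        print(name, round(tpc[name], 4))
```

A feature vector of this kind, computed per protein, is the sort of input a Random Forest classifier could rank by importance; the full 13,495-feature spectrum described in the abstract includes many other descriptor families not covered by this sketch.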
Affiliation(s)
- Sita Sirisha Madugula
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Pranav Pujar
  - Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
- Nammi Bharani
  - Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
- Shouyi Wang
  - Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
- Vindi M. Jayasinghe-Arachchige
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Tyler Pham
  - Graduate School of Biomedical Sciences, University of North Texas Health Science Center, Fort Worth, Texas
- Dominic Mashburn
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Maria Artilis
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Jin Liu
  - Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
  - Graduate School of Biomedical Sciences, University of North Texas Health Science Center, Fort Worth, Texas
8. Adler BA, Trinidad MI, Bellieny-Rabelo D, Zhang E, Karp HM, Skopintsev P, Thornton BW, Weissman RF, Yoon P, Chen L, Hessler T, Eggers AR, Colognori D, Boger R, Doherty EE, Tsuchida CA, Tran RV, Hofman L, Shi H, Wasko KM, Zhou Z, Xia C, Al-Shimary MJ, Patel JR, Thomas VCJX, Pattali R, Kan MJ, Vardapetyan A, Yang A, Lahiri A, Maxwell MF, Murdock AG, Ramit GC, Henderson HR, Calvert RW, Bamert R, Knott GJ, Lapinaite A, Pausch P, Cofsky J, Sontheimer EJ, Wiedenheft B, Fineran PC, Brouns SJJ, Sashital DG, Thomas BC, Brown CT, Goltsman DSA, Barrangou R, Siksnys V, Banfield JF, Savage DF, Doudna JA. CasPEDIA Database: a functional classification system for class 2 CRISPR-Cas enzymes. Nucleic Acids Res 2024; 52:D590-D596. [PMID: 37889041 PMCID: PMC10767948 DOI: 10.1093/nar/gkad890] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Revised: 09/29/2023] [Accepted: 10/04/2023] [Indexed: 10/28/2023]
Abstract
CRISPR-Cas enzymes enable RNA-guided bacterial immunity and are widely used for biotechnological applications including genome editing. In particular, the Class 2 CRISPR-associated enzymes (Cas9, Cas12 and Cas13 families) have been deployed for numerous research, clinical and agricultural applications. However, the immense genetic and biochemical diversity of these proteins in the public domain poses a barrier for researchers seeking to leverage their activities. We present CasPEDIA (http://caspedia.org), the Cas Protein Effector Database of Information and Assessment, a curated encyclopedia that integrates enzymatic classification for hundreds of different Cas enzymes across 27 phylogenetic groups spanning the Cas9, Cas12 and Cas13 families, as well as evolutionarily related IscB and TnpB proteins. All enzymes in CasPEDIA were annotated with a standard workflow based on their primary nuclease activity, target requirements and guide-RNA design constraints. Our functional classification scheme, CasID, is described alongside current phylogenetic classification, allowing users to search related orthologs by enzymatic function and sequence similarity. CasPEDIA is a comprehensive data portal that summarizes and contextualizes enzymatic properties of widely used Cas enzymes, equipping users with valuable resources to foster biotechnological development. CasPEDIA complements phylogenetic Cas nomenclature and enables researchers to leverage the multi-faceted nucleic-acid targeting rules of diverse Class 2 Cas enzymes.
Affiliation(s)
- Benjamin A Adler
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
- Marena I Trinidad
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- Daniel Bellieny-Rabelo
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
- Elaine Zhang
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- Hannah M Karp
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
- Petr Skopintsev
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
- Brittney W Thornton
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Rachel F Weissman
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Peter H Yoon
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- LinXing Chen
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Earth and Planetary Sciences, University of California, Berkeley, CA 94720, USA
- Tomas Hessler
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Earth and Planetary Sciences, University of California, Berkeley, CA 94720, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, USA
  - EGSB Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Amy R Eggers
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- David Colognori
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Ron Boger
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
- Erin E Doherty
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
- Connor A Tsuchida
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, University of California, Berkeley, Berkeley, CA 94720, USA
- Ryan V Tran
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
- Laura Hofman
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
  - Graduate School of Life Sciences, Utrecht University, 3584 CS Utrecht, UT, The Netherlands
- Honglue Shi
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
- Kevin M Wasko
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Zehan Zhou
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Chenglong Xia
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
- Muntathar J Al-Shimary
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Jaymin R Patel
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Vienna C J X Thomas
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
- Rithu Pattali
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Matthew J Kan
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Pediatrics, Division of Allergy, Immunology, and Bone Marrow Transplantation, University of California, San Francisco, CA 94158, USA
- Anna Vardapetyan
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Alana Yang
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Arushi Lahiri
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Micaela F Maxwell
  - Department of Chemistry and Biochemistry, Hampton University, Hampton, VA 23668, USA
- Andrew G Murdock
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Glenn C Ramit
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Hope R Henderson
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
- Roland W Calvert
  - Monash Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC 3168, Australia
- Rebecca S Bamert
  - Monash Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC 3168, Australia
- Gavin J Knott
  - Monash Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC 3168, Australia
- Audrone Lapinaite
  - School of Molecular Sciences, Arizona State University, Tempe, AZ 85281, USA
  - Arizona State University-Banner Neurodegenerative Disease Research Center at the Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Patrick Pausch
  - LSC-EMBL Partnership Institute for Genome Editing Technologies, Life Sciences Center, Vilnius University, Vilnius 10257, Lithuania
- Joshua C Cofsky
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Erik J Sontheimer
  - RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
  - Program in Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
  - Li Weibo Institute for Rare Diseases Research, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Blake Wiedenheft
  - Department of Microbiology and Cell Biology, Montana State University, Bozeman, MT 59717, USA
- Peter C Fineran
  - Department of Microbiology and Immunology, University of Otago, Dunedin 9016, New Zealand
  - Genetics Otago, University of Otago, Dunedin 9016, New Zealand
  - Bioprotection Aotearoa, University of Otago, Dunedin 9016, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Otago, Dunedin 9016, New Zealand
- Stan J J Brouns
  - Department of Bionanoscience, Delft University of Technology, 2629 HZ Delft, Netherlands
  - Kavli Institute of Nanoscience, 2629 HZ Delft, The Netherlands
- Dipali G Sashital
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Rodolphe Barrangou
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Food, Bioprocessing and Nutrition Sciences, North Carolina State University, Raleigh, NC 27606, USA
- Virginius Siksnys
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius 10257, Lithuania
- Jillian F Banfield
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Department of Earth and Planetary Sciences, University of California, Berkeley, CA 94720, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, USA
  - EGSB Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - The University of Melbourne, Parkville, VIC 3052, Australia
- David F Savage
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Jennifer A Doudna
  - Innovative Genomics Institute, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
  - MBIB Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Gladstone Institutes, University of California, San Francisco, CA 94158, USA