1
Aiman S, Ahmad A, Malik A, Chen R, Hanif MF, Khan AA, Ansari MA, Farrukh S, Xu G, Shahab M, Huang K. Whole proteome-integrated and vaccinomics-based next generation mRNA vaccine design against Pseudomonas aeruginosa-A hierarchical subtractive proteomics approach. Int J Biol Macromol 2025; 309:142627. [PMID: 40174835 DOI: 10.1016/j.ijbiomac.2025.142627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 03/09/2025] [Accepted: 03/27/2025] [Indexed: 04/04/2025]
Abstract
Pseudomonas aeruginosa (P. aeruginosa) is a multidrug-resistant opportunistic pathogen responsible for chronic obstructive pulmonary disease (COPD), cystic fibrosis, and ventilator-associated pneumonia (VAP), leading to cancer. Developing an efficacious vaccine remains the most promising strategy for combating P. aeruginosa infections. In this study, we employed an advanced in silico strategy to design a highly efficient and stable mRNA vaccine using immunoinformatics tools. Whole proteome data were used to identify highly immunogenic vaccine candidates by subtractive proteomics. Three extracellular proteins were prioritized for T- and linear B-cell epitope prediction. A beta-defensin protein sequence was incorporated as an adjuvant at the N-terminus of the construct. A total of 3 CTL, 3 HTL, and 3 linear B-cell highly immunogenic epitopes were combined using specific linkers to design this multi-peptide construct. The 5' and 3' UTR sequences, a Kozak sequence with a stop codon, and signal peptides followed by a poly-A tail were incorporated into the above vaccine construct to create our final mRNA vaccine. The vaccines exhibited antigenicity scores >0.88, indicating high antigenicity with no allergenicity or toxicity. Physicochemical property analysis revealed high solubility and thermostability. Three-dimensional structural analysis yielded high-quality structures. Vaccine-receptor docking and molecular dynamics simulations demonstrated strong molecular interactions, stable binding affinities, dynamic nature, and structural stability of this vaccine, with significant immunogenic responses of the immune system against the vaccine. The immunological simulation indicated successful cellular and humoral immune responses to defend against P. aeruginosa infection. Validation of the study outcomes requires both experimental and clinical testing.
Affiliation(s)
- Sara Aiman
  - Guangdong Provincial Key Laboratory of Medical Immunology and Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, China; Liaobu Hospital of Dongguan City, Dongguan, China
- Abbas Ahmad
  - Department of Biotechnology, Abdul Wali Khan University, Mardan, Pakistan
- Abdul Malik
  - Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Rui Chen
  - Guangdong Provincial Key Laboratory of Medical Immunology and Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, China
- Muhammad Farhan Hanif
  - Department of Energy and Resource Engineering, College of Engineering, Peking University, Beijing 100871, China
- Azmat Ali Khan
  - Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Mushtaq Ahmed Ansari
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Guangxian Xu
  - Guangdong Provincial Key Laboratory of Medical Immunology and Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, China
- Muhammad Shahab
  - State Key Laboratories of Chemical Resources Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Kaisong Huang
  - Guangdong Provincial Key Laboratory of Medical Immunology and Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, China; Liaobu Hospital of Dongguan City, Dongguan, China
2
da Rocha W, Liberti L, Mucherino A, Malliavin TE. Influence of Stereochemistry in a Local Approach for Calculating Protein Conformations. J Chem Inf Model 2024; 64:8999-9008. [PMID: 39560315 DOI: 10.1021/acs.jcim.4c01232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2024]
Abstract
Protein structure prediction is generally based on the use of local conformational information coupled with long-range distance restraints. Such restraints can be derived from the knowledge of a template structure or from the analysis of protein sequence alignments in the framework of models arising from the physics of disordered systems. The accuracy of approaches based on sequence alignment, however, is limited when the number of aligned sequences is small. Here, we derive protein conformations using only knowledge of local conformations by means of the interval Branch-and-Prune algorithm. The computational efficiency is directly related to the knowledge of stereochemistry (bond angle and ω values) along the protein sequence and, in particular, to the variations of the torsion angle ω. The impact of stereochemistry variations is particularly strong for protein topologies defined by numerous long-range restraints, as in the case of proteins with β secondary structures. The systematic enumeration of the conformations improves the efficiency of the calculations. The analysis of DNA codons makes it possible to connect the variations of the torsion angle ω to the positions of rare DNA codons.
Affiliation(s)
- Wagner da Rocha
  - LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
- Leo Liberti
  - LIX CNRS, École Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France
- Thérèse E Malliavin
  - LPCT, UMR 7019 Université de Lorraine CNRS, Vandoeuvre-lès-Nancy 54500, France
3
Kumar A, Tushir S, Devasurmutt Y, Nath SS, Tatu U. Identification of clade-defining single nucleotide polymorphisms for improved rabies virus surveillance. New Microbes New Infect 2024; 62:101511. [PMID: 39512853 PMCID: PMC11542045 DOI: 10.1016/j.nmni.2024.101511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 10/14/2024] [Accepted: 10/15/2024] [Indexed: 11/15/2024] Open
Abstract
Background: Rabies is an ancient disease that remains endemic in many countries. It causes many human deaths annually, predominantly in resource-poor countries. Over evolutionary timelines, several rabies virus (RABV) genotypes have stabilised, forming distinct clades. Extensive studies have been conducted on the origin, occurrence and spread of RABV clades. However, the distribution of single nucleotide polymorphisms (SNPs) across the RABV genome and its clades remains largely unknown, highlighting the need for comprehensive whole-genome analyses. Methods: We accessed whole genome sequences for RABV from public databases and identified SNPs across them. We then annotated these SNPs using an R script and grouped them into three categories (universal, clade-specific, and clade-defining) based on their frequency of occurrence. Results: In this study, we present the SNPs occurring in RABV based on whole genome sequences belonging to 8 clades isolated from 7 different host species likely to harbour dog-related rabies. We classified mutations into several classes based on their location within the genome and assessed the effect of SNP mutations on the viral glycoprotein. Conclusions: The clade-defining mutations have implications for targeted surveillance and classification of clades. Additionally, we investigated the effects of these mutations on the glycoprotein of the virus. Our findings contribute to expanding knowledge about RABV clade diversity and evolution, which has significant implications for effectively tracking and combatting RABV transmission.
Affiliation(s)
- Ankeet Kumar
  - Department of Biochemistry, Division of Biological Sciences, Indian Institute of Science, Bangalore, India
- Sheetal Tushir
  - Department of Biochemistry, Division of Biological Sciences, Indian Institute of Science, Bangalore, India
- Yashas Devasurmutt
  - Department of Biochemistry, Division of Biological Sciences, Indian Institute of Science, Bangalore, India
- Sujith S. Nath
  - Department of Biochemistry, Division of Biological Sciences, Indian Institute of Science, Bangalore, India
- Utpal Tatu
  - Department of Biochemistry, Division of Biological Sciences, Indian Institute of Science, Bangalore, India
4
Rosenberg AA, Marx A, Bronstein AM. A dataset of alternately located segments in protein crystal structures. Sci Data 2024; 11:783. [PMID: 39019896 PMCID: PMC11255211 DOI: 10.1038/s41597-024-03595-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 07/01/2024] [Indexed: 07/19/2024] Open
Abstract
Protein Data Bank (PDB) files list the relative spatial location of atoms in a protein structure as the final output of the process of fitting and refining to experimentally determined electron density measurements. Where experimental evidence exists for multiple conformations, atoms are modelled in alternate locations. Programs reading PDB files commonly ignore these alternate conformations by default, leaving users oblivious to their presence in the structures they analyze. This has led to underappreciation of their prevalence, undercharacterisation of their features, and limited accessibility of this high-resolution data representing structural ensembles. We have trawled PDB files to extract structural features of residues with alternately located atoms. The output includes the distance between alternate conformations and identifies the location of these segments within the protein chain and their proximity to all other atoms within a defined radius. This dataset should be of use in efforts to predict multiple structures from a single sequence and should support studies investigating protein flexibility and its association with protein function.
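To illustrate the alternate-location records this abstract refers to, the following minimal sketch reads the fixed-column ATOM/HETATM lines of a PDB file, keeps every alternate location rather than only the first, and reports the distance between the first two alternates of each atom. It is not the authors' pipeline: the function names `format_atom`, `parse_altlocs`, and `altloc_distances` are hypothetical, and the only assumption about the input is the standard PDB column layout (altLoc indicator in column 17, coordinates in columns 31 to 54).

```python
import math
from collections import defaultdict


def format_atom(serial, name, altloc, res_name, chain, res_seq, x, y, z):
    """Build a fixed-column ATOM line (hypothetical helper for examples only)."""
    return (f"ATOM  {serial:5d} {name:<4}{altloc:1}{res_name:>3} {chain}"
            f"{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_altlocs(pdb_text):
    """Map (chain, resSeq, resName, atom name) -> {altLoc label: (x, y, z)}."""
    atoms = defaultdict(dict)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        line = line.ljust(54)
        name = line[12:16].strip()      # columns 13-16: atom name
        altloc = line[16].strip()       # column 17: alternate location label
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain = line[21]                # column 22: chain identifier
        res_seq = int(line[22:26])      # columns 23-26: residue number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms[(chain, res_seq, res_name, name)][altloc] = xyz
    return atoms


def altloc_distances(atoms):
    """Distance between the first two labelled alternates of each atom."""
    distances = {}
    for key, locations in atoms.items():
        labels = sorted(label for label in locations if label)
        if len(labels) >= 2:
            first, second = labels[0], labels[1]
            distances[key] = math.dist(locations[first], locations[second])
    return distances
```

Atoms without an alternate-location label are parsed but produce no distance entry, so the result lists only the positions where the model contains more than one conformation.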
Affiliation(s)
- Aviv A Rosenberg
  - Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel
- Ailie Marx
  - Department of Molecular and Computational Biosciences and Biotechnology, Migal - Galilee Research Institute, Qiryat, Israel
- Alexander M Bronstein
  - Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel
5
Komar AA, Samatova E, Rodnina MV. Translation Rates and Protein Folding. J Mol Biol 2024; 436:168384. [PMID: 38065274 DOI: 10.1016/j.jmb.2023.168384] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/20/2023] [Revised: 12/01/2023] [Accepted: 12/02/2023] [Indexed: 12/19/2023]
Abstract
The mRNA coding sequence defines not only the amino acid sequence of the protein, but also the speed at which the ribosomes move along the mRNA while making the protein. The non-uniform local kinetics - denoted as translational rhythm - is similar among mRNAs coding for related protein folds. Deviations from this conserved rhythm can result in protein misfolding. In this review we summarize the experimental evidence demonstrating how local translation rates affect cotranslational protein folding, with the focus on the synonymous codons and patches of charged residues in the nascent peptide as best-studied examples. Alterations in nascent protein conformations due to disturbed translational rhythm can persist off the ribosome, as demonstrated by the effects of synonymous codon variants of several disease-related proteins. Charged amino acid patches in nascent chains also modulate translation and cotranslational protein folding, and can abrogate translation when placed at the N-terminus of the nascent peptide. During cotranslational folding, incomplete nascent chains navigate through a unique conformational landscape in which earlier intermediate states become inaccessible as the nascent peptide grows. Precisely tuned local translation rates, as well as interactions with the ribosome, guide the folding pathway towards the native structure, whereas deviations from the natural translation rhythm may favor pathways leading to trapped misfolded states. Deciphering the 'folding code' of the mRNA will contribute to understanding the diseases caused by protein misfolding and to rational protein design.
Affiliation(s)
- Anton A Komar
  - Center for Gene Regulation in Health and Disease, Department of Biological, Geological and Environmental Sciences, Cleveland State University, 2121 Euclid Avenue, Cleveland, OH 44115, USA; Department of Biochemistry and Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Ekaterina Samatova
  - Max Planck Department of Physical Biochemistry, Max Planck Institute for Multidisciplinary Sciences, 37077 Goettingen, Germany
- Marina V Rodnina
  - Max Planck Department of Physical Biochemistry, Max Planck Institute for Multidisciplinary Sciences, 37077 Goettingen, Germany
6
Huang M, Liu YU, Yao X, Qin D, Su H. Variability in SOD1-associated amyotrophic lateral sclerosis: geographic patterns, clinical heterogeneity, molecular alterations, and therapeutic implications. Transl Neurodegener 2024; 13:28. [PMID: 38811997 PMCID: PMC11138100 DOI: 10.1186/s40035-024-00416-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 04/17/2024] [Indexed: 05/31/2024] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease characterized by progressive loss of motor neurons, resulting in a global health burden and limited post-diagnosis life expectancy. Although ALS is primarily sporadic, familial ALS (fALS) cases suggest a genetic basis. This review focuses on SOD1, the first gene found to be associated with fALS, an association more recently confirmed by genome sequencing. While informative, databases such as ALSoD and STRENGTH exhibit regional biases. Through a systematic global examination of SOD1 mutations from 1993 to 2023, we found different geographic distributions and clinical presentations. Even though different SOD1 variants are expressed at different protein levels and have different half-lives and dismutase activities, these alterations lead to loss of function that is not consistently correlated with disease severity. Gain of function of toxic aggregates of SOD1 resulting from mutated SOD1 has emerged as one of the key contributors to ALS. Therapeutic interventions specifically targeting toxic gain of function of mutant SOD1, including RNA interference and antibodies, show promise, but a cure remains elusive. This review provides a comprehensive perspective on SOD1-associated ALS and describes molecular features and the complex genetic landscape of SOD1, highlighting its importance in determining diverse clinical manifestations observed in ALS patients and emphasizing the need for personalized therapeutic strategies.
Affiliation(s)
- Miaodan Huang
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, Department of Pharmaceutical Sciences, Faculty of Health Sciences, University of Macau, Macao, China
- Yong U Liu
  - Laboratory for Neuroimmunology in Health and Diseases, Guangzhou First People's Hospital School of Medicine, South China University of Technology, Guangzhou, China
- Xiaoli Yao
  - Department of Neurology, The First Affiliated Hospital, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Diagnosis and Treatment of Major Neurological Diseases, National Key Clinical Department and Key Discipline of Neurology, Guangzhou, China
- Dajiang Qin
  - Key Laboratory of Biological Targeting Diagnosis, Therapy and Rehabilitation of Guangdong Higher Education Institutes, The Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510799, China
- Huanxing Su
  - State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, Department of Pharmaceutical Sciences, Faculty of Health Sciences, University of Macau, Macao, China
7
Akeju OJ, Cope AL. Re-examining Correlations Between Synonymous Codon Usage and Protein Bond Angles in Escherichia coli. Genome Biol Evol 2024; 16:evae080. [PMID: 38619010 PMCID: PMC11077309 DOI: 10.1093/gbe/evae080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 04/05/2024] [Accepted: 04/10/2024] [Indexed: 04/16/2024] Open
Abstract
Rosenberg AA, Marx A, Bronstein AM (Codon-specific Ramachandran plots show amino acid backbone conformation depends on identity of the translated codon. Nat Commun. 2022;13:2815) recently found a surprising correlation between synonymous codon usage and the dihedral bond angles of the resulting amino acid. However, their analysis did not account for the strongest known correlate of codon usage: gene expression. We re-examined the relationship between bond angles and codon usage by applying the approach of Rosenberg et al. to simulated protein-coding sequences that (i) have random codon usage, (ii) have codon usage determined by mutation biases, and (iii) maintain the general relationship between codon usage and gene expression via the assumption of selection-mutation-drift equilibrium. We observed correlations between dihedral bond angle and codon usage when codon usage is entirely random, indicating possible conflation of noise with differences in bond angle distributions between synonymous codons. More relevant to the general analysis of codon usage patterns, we found surprisingly good agreement between the analysis of the real sequences and the analysis of sequences simulated assuming selection-mutation-drift equilibrium: 91% of the significant synonymous codon pairs detected in the former were also detected in the latter. We believe the correlation between codon usage and dihedral bond angles resulted from the variation in codon usage across genes due to the interplay between mutation bias, natural selection for translation efficiency, and gene expression, further underscoring that these factors must be controlled for when looking for novel patterns related to codon usage.
Affiliation(s)
- Alexander L Cope
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, USA
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, USA
  - Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey, USA
8
Gudkov M, Thibaut L, Giannoulatou E. Quantifying negative selection on synonymous variants. HGG Advances 2024; 5:100262. [PMID: 38192100 PMCID: PMC10835449 DOI: 10.1016/j.xhgg.2024.100262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 01/01/2024] [Accepted: 01/01/2024] [Indexed: 01/10/2024] Open
Abstract
Widespread adoption of DNA sequencing has resulted in large numbers of genetic variants, whose contribution to disease is not easily determined. Although many types of variation are known to disrupt cellular processes in predictable ways, for some categories of variants, the effects may not be directly detectable. A particular example is synonymous variants, that is, those single-nucleotide variants that create a codon substitution such that the produced amino acid sequence is unaffected. Contrary to the original theory suggesting that synonymous variants are benign, there is a growing volume of research showing that, despite their "silent" mechanism of action, some synonymous variation may be deleterious. Here, we studied the extent of the negative selective pressure acting on different classes of synonymous variants by analyzing the relative enrichment of synonymous singleton variants in the human exomes provided by gnomAD. Using a modification of the mutability-adjusted proportion of singletons (MAPS) metric as a measure of purifying selection, we found that some classes of synonymous variants are subject to stronger negative selection than others. For instance, variants that reduce codon optimality undergo stronger selection than optimality-increasing variants. In addition, selection affects synonymous variants implicated in splice-site-loss or splice-site-gain events. To understand what drives this negative selection, we tested a number of predictors with the aim of explaining the variability in the selection scores. Our findings provide insights into the effects of synonymous variants at the population level, highlighting the specifics of the role that these variants play in health and disease.
Affiliation(s)
- Mikhail Gudkov
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St Vincent's Clinical School, UNSW Sydney, Sydney, NSW 2052, Australia
- Loïc Thibaut
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; School of Mathematics and Statistics, UNSW Sydney, Sydney, NSW 2052, Australia
- Eleni Giannoulatou
  - Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St Vincent's Clinical School, UNSW Sydney, Sydney, NSW 2052, Australia
9
Gupta P, Dholaniya PS, Princy K, Madhavan AS, Sreelakshmi Y, Sharma R. Augmenting tomato functional genomics with a genome-wide induced genetic variation resource. Frontiers in Plant Science 2024; 14:1290937. [PMID: 38328621 PMCID: PMC10848261 DOI: 10.3389/fpls.2023.1290937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 12/22/2023] [Indexed: 02/09/2024]
Abstract
Induced mutations accelerate crop improvement by providing novel disease resistance and yield alleles. However, alleles that have an altered function but no perceptible phenotype remain hidden in mutagenized plants. Whole-genome sequencing (WGS) of mutagenized individuals uncovers the complete spectrum of mutations in the genome. Genome-wide induced mutation resources can improve the targeted breeding of tomatoes and facilitate functional genomics. In this study, we sequenced 132 doubly ethyl methanesulfonate (EMS)-mutagenized lines of tomato and detected approximately 41 million novel mutations and 5.5 million short InDels not present in the parental cultivar. Approximately 97% of the genome had mutations, including the genes, promoters, UTRs, and introns. More than one-third of genes in the mutagenized population had one or more deleterious mutations predicted by Sorting Intolerant From Tolerant (SIFT). Nearly one-fourth of deleterious genes mapped on tomato metabolic pathways modulate multiple pathway steps. In addition to the reported GC>AT transition bias for EMS, our population also had a substantial number of AT>GC transitions. Comparing mutation frequency among synonymous codons revealed that the most preferred codon is the least mutagenic toward EMS. A potato leaf-like mutation, a reduction in carotenoids in ζ-carotene isomerase mutant fruits, and loss of chloroplast relocation in a phototropin1 mutant validated the mutation discovery pipeline. Our database makes a large repertoire of mutations accessible to functional genomics studies and breeding of tomatoes.
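The GC>AT versus AT>GC distinction mentioned in this abstract can be made concrete with a small tally over single-nucleotide substitutions. The sketch below is an illustration only and not the authors' mutation discovery pipeline: the function name `mutation_spectrum` and the input format (a list of reference/alternate base pairs) are assumptions introduced here. It counts G>A and C>T as GC>AT transitions, A>G and T>C as AT>GC transitions, and everything else as transversions.

```python
from collections import Counter

# The four transitions, grouped by direction (purine<->purine, pyrimidine<->pyrimidine).
TRANSITION_CLASS = {
    ("G", "A"): "GC>AT",
    ("C", "T"): "GC>AT",
    ("A", "G"): "AT>GC",
    ("T", "C"): "AT>GC",
}


def mutation_spectrum(substitutions):
    """Count GC>AT transitions, AT>GC transitions, and transversions.

    `substitutions` is an iterable of (reference_base, alternate_base) pairs;
    this input format is an assumption made for the example.
    """
    counts = Counter()
    for ref, alt in substitutions:
        ref, alt = ref.upper(), alt.upper()
        if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
        if ref == alt:
            raise ValueError(f"reference and alternate are identical: {ref}")
        counts[TRANSITION_CLASS.get((ref, alt), "transversion")] += 1
    return counts
```

Dividing each count by the total gives the proportions that a comparison against the expected EMS bias would use.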
Affiliation(s)
- Prateek Gupta
  - Repository of Tomato Genomics Resources, Department of Plant Sciences, University of Hyderabad, Hyderabad, India
  - Department of Biological Sciences, SRM University-AP, Amaravati, Andhra Pradesh, India
- Pankaj Singh Dholaniya
  - Department of Biotechnology and Bioinformatics, University of Hyderabad, Hyderabad, India
- Kunnappady Princy
  - Repository of Tomato Genomics Resources, Department of Plant Sciences, University of Hyderabad, Hyderabad, India
- Athira Sethu Madhavan
  - Repository of Tomato Genomics Resources, Department of Plant Sciences, University of Hyderabad, Hyderabad, India
- Yellamaraju Sreelakshmi
  - Repository of Tomato Genomics Resources, Department of Plant Sciences, University of Hyderabad, Hyderabad, India
- Rameshwar Sharma
  - Repository of Tomato Genomics Resources, Department of Plant Sciences, University of Hyderabad, Hyderabad, India
10
Louros N, Schymkowitz J, Rousseau F. Mechanisms and pathology of protein misfolding and aggregation. Nat Rev Mol Cell Biol 2023; 24:912-933. [PMID: 37684425 DOI: 10.1038/s41580-023-00647-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Accepted: 07/28/2023] [Indexed: 09/10/2023]
Abstract
Despite advances in machine learning-based protein structure prediction, we are still far from fully understanding how proteins fold into their native conformation. The conventional notion that polypeptides fold spontaneously to their biologically active states has gradually been replaced by our understanding that cellular protein folding often requires context-dependent guidance from molecular chaperones in order to avoid misfolding. Misfolded proteins can aggregate into larger structures, such as amyloid fibrils, which perpetuate the misfolding process, creating a self-reinforcing cascade. A surge in amyloid fibril structures has deepened our comprehension of how a single polypeptide sequence can exhibit multiple amyloid conformations, known as polymorphism. The assembly of these polymorphs is not a random process but is influenced by the specific conditions and tissues in which they originate. This observation suggests that, similar to the folding of native proteins, the kinetics of pathological amyloid assembly are modulated by interactions specific to cells and tissues. Here, we review the current understanding of how intrinsic protein conformational propensities are modulated by physiological and pathological interactions in the cell to shape protein misfolding and aggregation pathology.
Affiliation(s)
- Nikolaos Louros
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Joost Schymkowitz
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Frederic Rousseau
  - Switch Laboratory, VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
  - Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
11
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
- Pavel Kohout
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Faraneh Haddadi
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Anton Bushuiev
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Raman Samusevich
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Jiri Sedlar
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Tomas Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
- Josef Sivic
  - Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| |
|
12
|
Rosenberg AA, Yehishalom N, Marx A, Bronstein AM. An amino-domino model described by a cross-peptide-bond Ramachandran plot defines amino acid pairs as local structural units. Proc Natl Acad Sci U S A 2023; 120:e2301064120. [PMID: 37878722 PMCID: PMC10623034 DOI: 10.1073/pnas.2301064120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 08/24/2023] [Indexed: 10/27/2023] Open
Abstract
Protein structure, both at the global and local level, dictates function. Proteins fold from chains of amino acids, forming secondary structures, α-helices and β-strands, that, at least for globular proteins, subsequently fold into a three-dimensional structure. Here, we show that a Ramachandran-type plot focusing on the two dihedral angles separated by the peptide bond, and entirely contained within an amino acid pair, defines a local structural unit. We further demonstrate the usefulness of this cross-peptide-bond Ramachandran plot by showing that it captures β-turn conformations in coil regions, that traditional Ramachandran plot outliers fall into occupied regions of our plot, and that thermophilic proteins prefer specific amino acid pair conformations. Further, we demonstrate experimentally that the effect of a point mutation on backbone conformation and protein stability depends on the amino acid pair context, i.e., the identity of the adjacent amino acid, in a manner predictable by our method.
Affiliation(s)
- Aviv A. Rosenberg
- Department of Computer Science, Technion–Israel Institute of Technology, Haifa 32000, Israel
| | - Nitsan Yehishalom
- Faculty of Biology, Technion–Israel Institute of Technology, Haifa 32000, Israel
| | - Ailie Marx
- Department of Computer Science, Technion–Israel Institute of Technology, Haifa 32000, Israel
| | - Alex M. Bronstein
- Department of Computer Science, Technion–Israel Institute of Technology, Haifa 32000, Israel
| |
|
13
|
Hallee L, Rafailidis N, Gleghorn JP. cdsBERT - Extending Protein Language Models with Codon Awareness. bioRxiv 2023:2023.09.15.558027. [PMID: 37745387 PMCID: PMC10516008 DOI: 10.1101/2023.09.15.558027] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 09/26/2023]
Abstract
Recent advancements in Protein Language Models (pLMs) have enabled high-throughput analysis of proteins through primary sequence alone. At the same time, newfound evidence illustrates that codon usage bias is remarkably predictive and can even change the final structure of a protein. Here, we explore these findings by extending the traditional vocabulary of pLMs from amino acids to codons to encapsulate more information inside CoDing Sequences (CDS). We build upon traditional transfer learning techniques with a novel pipeline of token embedding matrix seeding, masked language modeling, and student-teacher knowledge distillation, called MELD. This transformed the pretrained ProtBERT into cdsBERT, a pLM with a codon vocabulary trained on a massive corpus of CDS. Interestingly, cdsBERT variants produced a highly biochemically relevant latent space, outperforming their amino acid-based counterparts on enzyme commission number prediction. Further analysis revealed that synonymous codon token embeddings moved distinctly in the embedding space, showcasing unique additions of information across broad phylogeny inside these traditionally "silent" mutations. This embedding movement correlated significantly with average usage bias across phylogeny. Future fine-tuned organism-specific codon pLMs may have a more significant increase in codon usage fidelity. This work enables an exciting potential in using the codon vocabulary to improve current state-of-the-art structure and function prediction that necessitates the creation of a codon pLM foundation model alongside the addition of high-quality CDS to large-scale protein databases.
Affiliation(s)
- Logan Hallee
- Center for Bioinformatics and Computational Biology, University of Delaware
| | | | | |
|
14
|
Wang Y, Selvaraj MS, Li X, Li Z, Holdcraft JA, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Cade BE, Carlson JC, Carson AP, Chen YDI, Curran JE, de Vries PS, Dutcher SK, Ellinor PT, Floyd JS, Fornage M, Freedman BI, Gabriel S, Germer S, Gibbs RA, Guo X, He J, Heard-Costa N, Hildalgo B, Hou L, Irvin MR, Joehanes R, Kaplan RC, Kardia SLR, Kelly TN, Kim R, Kooperberg C, Kral BG, Levy D, Li C, Liu C, Lloyd-Jone D, Loos RJF, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Murabito JM, Naseri T, O’Connell JR, Palmer ND, Preuss MH, Psaty BM, Raffield LM, Rao DC, Redline S, Reiner AP, Rich SS, Ruepena MS, Sheu WHH, Smith JA, Smith A, Tiwari HK, Tsai MY, Viaud-Martinez KA, Wang Z, Yanek LR, Zhao W, Rotter JI, Lin X, Natarajan P, Peloso GM. Rare variants in long non-coding RNAs are associated with blood lipid levels in the TOPMed Whole Genome Sequencing Study. medRxiv 2023:2023.06.28.23291966. [PMID: 37425772 PMCID: PMC10327287 DOI: 10.1101/2023.06.28.23291966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
Long non-coding RNAs (lncRNAs) are known to perform important regulatory functions. Large-scale whole genome sequencing (WGS) studies and new statistical methods for variant set tests now provide an opportunity to assess the associations between rare variants in lncRNA genes and complex traits across the genome. In this study, we used high-coverage WGS from 66,329 participants of diverse ancestries with blood lipid levels (LDL-C, HDL-C, TC, and TG) in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program to investigate the role of lncRNAs in lipid variability. We aggregated rare variants for 165,375 lncRNA genes based on their genomic locations and conducted rare variant aggregate association tests using the STAAR (variant-Set Test for Association using Annotation infoRmation) framework. We performed STAAR conditional analysis adjusting for common variants in known lipid GWAS loci and rare coding variants in nearby protein coding genes. Our analyses revealed 83 rare lncRNA variant sets significantly associated with blood lipid levels, all of which were located in known lipid GWAS loci (in a ±500 kb window of a Global Lipids Genetics Consortium index variant). Notably, 61 out of 83 signals (73%) were conditionally independent of common regulatory variations and rare protein coding variations at the same loci. We replicated 34 out of 61 (56%) conditionally independent associations using the independent UK Biobank WGS data. Our results expand the genetic architecture of blood lipids to rare variants in lncRNA, implicating new therapeutic opportunities.
Affiliation(s)
- Yuxuan Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Margaret Sunitha Selvaraj
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Center for Computational Biology & Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Jacob A. Holdcraft
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Donna K. Arnett
- Provost Office, University of South Carolina, Columbia, SC, USA
- Department of Epidemiology and Biostatistics, University of South Carolina Arnold School of Public Health, Columbia, SC, USA
| | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
| | - John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Donald W. Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Brian E. Cade
- Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
| | - Jenna C. Carlson
- Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - April P. Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
| | - Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Susan K. Dutcher
- The McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Patrick T. Ellinor
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - James S. Floyd
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
| | - Myriam Fornage
- Center for Human Genetics, University of Texas Health at Houston, Houston, TX, USA
| | - Barry I. Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | | | | | - Richard A. Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
| | - Nancy Heard-Costa
- Framingham Heart Study, Framingham, MA, USA
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
| | - Bertha Hildalgo
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | - Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Marguerite R. Irvin
- Department of Epidemiology, University of Alabama at Birmingham School of Public Health, Birmingham, AL, USA
| | - Roby Joehanes
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Sharon L.R. Kardia
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Tanika N. Kelly
- Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
| | - Ryan Kim
- Psomagen, Inc. (formerly Macrogen USA), Rockville, MD, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Brian G. Kral
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Daniel Levy
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
| | - Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
| | - Don Lloyd-Jone
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
| | - Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- NNF Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | - Michael C. Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Lisa W. Martin
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
| | - Rasika A. Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Ryan L. Minster
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Braxton D. Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - May E. Montasser
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Joanne M. Murabito
- Framingham Heart Study, Framingham, MA, USA
- Department of Medicine, Boston Medical Center, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, USA
| | | | - Jeffrey R. O’Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D. Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Michael H. Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
| | - Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Dabeeru C. Rao
- Division of Biostatistics, Washington University School of Medicine, St. Louis, MO, USA
| | - Susan Redline
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | | | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | | | | | - Jennifer A. Smith
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Albert Smith
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Hemant K. Tiwari
- Department of Biostatistics, University of Alabama, Birmingham, AL, USA
| | - Michael Y. Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | | | - Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Lisa R. Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Wei Zhao
- Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | | | - Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Xihong Lin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
| | - Pradeep Natarajan
- Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Gina M. Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| |
|
15
|
Barbosa Pereira PJ, Manso JA, Macedo-Ribeiro S. The structural plasticity of polyglutamine repeats. Curr Opin Struct Biol 2023; 80:102607. [PMID: 37178477 DOI: 10.1016/j.sbi.2023.102607] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/14/2023] [Revised: 04/11/2023] [Accepted: 04/12/2023] [Indexed: 05/15/2023]
Abstract
From yeast to humans, polyglutamine (polyQ) repeat tracts are found frequently in the proteome and are particularly prominent in the activation domains of transcription factors. PolyQ is a polymorphic motif that modulates functional protein-protein interactions and aberrant self-assembly. Expansion of the polyQ repeated sequences beyond critical physiological repeat length thresholds triggers self-assembly and is linked to severe pathological implications. This review provides an overview of the current knowledge on the structures of polyQ tracts in the soluble and aggregated states and discusses the influence of neighboring regions on polyQ secondary structure, aggregation, and fibril morphologies. The influence of the genetic context of the polyQ-encoding trinucleotides is briefly discussed as a challenge for future endeavors in this field.
Affiliation(s)
- Pedro José Barbosa Pereira
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
| | - José A Manso
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
| | - Sandra Macedo-Ribeiro
- IBMC - Instituto de Biologia Molecular e Celular, Universidade do Porto, 4200-135, Porto, Portugal; Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135, Porto, Portugal
| |
|
16
|
Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol Sci 2023; 44:175-189. [PMID: 36669976 DOI: 10.1016/j.tips.2022.12.005] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 09/05/2022] [Revised: 12/21/2022] [Accepted: 12/22/2022] [Indexed: 01/19/2023]
Abstract
Due to their high target specificity and binding affinity, therapeutic antibodies are currently the largest class of biotherapeutics. The traditional largely empirical antibody development process is, while mature and robust, cumbersome and has significant limitations. Substantial recent advances in computational and artificial intelligence (AI) technologies are now starting to overcome many of these limitations and are increasingly integrated into development pipelines. Here, we provide an overview of AI methods relevant for antibody development, including databases, computational predictors of antibody properties and structure, and computational antibody design methods with an emphasis on machine learning (ML) models, and the design of complementarity-determining region (CDR) loops, antibody structural components critical for binding.
|
17
|
Hallee L, Khomtchouk BB. Machine learning classifiers predict key genomic and evolutionary traits across the kingdoms of life. Sci Rep 2023; 13:2088. [PMID: 36747072 PMCID: PMC9902438 DOI: 10.1038/s41598-023-28965-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/14/2022] [Accepted: 01/27/2023] [Indexed: 02/08/2023] Open
Abstract
In this study, we investigate how an organism's codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.
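As a rough illustration of the kind of input such classifiers rely on, the sketch below turns an in-frame coding sequence into a 64-dimensional codon usage profile. It is not taken from the cited study; the function name `codon_usage` and the choice of plain relative frequencies as features are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# All 64 DNA codons in a fixed order, so every profile has the same keys.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(cds):
    """Return the relative frequency of each codon in an in-frame sequence.

    Hypothetical helper: RNA input is mapped to DNA letters, a trailing
    partial codon is ignored, and codons containing ambiguous bases are
    skipped when computing frequencies.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


profile = codon_usage("ATGGCTGCTTAA")  # codons: ATG, GCT, GCT, TAA
print(profile["GCT"])
```

A profile of this shape could then be passed to any standard classifier; which features and models the authors actually used is documented in their public repository, not here.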
Affiliation(s)
- Logan Hallee
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19713, USA
| | - Bohdan B Khomtchouk
- Department of BioHealth Informatics, Center for Computational Biology and Bioinformatics, Indiana University, Indianapolis, IN, 46202, USA
| |
|
18
|
Implementing computational methods in tandem with synonymous gene recoding for therapeutic development. Trends Pharmacol Sci 2023; 44:73-84. [PMID: 36307252 DOI: 10.1016/j.tips.2022.09.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Revised: 09/26/2022] [Accepted: 09/27/2022] [Indexed: 12/24/2022]
Abstract
Synonymous gene recoding, the substitution of synonymous variants into the genetic sequence, has been used to overcome many production limitations in therapeutic development. However, the safety and efficacy of recoded therapeutics can be difficult to evaluate because synonymous codon substitutions can result in subtle, yet impactful changes in protein features and require sensitive methods for detection. Given that computational approaches have made significant leaps in recent years, we propose that machine-learning (ML) tools may be leveraged to assess gene-recoded therapeutics and foresee an opportunity to adapt codon contexts to enhance some powerful existing tools. Here, we examine how synonymous gene recoding has been used to address challenges in therapeutic development, explain the biological mechanisms underlying its effects, and explore the application of computational platforms to improve the surveillance of functional variants in therapeutic design.
|
19
|
Lopes da Costa B, Kolesnikova M, Levi SR, Cabral T, Tsang SH, Maumenee IH, Quinn PMJ. Clinical and Therapeutic Evaluation of the Ten Most Prevalent CRB1 Mutations. Biomedicines 2023; 11:385. [PMID: 36830922 PMCID: PMC9953187 DOI: 10.3390/biomedicines11020385] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/23/2022] [Revised: 01/20/2023] [Accepted: 01/23/2023] [Indexed: 01/31/2023] Open
Abstract
Mutations in the Crumbs homolog 1 (CRB1) gene lead to severe inherited retinal dystrophies (IRDs), accounting for nearly 80,000 cases worldwide. To date, there is no therapeutic option for patients suffering from CRB1-IRDs. Therefore, it is of great interest to evaluate gene editing strategies capable of correcting CRB1 mutations. A retrospective chart review was conducted on ten patients demonstrating one or two of the top ten most prevalent CRB1 mutations and receiving care at Columbia University Irving Medical Center, New York, NY, USA. Patient phenotypes were consistent with previously published data for individual CRB1 mutations. To identify the optimal gene editing strategy for these ten mutations, base and prime editing designs were evaluated. For base editing, we adopted the use of a near-PAMless Cas9 (SpRY Cas9), whereas for prime editing, we evaluated the canonical NGG and NGA prime editors. We demonstrate that for the correction of c.2843G>A, p.(Cys948Tyr), the most prevalent CRB1 mutation, base editing has the potential to generate harmful bystanders. Prime editing, however, avoids these bystanders, highlighting its future potential to halt CRB1-mediated disease progression. Additional studies investigating prime editing for CRB1-IRDs are needed, as well as a thorough analysis of prime editing's application, efficiency, and safety in the retina.
Affiliation(s)
- Bruna Lopes da Costa
- Department of Biomedical Engineering, Columbia University, New York, NY 10027, USA
- Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center/New York-Presbyterian Hospital, New York, NY 10032, USA
- Jonas Children′s Vision Care, and Bernard & Shirlee Brown Glaucoma Laboratory, Department of Ophthalmology, Columbia University, New York, NY 10032, USA
- Department of Ophthalmology, Federal University of São Paulo, São Paulo 04021-001, SP, Brazil
| | - Masha Kolesnikova
- Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center/New York-Presbyterian Hospital, New York, NY 10032, USA
- Jonas Children′s Vision Care, and Bernard & Shirlee Brown Glaucoma Laboratory, Department of Ophthalmology, Columbia University, New York, NY 10032, USA
- College of Medicine at the State University of New York at Downstate Medical Center, Brooklyn, NY 11203, USA
| | - Sarah R. Levi
- Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center/New York-Presbyterian Hospital, New York, NY 10032, USA
- Jonas Children′s Vision Care, and Bernard & Shirlee Brown Glaucoma Laboratory, Department of Ophthalmology, Columbia University, New York, NY 10032, USA
| | - Thiago Cabral
- Department of Ophthalmology, Federal University of São Paulo, São Paulo 04021-001, SP, Brazil
- Vision Center Unit/EBSERH and Department of Ophthalmology, Federal University of Espírito Santo, Vitória 29075-910, ES, Brazil
- Young Leadership Physicians Programme, National Academy of Medicine, Rio de Janeiro 20021-130, RJ, Brazil
| | - Stephen H. Tsang
- Department of Biomedical Engineering, Columbia University, New York, NY 10027, USA
- Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center/New York-Presbyterian Hospital, New York, NY 10032, USA
- Jonas Children′s Vision Care, and Bernard & Shirlee Brown Glaucoma Laboratory, Department of Ophthalmology, Columbia University, New York, NY 10032, USA
- Columbia Stem Cell Initiative, Columbia University, New York, NY 10032, USA
- Department of Pathology & Cell Biology, Columbia University, New York, NY 10032, USA
| | - Irene H. Maumenee
- Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center/New York-Presbyterian Hospital, New York, NY 10032, USA
- Jonas Children′s Vision Care, and Bernard & Shirlee Brown Glaucoma Laboratory, Department of Ophthalmology, Columbia University, New York, NY 10032, USA
| | - Peter M. J. Quinn
- Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center/New York-Presbyterian Hospital, New York, NY 10032, USA
- Jonas Children′s Vision Care, and Bernard & Shirlee Brown Glaucoma Laboratory, Department of Ophthalmology, Columbia University, New York, NY 10032, USA
| |
|
20
|
Machine learning approaches demonstrate that protein structures carry information about their genetic coding. Sci Rep 2022; 12:21968. [PMID: 36539476 PMCID: PMC9767929 DOI: 10.1038/s41598-022-25874-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/07/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
Synonymous codons translate into the same amino acid. Although the identity of synonymous codons is often considered inconsequential to the final protein structure, there is mounting evidence for an association between the two. Our study examined this association using regression and classification models, finding that codon sequences predict protein backbone dihedral angles with a lower error than amino acid sequences, and that models trained with true dihedral angles have better classification of synonymous codons given structural information than models trained with random dihedral angles. Using this classification approach, we investigated local codon-codon dependencies and tested whether synonymous codon identity can be predicted more accurately from codon context than amino acid context alone, and most specifically which codon context position carries the most predictive power.
|
21
|
Local Backbone Geometry Plays a Critical Role in Determining Conformational Preferences of Amino Acid Residues in Proteins. Biomolecules 2022; 12:1184. [PMID: 36139023 PMCID: PMC9496368 DOI: 10.3390/biom12091184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/17/2022] Open
Abstract
The definition of the structural basis of the conformational preferences of the genetically encoded amino acid residues is an important yet unresolved issue of structural biology. In order to gain insights into this intricate topic, we here determined and compared the amino acid propensity scales for different (φ, ψ) regions of the Ramachandran plot and for different secondary structure elements. These propensities were calculated using the Chou–Fasman approach on a database of non-redundant protein chains retrieved from the Protein Data Bank. Similarities between propensity scales were evaluated by linear regression analyses. One of the most striking and unexpected findings is that distant regions of the Ramachandran plot may exhibit significantly similar propensity scales. On the other hand, contiguous regions of the Ramachandran plot may present anticorrelated propensities. In order to provide an interpretative background to these results, we evaluated the role that the local variability of protein backbone geometry plays in this context. Our analysis indicates that (dis)similarities of propensity scales between different regions of the Ramachandran plot are coupled with (dis)similarities in the local geometry. The concept that similarities of the propensity scales are dictated by the similarity of the NCαC angle and not necessarily by the similarity of the (φ, ψ) conformation may have far-reaching implications in the field.
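The Chou–Fasman propensity mentioned above is conventionally defined as the fraction of a residue type found in a given region divided by the fraction of all residues found in that region, so values above 1 indicate a preference. The sketch below computes it from a list of (residue, region) observations; the function name `propensities`, the input layout, and the toy labels are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter


def propensities(observations, region):
    """Chou-Fasman-style propensity of each residue type for one region.

    observations: iterable of (residue, region_label) pairs (hypothetical
    layout). Propensity = (n_res_in_region / n_res) / (n_region / n_total).
    """
    observations = list(observations)
    n_total = len(observations)
    n_region = sum(1 for _, label in observations if label == region)
    if n_total == 0 or n_region == 0:
        return {}  # the ratio is undefined without any data for the region
    per_residue = Counter(res for res, _ in observations)
    per_residue_in_region = Counter(
        res for res, label in observations if label == region
    )
    background = n_region / n_total
    return {
        res: (per_residue_in_region[res] / per_residue[res]) / background
        for res in per_residue
    }


# Toy data: alanine is mostly in the "alpha" region, glycine mostly outside it.
toy = [("A", "alpha")] * 3 + [("A", "beta")] + [("G", "alpha")] + [("G", "beta")] * 3
print(propensities(toy, "alpha"))
```

Two such propensity scales, one per Ramachandran region, can then be compared by linear regression, which is the type of similarity analysis the abstract describes.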
|
22
|
Xu H. Non-Equilibrium Protein Folding and Activation by ATP-Driven Chaperones. Biomolecules 2022; 12:832. [PMID: 35740957 PMCID: PMC9221429 DOI: 10.3390/biom12060832] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/24/2022] [Revised: 06/10/2022] [Accepted: 06/13/2022] [Indexed: 12/14/2022]
Abstract
Recent experimental studies suggest that ATP-driven molecular chaperones can stabilize protein substrates in their native structures out of thermal equilibrium. The mechanism of such non-equilibrium protein folding is an open question. Based on available structural and biochemical evidence, I propose here a unifying principle that underlies the conversion of chemical energy from ATP hydrolysis to the conformational free energy associated with protein folding and activation. I demonstrate that non-equilibrium folding requires the chaperones to break at least one of four symmetry conditions. The Hsp70 and Hsp90 chaperones each break a different subset of these symmetries and thus they use different mechanisms for non-equilibrium protein folding. I derive an upper bound on the non-equilibrium elevation of the native concentration, which implies that non-equilibrium folding only occurs in slow-folding proteins that adopt an unstable intermediate conformation in binding to ATP-driven chaperones. Contrary to the long-held view of Anfinsen's hypothesis that proteins fold to their conformational free energy minima, my results predict that some proteins may fold into thermodynamically unstable native structures with the assistance of ATP-driven chaperones, and that the native structures of some chaperone-dependent proteins may be shaped by their chaperone-mediated folding pathways.
Affiliation(s)
- Huafeng Xu
- Roivant Sciences, New York, NY 10036, USA
|