1
Najar Najafi N, Karbassian R, Hajihassani H, Azimzadeh Irani M. Unveiling the influence of fastest Nobel Prize winner discovery: AlphaFold's algorithmic intelligence in medical sciences. J Mol Model 2025; 31:163. [PMID: 40387957 DOI: 10.1007/s00894-025-06392-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Accepted: 05/06/2025] [Indexed: 05/20/2025]
Abstract
CONTEXT: AlphaFold's advanced AI technology has transformed protein structure interpretation. By predicting three-dimensional protein structures from amino acid sequences, AlphaFold has solved the complex protein-folding problem, previously challenging for experimental methods due to numerous possible conformations. Since its inception, AlphaFold has introduced several versions, including AlphaFold2, AlphaFold DB, AlphaFold-Multimer, AlphaMissense, and AlphaFold3, each further enhancing protein structure prediction. Remarkably, AlphaFold is recognized as the fastest Nobel Prize winner in science history. This technology has extensive applications, potentially transforming treatment and diagnosis in medical sciences by reducing drug design costs and time, while elucidating structural pathways of human body systems. Numerous studies have demonstrated how AlphaFold aids in understanding health conditions by providing critical information about protein mutations, abnormal protein-protein interactions, and changes in protein dynamics. Researchers have also developed new technologies and pipelines using different versions of AlphaFold to amplify its potential. However, addressing existing limitations is crucial to maximizing AlphaFold's capacity to redefine medical research. This article reviews AlphaFold's impact on five key aspects of medical sciences: protein mutation, protein-protein interaction, molecular dynamics, drug design, and immunotherapy. METHODS: This review examines the contributions of various AlphaFold versions (AlphaFold2, AlphaFold DB, AlphaFold-Multimer, AlphaMissense, and AlphaFold3) to protein structure prediction. The methods include an extensive analysis of computational techniques and software used in interpreting and predicting protein structures, emphasizing advances in AI technology and its applications in medical research.
Affiliation(s)
- Niki Najar Najafi
  - Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Reyhaneh Karbassian
  - Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Helia Hajihassani
  - Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran

2
Tekpinar M, David L, Henry T, Carbone A. PRESCOTT: a population aware, epistatic, and structural model accurately predicts missense effects. Genome Biol 2025; 26:113. [PMID: 40329382 PMCID: PMC12054230 DOI: 10.1186/s13059-025-03581-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 04/17/2025] [Indexed: 05/08/2025] [Open Access]
Abstract
Predicting the functional impact of point mutations is a critical challenge in genomics. PRESCOTT reconstructs complete mutational landscapes, identifies mutation-sensitive regions, and categorizes missense variants as benign, pathogenic, or variants of uncertain significance. Leveraging protein sequences, structural models, and population-specific allele frequencies, PRESCOTT surpasses existing methods in classifying ClinVar variants, the ACMG dataset, and over 1800 proteins from the Human Protein Dataset. Its online server facilitates mutation effect predictions for any protein and variant, and includes a database of over 19,000 human proteins, ready for population-specific analyses. Open access to residue-specific scores offers transparency and valuable insights for genomic medicine.
Affiliation(s)
- Mustafa Tekpinar
  - Department of Computational, Quantitative and Synthetic Biology (CQSB), Sorbonne Université, CNRS, IBPS, UMR 7238, Paris, 75005, France
- Laurent David
  - Department of Computational, Quantitative and Synthetic Biology (CQSB), Sorbonne Université, CNRS, IBPS, UMR 7238, Paris, 75005, France
- Thomas Henry
  - Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, Univ Lyon, Lyon, 69007, France
- Alessandra Carbone
  - Department of Computational, Quantitative and Synthetic Biology (CQSB), Sorbonne Université, CNRS, IBPS, UMR 7238, Paris, 75005, France.
  - Institut Universitaire de France (IUF), Paris, France.

3
Fesenko I, Sahakyan H, Dhyani R, Shabalina SA, Storz G, Koonin EV. The hidden bacterial microproteome. Mol Cell 2025; 85:1024-1041.e6. [PMID: 39978337 PMCID: PMC11890958 DOI: 10.1016/j.molcel.2025.01.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 11/05/2024] [Accepted: 01/22/2025] [Indexed: 02/22/2025]
Abstract
Microproteins encoded by small open reading frames comprise the "dark matter" of proteomes. Although microproteins have been detected in diverse organisms from all three domains of life, many more remain to be identified, and only a few have been functionally characterized. In this comprehensive study of intergenic small open reading frames (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identify 67,297 clusters of ismORFs subject to purifying selection. Expression of tagged Escherichia coli microproteins is detected for 11 of the 16 tested, validating the predictions. Although the ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins, some globular folds, oligomeric structures, and possible interactions with proteins encoded by neighboring genes are predicted. Complete information on the predicted microprotein families, including evidence of transcription and translation, and structure predictions are available as an easily searchable resource for investigation of microprotein functions.
Affiliation(s)
- Igor Fesenko
  - Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Harutyun Sahakyan
  - Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Rajat Dhyani
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- Svetlana A Shabalina
  - Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Gisela Storz
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA.
- Eugene V Koonin
  - Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

4
Mutz P, Camargo AP, Sahakyan H, Neri U, Butkovic A, Wolf YI, Krupovic M, Dolja VV, Koonin EV. The protein structurome of Orthornavirae and its dark matter. mBio 2025; 16:e0320024. [PMID: 39714180 PMCID: PMC11796362 DOI: 10.1128/mbio.03200-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 10/28/2024] [Indexed: 12/24/2024] [Open Access]
Abstract
Metatranscriptomics is uncovering more and more diverse families of viruses with RNA genomes comprising the viral kingdom Orthornavirae in the realm Riboviria. Thorough protein annotation and comparison are essential to get insights into the functions of viral proteins and virus evolution. In addition to sequence- and HMM profile-based methods, protein structure comparison adds a powerful tool to uncover protein functions and relationships. We constructed an Orthornavirae "structurome" consisting of already annotated as well as unannotated ("dark matter") proteins and domains encoded in viral genomes. We used protein structure modeling and similarity searches to illuminate the remaining dark matter in hundreds of thousands of orthornavirus genomes. The vast majority of the dark matter domains showed either "generic" folds, such as single α-helices, or no high confidence structure predictions. Nevertheless, a variety of lineage-specific globular domains that were new either to orthornaviruses in general or to particular virus families were identified within the proteomic dark matter of orthornaviruses, including several predicted nucleic acid-binding domains and nucleases. In addition, we identified a case of exaptation of a cellular nucleoside monophosphate kinase as an RNA-binding protein in several virus families. Notwithstanding the continuing discovery of numerous orthornaviruses, it appears that all the protein domains conserved in large groups of viruses have already been identified. The rest of the viral proteome seems to be dominated by poorly structured domains including intrinsically disordered ones that likely mediate specific virus-host interactions. IMPORTANCE: Advanced methods for protein structure prediction, such as AlphaFold2, greatly expand our capability to identify protein domains and infer their likely functions and evolutionary relationships. This is particularly pertinent for proteins encoded by viruses that are known to evolve rapidly and as a result often cannot be adequately characterized by analysis of the protein sequences. We performed an exhaustive structure prediction and comparative analysis for uncharacterized proteins and domains ("dark matter") encoded by viruses with RNA genomes. The results show the dark matter of RNA virus proteome consists mostly of disordered and all-α-helical domains that cannot be readily assigned a specific function and that likely mediate various interactions between viral proteins and between viral and host proteins. The great majority of globular proteins and domains of RNA viruses are already known although we identified several unexpected domains represented in individual viral families.
Affiliation(s)
- Pascal Mutz
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Antonio Pedro Camargo
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Harutyun Sahakyan
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Uri Neri
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Anamarija Butkovic
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
- Yuri I. Wolf
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Mart Krupovic
  - Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
- Valerian V. Dolja
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
- Eugene V. Koonin
  - Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA

5
Xie X, Deng X, Chen L, Yuan J, Chen H, Wei C, Feng C, Liu X, Qiu G. From Gene to Structure: Unraveling Genomic Dark Matter in Ca. Accumulibacter. Environ Sci Technol 2025; 59:628-639. [PMID: 39699575 DOI: 10.1021/acs.est.4c09948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2024]
Abstract
"Candidatus Accumulibacter" is a unique and pivotal genus of polyphosphate-accumulating organisms prevalent in wastewater treatment plants and plays mainstay roles in the global phosphorus cycle. However, the efforts to fully understand their genetic and metabolic characteristics are largely hindered by major limitations in existing sequence-based annotation methods. Here, we reported an integrated approach combining pangenome analysis, protein structure prediction and clustering, and meta-omic characterization, to uncover genetic and metabolic traits previously unexplored for Ca. Accumulibacter. The identification of a previously overlooked pyrophosphate-fructose 6-phosphate 1-phosphotransferase gene (pfp) suggested that all Ca. Accumulibacter encoded a complete Embden-Meyerhof-Parnas pathway. A homologue of the phosphate-specific transport system accessory protein (PhoU) was suggested to be an inorganic phosphate transport (Pit) accessory protein (Pap) conferring effective and efficient phosphate transport. Additional lineage members were found to encode complete denitrification pathways. A pipeline was built, generating a pan-Ca. Accumulibacter annotation reference database, covering >200,000 proteins and their encoding genes. Benchmarking on 27 Ca. Accumulibacter genomes showed major improvement in the average annotation coverage from 51% to 82%. This pipeline is readily applicable to diverse cultured and uncultured bacteria to establish high-coverage annotation reference databases, facilitating the exploration of genomic dark matter in the bacterial domain.
Affiliation(s)
- Xiaojing Xie
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Xuhan Deng
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Liping Chen
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Jing Yuan
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Hang Chen
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
- Chaohai Wei
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
  - Guangdong Provincial Key Laboratory of Solid Wastes Pollution Control and Recycling, Guangzhou 510006, China
- Chunhua Feng
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
  - Guangdong Provincial Key Laboratory of Solid Wastes Pollution Control and Recycling, Guangzhou 510006, China
- Xianghui Liu
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore 637551, Singapore
- Guanglei Qiu
  - School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore 637551, Singapore
  - Guangdong Provincial Key Laboratory of Solid Wastes Pollution Control and Recycling, Guangzhou 510006, China
  - The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, Guangzhou 510006, China

6
Shi C, Tang Z, Jin Z, Huang S, Xu X, Qu C, Lin TH. Characterization of DmToll and DmToll7 homologue in Litopenaeus vannamei based on structure analysis. Dev Comp Immunol 2024; 158:105209. [PMID: 38838948 DOI: 10.1016/j.dci.2024.105209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 06/01/2024] [Accepted: 06/03/2024] [Indexed: 06/07/2024]
Abstract
Toll-like receptors (TLRs) are a family of pattern recognition receptors (PRRs) that recognize invading pathogens and activate downstream signaling pathways. Ten Tolls have been found in Litopenaeus vannamei, but none has yet been matched to its corresponding Toll homologue in model animals. In this study, we predicted the three-dimensional (3D) structures of the 10 LvTolls (LvToll1-10) with the AlphaFold2 program. The per-residue predicted local distance difference test (pLDDT) scores showed that the predicted LvToll structures had high accuracy (pLDDT > 70). By structural analysis, the 3D structures of LvToll2 and LvToll3 were highly similar to those of Drosophila melanogaster Toll and Toll7, respectively. The 3D structures of LvToll7 and LvToll10 were not similar to those of the other LvTolls. Moreover, we predicted that LvSpätzle4 had high structural similarity to DmSpätzle. There were 9 potential hydrogen bonds in the LvToll2-LvSpätzle4 complex. Importantly, a co-immunoprecipitation assay showed that LvToll2 could bind LvSpätzle4. Collectively, this study provides new insight for research on invertebrate immunity by identifying homologues of model-animal proteins.
Affiliation(s)
- Chenchen Shi
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
- Zhuyun Tang
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China; National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Zhixin Jin
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China
- Shan Huang
  - Fujian Provincial Key Laboratory of Functional and Clinical Translational Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China; Department of Basic Medical Science, Xiamen Medical College, Xiamen, Fujian, 361023, China
- Xiuyue Xu
  - Fujian Provincial Key Laboratory of Functional and Clinical Translational Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China; Department of Clinical Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China
- Chen Qu
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China.
- Ta-Hui Lin
  - Fujian Provincial Key Laboratory of Functional and Clinical Translational Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China; State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China; Department of Basic Medical Science, Xiamen Medical College, Xiamen, Fujian, 361023, China.

7
Shi C, Jin Z, Yu Y, Tang Z, Zhang Y, Qu C, Lin TH. Identification and characterization of a TLR4 homologue in Eriocheir sinensis based on structure analysis. Dev Comp Immunol 2024; 157:105192. [PMID: 38714270 DOI: 10.1016/j.dci.2024.105192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 05/03/2024] [Accepted: 05/05/2024] [Indexed: 05/09/2024]
Abstract
Toll-like receptor 4 (TLR4) plays an essential role in the activation of innate immunity by recognizing diverse pathogenic components of bacteria. Six Tolls have been found in Eriocheir sinensis, but none has yet been identified as a mammalian TLR4 homolog. To this end, we predicted the three-dimensional (3D) structures of the EsTolls (EsToll1-6) with AlphaFold2. Most of the predicted LRR and TIR structures had high accuracy (pLDDT > 70). By structure analysis, the 3D structure of EsToll6 overlapped closely with that of HsTLR4. Moreover, we predicted 11 potential hydrogen bonds and 3 salt bridges in the 3D structure of the EsToll6-EsML1 complex, and 18 hydrogen bonds and 7 salt bridges in the EsToll6-EsML2 complex. Co-immunoprecipitation assays showed that EsToll6 could interact with both EsML1 and EsML2. Importantly, TAK242 (a mammalian TLR4-specific inhibitor) could inhibit the generation of ROS stimulated by lipopolysaccharides (LPS) in HeLa cells overexpressing EsToll6 and EsML2. Collectively, these results imply that EsToll6 is a mammalian TLR4 homolog and provide new insight for research on mammalian homologs in invertebrates.
Affiliation(s)
- Chenchen Shi
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China
- Zhixin Jin
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China
- Yanping Yu
  - Fujian Provincial Key Laboratory of Functional and Clinical Translational Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China; Department of Basic Medical Science, Xiamen Medical College, Xiamen, Fujian, 361023, China
- Zhuyun Tang
  - National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Yuguo Zhang
  - Fujian Provincial Key Laboratory of Functional and Clinical Translational Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China; Department of Basic Medical Science, Xiamen Medical College, Xiamen, Fujian, 361023, China
- Chen Qu
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, China.
- Ta-Hui Lin
  - Fujian Provincial Key Laboratory of Functional and Clinical Translational Medicine, Xiamen Medical College, Xiamen, Fujian, 361023, China; State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian, 361102, China; Department of Basic Medical Science, Xiamen Medical College, Xiamen, Fujian, 361023, China.

8
Agarwal V, McShan AC. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat Chem Biol 2024; 20:950-959. [PMID: 38907110 PMCID: PMC11956457 DOI: 10.1038/s41589-024-01638-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 04/29/2024] [Indexed: 06/23/2024]
Abstract
Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure-prediction problem been solved? Here, with a focus on nonglobular proteins, we highlight the many strengths and potential weaknesses of DeepMind's AlphaFold2 in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AlphaFold2 model quality and reliability using the predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AlphaFold2 can be applied to and the caveats involved. Concrete examples of how AlphaFold2 models can be integrated with experimental data in the form of small-angle X-ray scattering (SAXS), solution NMR, cryo-electron microscopy (cryo-EM) and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots toward conformational ensembles and alternate biologically relevant states. The overarching theme is that careful consideration is due when using AlphaFold2-generated models to generate testable hypotheses and structural models, rather than treating predicted models as de facto ground truth structures.
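As a rough illustration of the pLDDT-based reliability check this abstract refers to, the sketch below reads per-residue pLDDT values from an AlphaFold model in PDB format, where they are stored in the B-factor column. The function names are hypothetical, and the default cutoff of 70 is a commonly used convention rather than a value taken from this paper.

```python
def residue_plddt(pdb_text):
    """Map (chain, residue number) to pLDDT, read from the CA atom records.

    Assumes fixed-column PDB format, with pLDDT in the B-factor field
    (columns 61-66), as in AlphaFold model files.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16; keep one record per residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            scores[(chain, resseq)] = float(line[60:66])
    return scores


def confident_fraction(scores, cutoff=70.0):
    """Fraction of residues whose pLDDT is at or above the cutoff (assumed 70)."""
    if not scores:
        return 0.0
    return sum(value >= cutoff for value in scores.values()) / len(scores)
```

A low confident fraction is only a prompt to inspect the model further, for example alongside the PAE values the abstract also mentions; it is not a verdict on the structure.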
Affiliation(s)
- Vinayak Agarwal
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.

9
Haj Abdullah Alieh L, Cardoso de Toledo B, Hadarovich A, Toth-Petroczy A, Calegari F. Characterization of alternative splicing during mammalian brain development reveals the extent of isoform diversity and potential effects on protein structural changes. Biol Open 2024; 13:bio061721. [PMID: 39387301 PMCID: PMC11554263 DOI: 10.1242/bio.061721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 09/09/2024] [Indexed: 10/15/2024] [Open Access]
Abstract
Regulation of gene expression is critical for fate commitment of stem and progenitor cells during tissue formation. In the context of mammalian brain development, a plethora of studies have described how changes in the expression of individual genes characterize cell types across ontogeny and phylogeny. However, little attention has been paid to the fact that different transcripts can arise from any given gene through alternative splicing (AS). Although AS is considered a key mechanism expanding transcriptome diversity during evolution, assessing its full effect on isoform diversity and protein function has been notoriously difficult. Here, we capitalize on the use of a validated reporter mouse line to isolate neural stem cells, neurogenic progenitors and neurons during corticogenesis and combine the use of short- and long-read sequencing to reconstruct the full transcriptome diversity characterizing neurogenic commitment. Extending available transcriptional profiles of the mammalian brain by nearly 50,000 new isoforms, we found that neurogenic commitment is characterized by a progressive increase in exon inclusion resulting in the profound remodeling of the transcriptional profile of specific cortical cell types. Most importantly, we computationally infer the biological significance of AS on protein structure by using AlphaFold2, revealing how radical protein conformational changes can arise from subtle changes in isoform sequence. Together, our study reveals that AS has a greater potential to impact protein diversity and function than previously thought, independently from changes in gene expression.
Affiliation(s)
- Anna Hadarovich
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
- Federico Calegari
  - CRTD-Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Germany

10
Carugo O. Accuracy of AlphaFold models: Comparison with short N…O contacts in atomic resolution protein crystal structures. Comput Biol Chem 2024; 110:108069. [PMID: 38581839 DOI: 10.1016/j.compbiolchem.2024.108069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/29/2024] [Accepted: 04/04/2024] [Indexed: 04/08/2024]
Abstract
Artificial intelligence (AI) has revolutionized structural biology by predicting protein 3D structures with near-experimental accuracy. Here, short backbone N-O distances in high-resolution crystal structures were compared to those in three-dimensional models generated with the AI tools AlphaFold/ColabFold, specifically considering their estimated standard errors. Experimental and computationally modeled distances very often differ significantly, showing that the precision of these models is inadequate to reproduce experimental results at high resolution. T-tests and normal probability plots showed that these computational methods predict atomic position standard errors 3.5-6 times larger than experimental errors. SYNOPSIS: Positional standard errors in AI-based protein 3D models are 3.5-6 times larger than in atomic resolution crystal structures.
Affiliation(s)
- Oliviero Carugo
  - Department of Chemistry, University of Pavia, Pavia, Italy; Max Perutz Labs University of Vienna, Department of Structural and Computational Biology, Vienna, Austria.

11
Tian Z, Jiang X, Chen Z, Huang C, Qian F. Quantifying Protein Shape to Elucidate Its Influence on Solution Viscosity in High-Concentration Electrolyte Solutions. Mol Pharm 2024; 21:1719-1728. [PMID: 38411904 DOI: 10.1021/acs.molpharmaceut.3c01075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
Therapeutic proteins with a high concentration and low viscosity are highly desirable for subcutaneous and certain local injections. The shape of a protein is known to influence solution viscosity; however, the precise quantification of protein shape and its relative impact compared to other factors like charge-charge interactions remains unclear. In this study, we utilized seven model proteins of varying shapes and experimentally determined their shape factors (v) based on Einstein's viscosity theory, which correlate strongly with the ratios of the proteins' surface area to the 2/3 power of their respective volumes, based on protein crystal structures resolved experimentally or predicted by AlphaFold. This finding confirms the feasibility of computationally estimating protein shape factors from amino acid sequences alone. Furthermore, our results demonstrated that, in high-concentration electrolyte solutions, a more spherical protein shape increases the protein's critical concentration (C*), the transition concentration beyond which protein viscosity increases exponentially relative to concentration increases. In summary, our work elucidates protein shape as a key determinant of solution viscosity through quantitative analysis and comparison with other contributing factors. This provides insights into molecular engineering strategies to optimize the molecular design of therapeutic proteins, thus optimizing their viscosity.
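The geometric descriptor named in this abstract, surface area divided by volume to the 2/3 power, can be sketched as follows. This is a minimal illustration that assumes the surface area and volume have already been computed from a structure; the abstract does not say how the authors obtained them, and the function names are hypothetical.

```python
import math


def shape_ratio(surface_area, volume):
    """Dimensionless ratio S / V**(2/3); inputs in consistent units (e.g. Å² and Å³)."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return surface_area / volume ** (2.0 / 3.0)


def sphere_ratio():
    """Value of S / V**(2/3) for a perfect sphere: (36*pi)**(1/3), about 4.84."""
    return (36.0 * math.pi) ** (1.0 / 3.0)
```

Because a sphere has the smallest surface area for a given volume, any other shape gives a ratio above the sphere value (a unit cube gives 6), so larger values indicate a less spherical molecule.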
Collapse
Affiliation(s)
- Zhou Tian
- School of Pharmaceutical Sciences, Beijing Frontier Research Center for Biological Structure, and Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Tsinghua University, Beijing 100084, P. R. China
| | - Xuling Jiang
- School of Pharmaceutical Sciences, Beijing Frontier Research Center for Biological Structure, and Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Tsinghua University, Beijing 100084, P. R. China
| | - Zhidong Chen
- School of Pharmaceutical Sciences, Beijing Frontier Research Center for Biological Structure, and Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Tsinghua University, Beijing 100084, P. R. China
| | - Chengnan Huang
- School of Pharmaceutical Sciences, Beijing Frontier Research Center for Biological Structure, and Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Tsinghua University, Beijing 100084, P. R. China
| | - Feng Qian
- School of Pharmaceutical Sciences, Beijing Frontier Research Center for Biological Structure, and Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology (Ministry of Education), Tsinghua University, Beijing 100084, P. R. China
12
Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
Affiliation(s)
- Micholas Dean Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
| | - L Darryl Quarles
- Departments of Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA; ORRxD LLC, 3404 Olney Drive, Durham, NC 27705, USA
| | - Omar Demerdash
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
| | - Jeremy C Smith
- University of Tennessee/Oak Ridge National Laboratory Center for Molecular Biophysics, Oak Ridge, TN 37830, USA; Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA.
13
Taujale R, Gravel N, Zhou Z, Yeung W, Kochut K, Kannan N. Informatic challenges and advances in illuminating the druggable proteome. Drug Discov Today 2024; 29:103894. [PMID: 38266979 DOI: 10.1016/j.drudis.2024.103894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 01/08/2024] [Accepted: 01/17/2024] [Indexed: 01/26/2024]
Abstract
The understudied members of the druggable proteomes offer promising prospects for drug discovery efforts. While large-scale initiatives have generated valuable functional information on understudied members of the druggable gene families, translating this information into actionable knowledge for drug discovery requires specialized informatics tools and resources. Here, we review the unique informatics challenges and advances in annotating understudied members of the druggable proteome. We demonstrate the application of statistical evolutionary inference tools, knowledge graph mining approaches, and protein language models in illuminating understudied protein kinases, pseudokinases, and ion channels.
Affiliation(s)
- Rahil Taujale
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
| | - Nathan Gravel
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
| | | | - Wayland Yeung
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
| | - Krystof Kochut
- School of Computing, University of Georgia, Athens, GA, USA
| | - Natarajan Kannan
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA; Institute of Bioinformatics, University of Georgia, Athens, GA, USA.
14
Corum MR, Venkannagari H, Hryc CF, Baker ML. Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024] Open
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
Affiliation(s)
- Michael R Corum
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Harikanth Venkannagari
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Corey F Hryc
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas
| | - Matthew L Baker
- Department of Biochemistry and Molecular Biology, McGovern Medical School at the University of Texas Health Science Center, Houston, Texas.
15
Fesenko I, Sahakyan H, Shabalina SA, Koonin EV. The Cryptic Bacterial Microproteome. bioRxiv 2024:2024.02.17.580829. [PMID: 38903115 PMCID: PMC11188072 DOI: 10.1101/2024.02.17.580829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Microproteins encoded by small open reading frames (smORFs) comprise the "dark matter" of proteomes. Although functional microproteins were identified in diverse organisms from all three domains of life, bacterial smORFs remain poorly characterized. In this comprehensive study of intergenic smORFs (ismORFs, 15-70 codons) in 5,668 bacterial genomes of the family Enterobacteriaceae, we identified 67,297 clusters of ismORFs subject to purifying selection. The ismORFs mainly code for hydrophobic, potentially transmembrane, unstructured, or minimally structured microproteins. Using AlphaFold Multimer, we predicted interactions of some of the predicted microproteins encoded by transcribed ismORFs with proteins encoded by neighboring genes, revealing the potential of microproteins to regulate the activity of various proteins, particularly, under stress. We compiled a catalog of predicted microprotein families with different levels of evidence from synteny analysis, structure prediction, and transcription and translation data. This study offers a resource for investigation of biological functions of microproteins.
Affiliation(s)
- Igor Fesenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Harutyun Sahakyan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Svetlana A. Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
16
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024] Open
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.
Affiliation(s)
- R. Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Jing Zhang
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Kirill E. Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Lisa N. Kinch
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Qian Cong
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Nick V. Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
17
Sim PF, Chek MF, Nguyen NTH, Nishimura T, Inaba T, Hakoshima T, Suetsugu S. The SH3 binding site in front of the WH1 domain contributes to the membrane binding of the BAR domain protein endophilin A2. J Biochem 2023; 175:57-67. [PMID: 37812440 DOI: 10.1093/jb/mvad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 08/22/2023] [Accepted: 08/31/2023] [Indexed: 10/10/2023] Open
Abstract
The Bin-Amphiphysin-Rvs (BAR) domain of endophilin binds to the cell membrane and shapes it into tubules for endocytosis. Endophilin has a Src-homology 3 (SH3) domain at its C-terminus. The SH3 domain interacts with the proline-rich motif (PRM) that is found in proteins such as neural Wiskott-Aldrich syndrome protein (N-WASP). Here, we re-examined the binding sites of the SH3 domain of endophilin in N-WASP by machine learning-based prediction and identified a previously unrecognized binding site. In addition to the well-recognized PRM at the central proline-rich region, we found a PRM in front of the N-terminal WASP homology 1 (WH1) domain of N-WASP (NtPRM) as a binding site of the endophilin SH3 domain. Furthermore, the diameter of the membrane tubules in the presence of the NtPRM mutant was narrower than that in the presence of N-WASP and wider than that in its absence. Importantly, the NtPRM of N-WASP was involved in the membrane localization of endophilin A2 in cells. Therefore, the NtPRM contributes to the binding of endophilin to N-WASP in membrane remodeling.
Affiliation(s)
- Pei Fang Sim
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
| | - Min Fey Chek
- Institute for Research Initiatives, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
| | - Nhung Thi Hong Nguyen
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
| | - Tamako Nishimura
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
| | - Takehiko Inaba
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
| | - Toshio Hakoshima
- Institute for Research Initiatives, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
| | - Shiro Suetsugu
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
- Center for Digital Green-innovation, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
18
Hsiung SY, Deng SX, Li J, Huang SY, Liaw CK, Huang SY, Wang CC, Hsieh YSY. Machine learning-based monosaccharide profiling for tissue-specific classification of Wolfiporia extensa samples. Carbohydr Polym 2023; 322:121338. [PMID: 37839831 DOI: 10.1016/j.carbpol.2023.121338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 08/09/2023] [Accepted: 08/26/2023] [Indexed: 10/17/2023]
Abstract
Machine learning (ML) has been used for many clinical decision-making processes and diagnostic procedures in bioinformatics applications. We examined eight algorithms, including linear discriminant analysis (LDA), logistic regression (LR), k-nearest neighbor (KNN), random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), Naïve Bayes classifier (NB), and artificial neural network (ANN) models, to evaluate their classification and prediction capabilities for four tissue types in Wolfiporia extensa using their monosaccharide composition profiles. All 8 ML-based models were assessed as exemplary models with AUC exceeding 0.8. Five models, namely LDA, KNN, RF, GBM, and ANN, performed excellently in the four-tissue-type classification (AUC > 0.9). Additionally, all eight models were evaluated as good predictive models with AUC value > 0.8 in the three-tissue-type classification. Notably, all 8 ML-based methods outperformed the single linear discriminant analysis (LDA) plotting method. For large sample sizes, the ML-based methods perform better than traditional regression techniques and could potentially increase the accuracy in identifying tissue samples of W. extensa.
Affiliation(s)
- Shih-Yi Hsiung
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei, Taiwan
| | - Shun-Xin Deng
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei, Taiwan; Graduate Institute of Pharmacognosy, Taipei Medical University, Taipei, Taiwan
| | - Jing Li
- College of Life Science, Shanghai Normal University, Shanghai, China
| | - Sheng-Yao Huang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Chen-Kun Liaw
- Department of Orthopedics, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
| | - Su-Yun Huang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Ching-Chiung Wang
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei, Taiwan; Graduate Institute of Pharmacognosy, Taipei Medical University, Taipei, Taiwan
| | - Yves S Y Hsieh
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei, Taiwan; Graduate Institute of Pharmacognosy, Taipei Medical University, Taipei, Taiwan; Division of Glycoscience, Department of Chemistry, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, AlbaNova University Centre, Stockholm SE106 91, Sweden.
19
Al-Masri C, Trozzi F, Lin SH, Tran O, Sahni N, Patek M, Cichonska A, Ravikumar B, Rahman R. Investigating the conformational landscape of AlphaFold2-predicted protein kinase structures. Bioinform Adv 2023; 3:vbad129. [PMID: 37786533 PMCID: PMC10541651 DOI: 10.1093/bioadv/vbad129] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/18/2023] [Revised: 07/28/2023] [Accepted: 09/13/2023] [Indexed: 10/04/2023]
Abstract
Summary: Protein kinases are a family of signaling proteins, crucial for maintaining cellular homeostasis. When dysregulated, kinases drive the pathogenesis of several diseases, and are thus one of the largest target categories for drug discovery. Kinase activity is tightly controlled by switching through several active and inactive conformations in their catalytic domain. Kinase inhibitors have been designed to engage kinases in specific conformational states, where each conformation presents a unique physico-chemical environment for therapeutic intervention. Thus, modeling kinases across conformations can enable the design of novel and optimally selective kinase drugs. Due to the recent success of AlphaFold2 in accurately predicting the 3D structure of proteins based on sequence, we investigated the conformational landscape of protein kinases as modeled by AlphaFold2. We observed that AlphaFold2 is able to model several kinase conformations across the kinome; however, certain conformations are only observed in specific kinase families. Furthermore, we show that the per-residue predicted local distance difference test (pLDDT) can capture information describing structural flexibility of kinases. Finally, we evaluated the docking performance of AlphaFold2 kinase structures for enriching known ligands. Taken together, we see an opportunity to leverage AlphaFold2 models for structure-based drug discovery against kinases across several pharmacologically relevant conformational states. Availability and implementation: All code used in the analysis is freely available at https://github.com/Harmonic-Discovery/AF2-kinase-conformational-landscape.
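As an illustration of how per-residue pLDDT values can be read from an AlphaFold model, the following Python sketch parses PDB-format text, where AlphaFold stores pLDDT in the B-factor column. It is not taken from the authors' repository: the helper names are hypothetical, the three-residue input is synthetic, and using a cutoff of 70 to flag possibly flexible residues is an assumption based on the common AlphaFold confidence bands, not the method of this study.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, b_factor):
    """Build a fixed-width PDB ATOM record (synthetic data for the demo)."""
    return (
        f"ATOM  {serial:5d} {name:4s} {res_name:>3s} {chain}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.00:6.2f}{b_factor:6.2f}"
    )


def plddt_per_residue(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # and 61-66 the B-factor (pLDDT in AlphaFold models).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(scores, cutoff=70.0):
    """Residue numbers whose pLDDT falls below the (assumed) cutoff."""
    return sorted(res for res, value in scores.items() if value < cutoff)


# Synthetic three-residue model; coordinates and scores are made up.
demo_pdb = "\n".join([
    make_atom_line(1, " CA ", "MET", "A", 1, 1.0, 2.0, 3.0, 95.50),
    make_atom_line(2, " CA ", "GLY", "A", 2, 4.0, 5.0, 6.0, 62.25),
    make_atom_line(3, " CA ", "SER", "A", 3, 7.0, 8.0, 9.0, 41.00),
])

demo_scores = plddt_per_residue(demo_pdb)
print(demo_scores)
print(low_confidence_residues(demo_scores))
```

For real models, a structure-parsing library would be more robust than fixed-column slicing, particularly for mmCIF files; the slicing here only keeps the sketch dependency-free.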
Affiliation(s)
- Carmen Al-Masri
- Harmonic Discovery Inc., New York, NY 10013, United States
- Department of Physics and Astronomy, University of California Irvine, Irvine, CA 92697, United States
| | | | - Shu-Hang Lin
- Harmonic Discovery Inc., New York, NY 10013, United States
- Department of Chemical Engineering, University of Michigan Ann Arbor, Ann Arbor, MI 48109, United States
| | - Oanh Tran
- Harmonic Discovery Inc., New York, NY 10013, United States
- Department of Chemistry, University of California Irvine, Irvine, CA 92697, United States
| | - Navriti Sahni
- Harmonic Discovery Inc., New York, NY 10013, United States
| | - Marcel Patek
- Harmonic Discovery Inc., New York, NY 10013, United States
| | - Anna Cichonska
- Harmonic Discovery Inc., New York, NY 10013, United States
| | | | - Rayees Rahman
- Harmonic Discovery Inc., New York, NY 10013, United States
20
Varadi M, Velankar S. The impact of AlphaFold Protein Structure Database on the fields of life sciences. Proteomics 2023; 23:e2200128. [PMID: 36382391 DOI: 10.1002/pmic.202200128] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 07/29/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 09/06/2023]
Abstract
Arguably, 2020 was the year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving previously unseen accuracy in the Critical Assessment of Protein Structure Prediction (CASP). In 2021, DeepMind and EMBL-EBI developed the AlphaFold Protein Structure Database to make an unprecedented number of reliable protein structure predictions easily accessible to the broad scientific community. We provide a brief overview and describe the latest developments in the AlphaFold database. We highlight how the fields of data services, bioinformatics, structural biology, and drug discovery are directly affected by the influx of protein structure data. We also show examples of cutting-edge research that took advantage of the AlphaFold database. It is apparent that connections between various fields through protein structures are now possible, but the amount of data poses new challenges. Finally, we give an outlook regarding the future direction of the database, both in terms of data sets and new functionalities.
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
21
Gordon CH, Hendrix E, He Y, Walker MC. AlphaFold Accurately Predicts the Structure of Ribosomally Synthesized and Post-Translationally Modified Peptide Biosynthetic Enzymes. Biomolecules 2023; 13:1243. [PMID: 37627309 PMCID: PMC10452190 DOI: 10.3390/biom13081243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 08/08/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023] Open
Abstract
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a growing class of natural products biosynthesized from a genetically encoded precursor peptide. The enzymes that install the post-translational modifications on these peptides have the potential to be useful catalysts in the production of natural-product-like compounds and can install non-proteogenic amino acids in peptides and proteins. However, engineering these enzymes has been somewhat limited, due in part to limited structural information on enzymes in the same families that nonetheless exhibit different substrate selectivities. Despite AlphaFold2's superior performance in single-chain protein structure prediction, its multimer version lacks accuracy and requires high-end GPUs, which are not typically available to most research groups. Additionally, the default parameters of AlphaFold2 may not be optimal for predicting complex structures like RiPP biosynthetic enzymes, due to their dynamic binding and substrate-modifying mechanisms. This study assessed the efficacy of the structure prediction program ColabFold (a variant of AlphaFold2) in modeling RiPP biosynthetic enzymes in both monomeric and dimeric forms. After extensive benchmarking, it was found that there were no statistically significant differences in the accuracy of the predicted structures, regardless of the various possible prediction parameters that were examined, and that with the default parameters, ColabFold was able to produce accurate models. We then generated additional structural predictions for select RiPP biosynthetic enzymes from multiple protein families and biosynthetic pathways. Our findings can serve as a reference for future enzyme engineering complemented by AlphaFold-related tools.
Affiliation(s)
| | | | | | - Mark C. Walker
- Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
22
Tam C, Iwasaki W. AlphaCutter: Efficient removal of non-globular regions from predicted protein structures. Proteomics 2023; 23:e2300176. [PMID: 37309722 DOI: 10.1002/pmic.202300176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/14/2023]
Abstract
A huge number of high-quality predicted protein structures are now publicly available. However, many of these structures contain non-globular regions, which diminish the performance of downstream structural bioinformatic applications. In this study, we develop AlphaCutter for the removal of non-globular regions from predicted protein structures. A large-scale cleaning of 542,380 predicted SwissProt structures highlights that AlphaCutter is able to (1) remove non-globular regions that are undetectable using pLDDT scores and (2) preserve high integrity of the cleaned domain regions. As useful applications, AlphaCutter improved the folding energy scores and sequence recovery rates in the re-design of domain regions. On average, AlphaCutter takes less than 3 s to clean a protein structure, enabling efficient cleaning of the exploding number of predicted protein structures. AlphaCutter is available at https://github.com/johnnytam100/AlphaCutter. AlphaCutter-cleaned SwissProt structures are available for download at https://doi.org/10.5281/zenodo.7944483.
Affiliation(s)
- Chunlai Tam
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Chiba, Japan
| | - Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Chiba, Japan
23
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, allosteric, and orthosteric drug discovery: Ways forward. Drug Discov Today 2023; 28:103551. [PMID: 36907321 PMCID: PMC10238671 DOI: 10.1016/j.drudis.2023.103551] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 12/28/2022] [Revised: 02/27/2023] [Accepted: 03/07/2023] [Indexed: 03/13/2023]
Abstract
Drug discovery is arguably a highly challenging and significant interdisciplinary aim. The stunning success of the artificial intelligence-powered AlphaFold, whose latest version is buttressed by an innovative machine-learning approach that integrates physical and biological knowledge about protein structures, raised drug discovery hopes that, unsurprisingly, have not come to bear. Even though accurate, the models are rigid, including the drug pockets. AlphaFold's mixed performance poses the question of how its power can be harnessed in drug discovery. Here we discuss possible ways of going forward wielding its strengths, while bearing in mind what AlphaFold can and cannot do. For kinases and receptors, an input enriched in active (ON) state models can improve AlphaFold's chance of rational drug design success.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
| | - Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
| | - Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
24
Pierce MR, Hougland JL. A rising tide lifts all MBOATs: recent progress in structural and functional understanding of membrane bound O-acyltransferases. Front Physiol 2023; 14:1167873. [PMID: 37250116 PMCID: PMC10213974 DOI: 10.3389/fphys.2023.1167873] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/16/2023] [Accepted: 04/19/2023] [Indexed: 05/31/2023] Open
Abstract
Acylation modifications play a central role in biological and physiological processes. Across a range of biomolecules from phospholipids to triglycerides to proteins, introduction of a hydrophobic acyl chain can dramatically alter the biological function and cellular localization of these substrates. Amongst the enzymes catalyzing these modifications, the membrane bound O-acyltransferase (MBOAT) family occupies an intriguing position as the combined substrate selectivities of the various family members span all three classes of these biomolecules. MBOAT-dependent substrates are linked to a wide range of health conditions including metabolic disease, cancer, and neurodegenerative disease. Like many integral membrane proteins, these enzymes have presented challenges to investigation due to their intractability to solubilization and purification. However, over the last several years new solubilization approaches coupled with computational modeling, crystallography, and cryoelectron microscopy have brought an explosion of structural information for multiple MBOAT family members. These studies enable comparison of MBOAT structure and function across members catalyzing modifications of all three substrate classes, revealing both conserved features amongst all MBOATs and distinct architectural features that correlate with different acylation substrates ranging from lipids to proteins. We discuss the methods that led to this renaissance of MBOAT structural investigations, our new understanding of MBOAT structure and implications for catalytic function, and the potential impact of these studies for development of new therapeutics targeting MBOAT-dependent physiological processes.
Affiliation(s)
- Mariah R. Pierce
- Department of Chemistry, Syracuse University, Syracuse, NY, United States
| | - James L. Hougland
- Department of Chemistry, Syracuse University, Syracuse, NY, United States
- Department of Biology, Syracuse University, Syracuse, NY, United States
- BioInspired Syracuse, Syracuse University, Syracuse, NY, United States
| |
Collapse
|
25
|
Dahl L, Kotliar IB, Bendes A, Dodig-Crnković T, Fromm S, Elofsson A, Uhlén M, Sakmar TP, Schwenk JM. Multiplexed selectivity screening of anti-GPCR antibodies. Sci Adv 2023; 9:eadf9297. [PMID: 37134173 PMCID: PMC10156119 DOI: 10.1126/sciadv.adf9297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 03/31/2023] [Indexed: 05/05/2023]
Abstract
G protein-coupled receptors (GPCRs) control critical cellular signaling pathways. Therapeutic agents including anti-GPCR antibodies (Abs) are being developed to modulate GPCR function. However, validating the selectivity of anti-GPCR Abs is challenging because of sequence similarities among individual receptors within GPCR subfamilies. To address this challenge, we developed a multiplexed immunoassay to test >400 anti-GPCR Abs from the Human Protein Atlas targeting a customized library of 215 expressed and solubilized GPCRs representing all GPCR subfamilies. We found that ~61% of Abs tested were selective for their intended target, ~11% bound off-target, and ~28% did not bind to any GPCR. Antigens of on-target Abs were, on average, significantly longer, more disordered, and less likely to be buried in the interior of the GPCR protein than the other Abs. These results provide important insights into the immunogenicity of GPCR epitopes and form a basis for designing therapeutic Abs and for detecting pathological auto-Abs against GPCRs.
Affiliation(s)
- Leo Dahl
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 171 65 Solna, Sweden
- Ilana B. Kotliar
  - Laboratory of Chemical Biology and Signal Transduction, The Rockefeller University, 1230 York Ave., New York, NY 10065, USA
  - Tri-Institutional PhD Program in Chemical Biology, New York, NY 10065, USA
- Annika Bendes
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 171 65 Solna, Sweden
- Tea Dodig-Crnković
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 171 65 Solna, Sweden
- Samuel Fromm
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 65 Solna, Sweden
- Arne Elofsson
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 65 Solna, Sweden
- Mathias Uhlén
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 171 65 Solna, Sweden
- Thomas P. Sakmar
  - Laboratory of Chemical Biology and Signal Transduction, The Rockefeller University, 1230 York Ave., New York, NY 10065, USA
  - Department of Neurobiology, Care Sciences and Society, Division of Neurogeriatrics, Karolinska Institutet, 171 64 Solna, Sweden
- Jochen M. Schwenk
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 171 65 Solna, Sweden
26
Mutz P, Resch W, Faure G, Senkevich TG, Koonin EV, Moss B. Exaptation of Inactivated Host Enzymes for Structural Roles in Orthopoxviruses and Novel Folds of Virus Proteins Revealed by Protein Structure Modeling. mBio 2023; 14:e0040823. [PMID: 37017580 PMCID: PMC10128050 DOI: 10.1128/mbio.00408-23] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/16/2023] [Accepted: 02/21/2023] [Indexed: 04/06/2023] Open
Abstract
Viruses with large, double-stranded DNA genomes captured the majority of their genes from their hosts at different stages of evolution. The origins of many virus genes are readily detected through significant sequence similarity with cellular homologs. In particular, this is the case for virus enzymes, such as DNA and RNA polymerases or nucleotide kinases, that retain their catalytic activity after capture by an ancestral virus. However, a large fraction of virus genes have no readily detectable cellular homologs, meaning that their origins remain enigmatic. We explored the potential origins of such proteins that are encoded in the genomes of orthopoxviruses, a thoroughly studied virus genus that includes major human pathogens. To this end, we used AlphaFold2 to predict the structures of all 214 proteins that are encoded by orthopoxviruses. Among the proteins of unknown provenance, structure prediction yielded clear indications of origin for 14 of them and validated several inferences that were previously made via sequence analysis. A notable emerging trend is the exaptation of enzymes from cellular organisms for nonenzymatic, structural roles in virus reproduction that is accompanied by the disruption of catalytic sites and by an overall drastic divergence that precludes homology detection at the sequence level. Among the 16 orthopoxvirus proteins that were found to be inactivated enzyme derivatives are the poxvirus replication processivity factor A20, which is an inactivated NAD-dependent DNA ligase; the major core protein A3, which is an inactivated deubiquitinase; F11, which is an inactivated prolyl hydroxylase; and more similar cases. For nearly one-third of the orthopoxvirus virion proteins, no significantly similar structures were identified, suggesting exaptation with subsequent major structural rearrangement that yielded unique protein folds.
IMPORTANCE Protein structures are more strongly conserved in evolution than are amino acid sequences. Comparative structural analysis is particularly important for inferring the origins of viral proteins that typically evolve at high rates. We used a powerful protein structure modeling method, namely, AlphaFold2, to model the structures of all orthopoxvirus proteins and compared them to all available protein structures. Multiple cases of recruitment of host enzymes for structural roles in viruses, accompanied by the disruption of catalytic sites, were discovered. However, many viral proteins appear to have evolved unique structural folds.
Affiliation(s)
- Pascal Mutz
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Wolfgang Resch
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland, USA
- Guilhem Faure
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Tatiana G. Senkevich
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Bernard Moss
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
27
Monti A, Vitagliano L, Caporale A, Ruvo M, Doti N. Targeting Protein-Protein Interfaces with Peptides: The Contribution of Chemical Combinatorial Peptide Library Approaches. Int J Mol Sci 2023; 24:7842. [PMID: 37175549 PMCID: PMC10178479 DOI: 10.3390/ijms24097842] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/30/2023] [Revised: 04/22/2023] [Accepted: 04/23/2023] [Indexed: 05/15/2023] Open
Abstract
Protein-protein interfaces play fundamental roles in the molecular mechanisms underlying pathophysiological pathways and are important targets for the design of compounds of therapeutic interest. However, the identification of binding sites on protein surfaces and the development of modulators of protein-protein interactions still represent a major challenge due to their highly dynamic and extensive interfacial areas. Over the years, multiple strategies including structural, computational, and combinatorial approaches have been developed to characterize PPI and to date, several successful examples of small molecules, antibodies, peptides, and aptamers able to modulate these interfaces have been determined. Notably, peptides are a particularly useful tool for inhibiting PPIs due to their exquisite potency, specificity, and selectivity. Here, after an overview of PPIs and of the commonly used approaches to identify and characterize them, we describe and evaluate the impact of chemical peptide libraries in medicinal chemistry with a special focus on the results achieved through recent applications of this methodology. Finally, we also discuss the role that this methodology can have in the framework of the opportunities, and challenges that the application of new predictive approaches based on artificial intelligence is generating in structural biology.
Affiliation(s)
- Alessandra Monti
  - Institute of Biostructures and Bioimaging (IBB), National Research Council (CNR), 80131 Napoli, Italy
- Luigi Vitagliano
  - Institute of Biostructures and Bioimaging (IBB), National Research Council (CNR), 80131 Napoli, Italy
- Andrea Caporale
  - Institute of Crystallography (IC), National Research Council (CNR), Strada Statale 14 km 163.5, Basovizza, 34149 Trieste, Italy
- Menotti Ruvo
  - Institute of Biostructures and Bioimaging (IBB), National Research Council (CNR), 80131 Napoli, Italy
- Nunzianna Doti
  - Institute of Biostructures and Bioimaging (IBB), National Research Council (CNR), 80131 Napoli, Italy
28
Jussupow A, Kaila VRI. Effective Molecular Dynamics from Neural Network-Based Structure Prediction Models. J Chem Theory Comput 2023; 19:1965-1975. [PMID: 36961997 PMCID: PMC11181330 DOI: 10.1021/acs.jctc.2c01027] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/17/2022] [Indexed: 03/26/2023]
Abstract
Recent breakthroughs in neural network-based structure prediction methods, such as AlphaFold2 and RoseTTAFold, have dramatically improved the quality of computational protein structure prediction. These models also provide statistical confidence scores that can estimate uncertainties in the predicted structures, but it remains unclear to what extent these scores are related to the intrinsic conformational dynamics of proteins. Here, we compare AlphaFold2 prediction scores with explicit large-scale molecular dynamics simulations of 28 one- and two-domain proteins with varying degrees of flexibility. We demonstrate a strong correlation between the statistical prediction scores and the explicit motion derived from extensive atomistic molecular dynamics simulations and further derive an elastic network model based on the statistical scores of AlphaFold2 (AF-ENM), which we benchmark in combination with coarse-grained molecular dynamics simulations. We show that our AF-ENM method reproduces the global protein dynamics with improved accuracy, providing a powerful way to derive effective molecular dynamics using neural network-based structure prediction models.
Affiliation(s)
- Alexander Jussupow
  - Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden
- Ville R. I. Kaila
  - Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden
29
Varadi M, Bordin N, Orengo C, Velankar S. The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors. Curr Opin Struct Biol 2023; 79:102543. [PMID: 36807079 DOI: 10.1016/j.sbi.2023.102543] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/11/2022] [Revised: 01/04/2023] [Accepted: 01/13/2023] [Indexed: 02/21/2023]
Abstract
The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.
Affiliation(s)
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK. https://twitter.com/nicolabordin
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
30
Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023; 48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]
Abstract
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Christian Dallago
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - VantAI, 151 W 42nd Street, New York, NY 10036, USA
- Michael Heinzinger
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Stephanie Kim
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Maria Littmann
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
31
Willems A, Kalaw A, Ecer A, Kotwal A, Roepe LD, Roepe PD. Structures of Plasmodium falciparum Chloroquine Resistance Transporter (PfCRT) Isoforms and Their Interactions with Chloroquine. Biochemistry 2023; 62:1093-1110. [PMID: 36800498 PMCID: PMC10950298 DOI: 10.1021/acs.biochem.2c00669] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/28/2022] [Revised: 02/02/2023] [Indexed: 02/19/2023]
Abstract
Using a recently elucidated atomic-resolution cryogenic electron microscopy (cryo-EM) structure for the Plasmodium falciparum chloroquine resistance transporter (PfCRT) protein 7G8 isoform as template [Kim, J.; Nature 2019, 576, 315-320], we use Monte Carlo molecular dynamics (MC/MD) simulations of PfCRT embedded in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) membrane to solve energy-minimized structures for 7G8 PfCRT and two additional PfCRT isoforms that harbor 5 or 7 amino acid substitutions relative to 7G8 PfCRT. Guided by drug binding previously defined using chloroquine (CQ) photoaffinity probe labeling, we also use MC/MD energy minimization to elucidate likely CQ binding geometries for the three membrane-embedded isoforms. We inventory salt bridges and hydrogen bonds in these structures and summarize how the limited changes in primary sequence subtly perturb local PfCRT isoform structure. In addition, we use the "AlphaFold" artificial intelligence AlphaFold2 (AF2) algorithm to solve for domain structure that was not resolved in the previously reported 7G8 PfCRT cryo-EM structure, and perform MC/MD energy minimization for the membrane-embedded AF2 structures of all three PfCRT isoforms. We compare energy-minimized structures generated using cryo-EM vs AF2 templates. The results suggest how amino acid substitutions in drug resistance-associated isoforms of PfCRT influence PfCRT structure and CQ transport.
Affiliation(s)
- Ayse Ecer
  - Departments of Chemistry and Biochemistry and Cellular and Molecular Biology, Georgetown University, 37th and O Streets NW, Washington, District of Columbia 20057, United States
- Amitesh Kotwal
  - Departments of Chemistry and Biochemistry and Cellular and Molecular Biology, Georgetown University, 37th and O Streets NW, Washington, District of Columbia 20057, United States
- Paul D. Roepe
  - Departments of Chemistry and Biochemistry and Cellular and Molecular Biology, Georgetown University, 37th and O Streets NW, Washington, District of Columbia 20057, United States
32
Wang H, Guo M, Wei H, Chen Y. Targeting p53 pathways: mechanisms, structures, and advances in therapy. Signal Transduct Target Ther 2023; 8:92. [PMID: 36859359 PMCID: PMC9977964 DOI: 10.1038/s41392-023-01347-1] [Citation(s) in RCA: 338] [Impact Index Per Article: 169.0] [Received: 08/01/2022] [Revised: 12/19/2022] [Accepted: 02/07/2023] [Indexed: 03/03/2023] Open
Abstract
The TP53 tumor suppressor is the most frequently altered gene in human cancers, and has been a major focus of oncology research. The p53 protein is a transcription factor that can activate the expression of multiple target genes and plays critical roles in regulating cell cycle, apoptosis, and genomic stability, and is widely regarded as the "guardian of the genome". Accumulating evidence has shown that p53 also regulates cell metabolism, ferroptosis, tumor microenvironment, autophagy and so on, all of which contribute to tumor suppression. Mutations in TP53 not only impair its tumor suppressor function, but also confer oncogenic properties to p53 mutants. Since p53 is mutated and inactivated in most malignant tumors, it has been a very attractive target for developing new anti-cancer drugs. However, until recently, p53 was considered an "undruggable" target and little progress has been made with p53-targeted therapies. Here, we provide a systematic review of the diverse molecular mechanisms of the p53 signaling pathway and how TP53 mutations impact tumor progression. We also discuss key structural features of the p53 protein and its inactivation by oncogenic mutations. In addition, we review the efforts that have been made in p53-targeted therapies, and discuss the challenges that have been encountered in clinical development.
Affiliation(s)
- Haolan Wang
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Laboratory of Structural Biology, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Ming Guo
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Laboratory of Structural Biology, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Hudie Wei
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Laboratory of Structural Biology, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
- Yongheng Chen
  - Department of Oncology, NHC Key Laboratory of Cancer Proteomics, Laboratory of Structural Biology, National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, 410008, China
33
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
34
Ponlachantra K, Suginta W, Robinson RC, Kitaoku Y. AlphaFold2: A versatile tool to predict the appearance of functional adaptations in evolution: Profilin interactions in uncultured Asgard archaea. Bioessays 2023; 45:e2200119. [PMID: 36461738 DOI: 10.1002/bies.202200119] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/17/2022] [Revised: 11/07/2022] [Accepted: 11/09/2022] [Indexed: 12/05/2022]
Abstract
The release of AlphaFold2 (AF2), a deep-learning-aided, open-source protein structure prediction program, from DeepMind, opened a new era of molecular biology. The astonishing improvement in the accuracy of the structure predictions provides the opportunity to characterize protein systems from uncultured Asgard archaea, key organisms in evolutionary biology. Despite the accumulation in metagenomics-derived Asgard archaea eukaryotic-like protein sequences, limited structural and biochemical information have restricted the insight in their potential functions. In this review, we focus on profilin, an actin-dynamics regulating protein, which in eukaryotes, modulates actin polymerization through (1) direct actin interaction, (2) polyproline binding, and (3) phospholipid binding. We assess AF2-predicted profilin structures in their potential abilities to participate in these activities. We demonstrate that AF2 is a powerful new tool for understanding the emergence of biological functional traits in evolution.
Affiliation(s)
- Khongpon Ponlachantra
  - School of Biomolecular Science and Engineering (BSE), Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Wipa Suginta
  - School of Biomolecular Science and Engineering (BSE), Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Robert C Robinson
  - School of Biomolecular Science and Engineering (BSE), Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
  - Research Institute for Interdisciplinary Science (RIIS), Okayama University, Okayama, Japan
- Yoshihito Kitaoku
  - Research Institute for Interdisciplinary Science (RIIS), Okayama University, Okayama, Japan
35
Nicoli A, Haag F, Marcinek P, He R, Kreißl J, Stein J, Marchetto A, Dunkel A, Hofmann T, Krautwurst D, Di Pizio A. Modeling the Orthosteric Binding Site of the G Protein-Coupled Odorant Receptor OR5K1. J Chem Inf Model 2023; 63:2014-2029. [PMID: 36696962 PMCID: PMC10091413 DOI: 10.1021/acs.jcim.2c00752] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 01/27/2023]
Abstract
With approximately 400 encoding genes in humans, odorant receptors (ORs) are the largest subfamily of class A G protein-coupled receptors (GPCRs). Despite its high relevance and representation, the odorant-GPCRome is structurally poorly characterized: no experimental structures are available, and the low sequence identity of ORs to experimentally solved GPCRs is a significant challenge for their modeling. Moreover, the receptive range of most ORs is unknown. The odorant receptor OR5K1 was recently and comprehensively characterized in terms of cognate agonists. Here, we report two additional agonists and functional data of the most potent compound on two mutants, L104^3.32 and L255^6.51. Experimental data was used to guide the investigation of the binding modes of OR5K1 ligands into the orthosteric binding site using structural information from AI-driven modeling, as recently released in the AlphaFold Protein Structure Database, and from homology modeling. Induced-fit docking simulations were used to sample the binding site conformational space for ensemble docking. Mutagenesis data guided side chain residue sampling and model selection. We obtained models that could better rationalize the different activity of active (agonist) versus inactive molecules with respect to starting models and also capture differences in activity related to minor structural differences. Therefore, we provide a model refinement protocol that can be applied to model the orthosteric binding site of ORs as well as that of GPCRs with low sequence identity to available templates.
Affiliation(s)
- Alessandro Nicoli
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Franziska Haag
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Patrick Marcinek
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Ruiming He
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
  - Department of Chemistry, Technical University of Munich, 85748 Garching, Germany
- Johanna Kreißl
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Jörg Stein
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Alessandro Marchetto
  - Computational Biomedicine, Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
  - Department of Biology, Faculty of Mathematics, Computer Science and Natural Sciences, RWTH Aachen University, 52074 Aachen, Germany
- Andreas Dunkel
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Thomas Hofmann
  - Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich, 85354 Freising, Germany
- Dietmar Krautwurst
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
- Antonella Di Pizio
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354 Freising, Germany
36
Cai T, Xie L, Zhang S, Chen M, He D, Badkul A, Liu Y, Namballa HK, Dorogan M, Harding WW, Mura C, Bourne PE, Xie L. End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins. PLoS Comput Biol 2023; 19:e1010851. [PMID: 36652496 PMCID: PMC9886305 DOI: 10.1371/journal.pcbi.1010851] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/28/2022] [Revised: 01/30/2023] [Accepted: 01/05/2023] [Indexed: 01/19/2023] Open
Abstract
Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain "dark"-i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. 
Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
Affiliation(s)
- Tian Cai
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
- Li Xie
  - Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Shuo Zhang
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
- Muge Chen
  - Master Program in Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
- Di He
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
- Amitesh Badkul
  - Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Yang Liu
  - Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Hari Krishna Namballa
  - Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America
- Michael Dorogan
  - Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America
- Wayne W. Harding
  - Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America
- Cameron Mura
  - School of Data Science & Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Philip E. Bourne
  - School of Data Science & Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Lei Xie
  - Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America
  - Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
  - Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, New York, United States of America
|
37
|
Stevens AO, Kazan IC, Ozkan B, He Y. Investigating the allosteric response of the PICK1 PDZ domain to different ligands with all-atom simulations. Protein Sci 2022; 31:e4474. [PMID: 36251217 PMCID: PMC9667829 DOI: 10.1002/pro.4474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2022] [Revised: 09/27/2022] [Accepted: 10/11/2022] [Indexed: 12/13/2022]
Abstract
The PDZ family is comprised of small modular domains that play critical roles in the allosteric modulation of many cellular signaling processes by binding to the C-terminal tail of different proteins. As dominant modular proteins that interact with a diverse set of peptides, it is of particular interest to explore how different binding partners induce different allosteric effects on the same PDZ domain. Because the PICK1 PDZ domain can bind different types of ligands, it is an ideal test case to answer this question and explore the network of interactions that give rise to dynamic allostery. Here, we use all-atom molecular dynamics simulations to explore dynamic allostery in the PICK1 PDZ domain by modeling two PICK1 PDZ systems: PICK1 PDZ-DAT and PICK1 PDZ-GluR2. Our results suggest that ligand binding to the PICK1 PDZ domain induces dynamic allostery at the αA helix that is similar to what has been observed in other PDZ domains. We found that the PICK1 PDZ-ligand distance is directly correlated with both dynamic changes of the αA helix and the distance between the αA helix and βB strand. Furthermore, our work identifies a hydrophobic core between DAT/GluR2 and I35 as a key interaction in inducing such dynamic allostery. Finally, the unique interaction patterns between different binding partners and the PICK1 PDZ domain can induce unique dynamic changes to the PICK1 PDZ domain. We suspect that unique allosteric coupling patterns with different ligands may play a critical role in how PICK1 performs its biological functions in various signaling networks.
Affiliation(s)
- Amy O. Stevens
  - Department of Chemistry and Chemical Biology, The University of New Mexico, Albuquerque, New Mexico, USA
- I. Can Kazan
  - Department of Physics, Center for Biological Physics, Arizona State University, Tempe, Arizona, USA
- Banu Ozkan
  - Department of Physics, Center for Biological Physics, Arizona State University, Tempe, Arizona, USA
- Yi He
  - Department of Chemistry and Chemical Biology, The University of New Mexico, Albuquerque, New Mexico, USA
|
38
|
Using Alphafold2 to Predict the Structure of the Gp5/M Dimer of Porcine Respiratory and Reproductive Syndrome Virus. Int J Mol Sci 2022; 23:ijms232113209. [DOI: 10.3390/ijms232113209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 10/26/2022] [Accepted: 10/27/2022] [Indexed: 11/16/2022] Open
Abstract
Porcine reproductive and respiratory syndrome virus is a positive-stranded RNA virus of the family Arteriviridae. The Gp5/M dimer, the major component of the viral envelope, is required for virus budding and is an antibody target. We used AlphaFold2, an artificial-intelligence-based system, to predict a credible structure of Gp5/M. The short disulfide-linked ectodomains lie flat on the membrane, with the exception of the erected N-terminal helix of Gp5, which contains the antibody epitopes and a hypervariable region with a changing number of carbohydrates. The core of the dimer consists of six curved and tilted transmembrane helices, three from each protein. The third transmembrane regions extend into the cytoplasm as amphiphilic helices containing the acylation sites. The endodomains of Gp5 and M are composed of seven β-strands from each protein, which interact via β-strand seven. The area under the membrane forms an open cavity with a positive surface charge. The M and Orf3a proteins of coronaviruses have a similar structure, suggesting that all four proteins are derived from the same ancestral gene. Orf3a, like Gp5/M, is acylated at membrane-proximal cysteines. The role of Gp5/M during virus replication is discussed, in particular the mechanisms of virus budding and models of antibody-dependent virus neutralization.
|
39
|
Bruley A, Mornon JP, Duprat E, Callebaut I. Digging into the 3D Structure Predictions of AlphaFold2 with Low Confidence: Disorder and Beyond. Biomolecules 2022; 12:1467. [PMID: 36291675 PMCID: PMC9599455 DOI: 10.3390/biom12101467] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/14/2022] [Revised: 10/04/2022] [Accepted: 10/05/2022] [Indexed: 01/12/2023] Open
Abstract
AlphaFold2 (AF2) has created a breakthrough in biology by providing three-dimensional structure models for whole-proteome sequences, with unprecedented levels of accuracy. In addition, the AF2 pLDDT score, related to the model confidence, has been shown to provide a good measure of residue-wise disorder. Here, we combined AF2 predictions with pyHCA, a tool we previously developed to identify foldable segments and estimate their order/disorder ratio, from a single protein sequence. We focused our analysis on the AF2 predictions available for 21 reference proteomes (AFDB v1), in particular on their long foldable segments (>30 amino acids) that exhibit characteristics of soluble domains, as estimated by pyHCA. Among these segments, we provided a global analysis of those with very low pLDDT values along their entire length and compared their characteristics to those of segments with very high pLDDT values. We highlighted cases containing conditional order, as well as cases that could form well-folded structures but escape the AF2 prediction due to a shallow multiple sequence alignment and/or undocumented structure or fold. AF2 and pyHCA can therefore be advantageously combined to unravel cryptic structural features in whole proteomes and to refine predictions for different flavors of disorder.
|
40
|
Varadi M, Anyango S, Appasamy SD, Armstrong D, Bage M, Berrisford J, Choudhary P, Bertoni D, Deshpande M, Leines GD, Ellaway J, Evans G, Gaborova R, Gupta D, Gutmanas A, Harrus D, Kleywegt GJ, Bueno WM, Nadzirin N, Nair S, Pravda L, Afonso MQL, Sehnal D, Tanweer A, Tolchard J, Abrams C, Dunlop R, Velankar S. PDBe and PDBe-KB: Providing high-quality, up-to-date and integrated resources of macromolecular structures to support basic and applied research and education. Protein Sci 2022; 31:e4439. [PMID: 36173162 PMCID: PMC9517934 DOI: 10.1002/pro.4439] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/05/2022] [Revised: 09/02/2022] [Accepted: 09/05/2022] [Indexed: 11/26/2022]
Abstract
The archiving and dissemination of protein and nucleic acid structures as well as their structural, functional and biophysical annotations is an essential task that enables the broader scientific community to conduct impactful research in multiple fields of the life sciences. The Protein Data Bank in Europe (PDBe; pdbe.org) team develops and maintains several databases and web services to address this fundamental need. From data archiving as a member of the Worldwide PDB consortium (wwPDB; wwpdb.org), to the PDBe Knowledge Base (PDBe-KB; pdbekb.org), we provide data, data-access mechanisms, and visualizations that facilitate basic and applied research and education across the life sciences. Here, we provide an overview of the structural data and annotations that we integrate and make freely available. We describe the web services and data visualization tools we offer, and provide information on how to effectively use or even further develop them. Finally, we discuss the direction of our data services, and how we aim to tackle new challenges that arise from the recent, unprecedented advances in the field of structure determination and protein structure modeling.
Affiliation(s)
- Mihaly Varadi
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Stephen Anyango
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Sri Devan Appasamy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- David Armstrong
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Marcus Bage
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- John Berrisford
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Preeti Choudhary
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Damian Bertoni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Mandar Deshpande
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Grisell Diaz Leines
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Joseph Ellaway
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Genevieve Evans
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Romana Gaborova
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Deepti Gupta
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Aleksandras Gutmanas
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Deborah Harrus
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Gerard J Kleywegt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Nurul Nadzirin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Sreenath Nair
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Lukas Pravda
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- David Sehnal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- Ahsan Tanweer
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- James Tolchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Charlotte Abrams
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Roisin Dunlop
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
- Sameer Velankar
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton
|
41
|
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, Artificial Intelligence (AI), and Allostery. J Phys Chem B 2022; 126:6372-6383. [PMID: 35976160 PMCID: PMC9442638 DOI: 10.1021/acs.jpcb.2c04346] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/22/2022] [Revised: 08/03/2022] [Indexed: 02/08/2023]
Abstract
AlphaFold has burst into our lives: a powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly overview AI in structural biology, including in molecular dynamics simulations and prediction of microbiota-human protein-protein interactions. We highlight the advancements accomplished by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify the folding pathways. The models that AlphaFold provides do not capture conformational mechanisms like frustration and allostery, which are rooted in ensembles, and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions, instead describing them by their low structural probabilities. Since AlphaFold generates single ranked structures, rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations nor of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
|
42
|
Stevens AO, He Y. Benchmarking the Accuracy of AlphaFold 2 in Loop Structure Prediction. Biomolecules 2022; 12:985. [PMID: 35883541 PMCID: PMC9312937 DOI: 10.3390/biom12070985] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 05/30/2022] [Revised: 07/05/2022] [Accepted: 07/12/2022] [Indexed: 01/22/2023] Open
Abstract
The inhibition of protein-protein interactions is a growing strategy in drug development. In addition to structured regions, many protein loop regions are involved in protein-protein interactions and thus have been identified as potential drug targets. To effectively target such regions, protein structure is critical. Loop structure prediction is a challenging subgroup in the field of protein structure prediction because of the reduced level of conservation in protein sequences compared to the secondary structure elements. AlphaFold 2 has been suggested to be one of the greatest achievements in the field of protein structure prediction. AlphaFold 2 predicted protein structures at near X-ray resolution in the Critical Assessment of protein Structure Prediction (CASP 14) competition in 2020. The purpose of this work is to survey the performance of AlphaFold 2 in specifically predicting protein loop regions. We have constructed an independent dataset of 31,650 loop regions from 2613 proteins (deposited after AlphaFold 2 was trained) with both experimentally determined structures and AlphaFold 2 predicted structures. With extensive evaluation using our dataset, the results indicate that AlphaFold 2 is a good predictor of the structure of loop regions, especially for short loop regions. Loops less than 10 residues in length have an average Root Mean Square Deviation (RMSD) of 0.33 Å and an average Template Modeling score (TM-score) of 0.82. However, we see that as the number of residues in a given loop increases, the accuracy of AlphaFold 2's prediction decreases. Loops more than 20 residues in length have an average RMSD of 2.04 Å and an average TM-score of 0.55. Such a correlation between accuracy and length of the loop is directly linked to the increase in flexibility. Moreover, AlphaFold 2 does slightly over-predict α-helices and β-strands in proteins.
Affiliation(s)
- Amy O. Stevens
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Yi He
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
|