1. Zhao B, Basu S, Kurgan L. DescribePROT Database of Residue-Level Protein Structure and Function Annotations. Methods Mol Biol 2025; 2867:169-184. [PMID: 39576581 DOI: 10.1007/978-1-0716-4196-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
DescribePROT is a freely available online database of structural and functional descriptors of proteins at the amino acid level. It provides access to 13 diverse descriptors that include sequence conservation, putative secondary structure, solvent accessibility, intrinsic disorder, and signal peptides, and putative annotations of residues that interact with proteins, peptides and nucleic acids. These data can be used to elucidate protein functions, to support efforts to develop therapeutics, and to develop and evaluate future predictors of protein structure and function. DescribePROT includes 7.8 billion predictions for 1.4 million proteins from 83 complete proteomes of popular model organisms. This information can be downloaded at multiple levels of scope (entire database, specific organisms, and individual proteins) and can be interacted with using a graphical interface that simultaneously displays data on multiple descriptors. We describe the contents of this resource, provide directions on how to use its interface, and offer instructions on how to obtain and interact with the underlying data. Moreover, we briefly discuss plans for a future expansion of this database. DescribePROT is available at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/ .
Affiliation(s)
- Bi Zhao
  - Genomics program, College of Public Health, University of South Florida, Tampa, FL, USA
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA

2. Gao Q, Ge L, Wang Y, Zhu Y, Liu Y, Zhang H, Huang J, Qin Z. An explainable few-shot learning model for the directed evolution of antimicrobial peptides. Int J Biol Macromol 2024; 285:138272. [PMID: 39631577 DOI: 10.1016/j.ijbiomac.2024.138272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 11/20/2024] [Accepted: 11/30/2024] [Indexed: 12/07/2024]
Abstract
Due to the persistent threat of antibiotic resistance posed by Gram-negative pathogens, the discovery of new antimicrobial agents is of critical importance. In this study, we employed deep learning-guided directed evolution to explore the chemical space of antimicrobial peptides (AMPs), which present promising alternatives to traditional small-molecule antibiotics. Utilizing a fine-tuned protein language model tailored for small dataset learning, we achieved structural modifications of the lipopolysaccharide-binding domain (LBD) derived from Marsupenaeus japonicus, a prawn species of considerable value in aquaculture and commercial fisheries. The engineered LBDs demonstrated exceptional activity against a range of Gram-negative pathogens. Drawing inspiration from evolutionary principles, we elucidated the bactericidal mechanism through molecular dynamics simulations and mapped the directed evolution pathways using a ladderpath framework. This work highlights the efficacy of explainable few-shot learning in the rational design of AMPs through directed evolution.
Affiliation(s)
- Qiandi Gao
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Liangjun Ge
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Yihan Wang
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Yanran Zhu
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Yu Liu
  - International Academic Center of Complex Systems, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Heqian Zhang
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Jiaquan Huang
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China
- Zhiwei Qin
  - Center for Biological Science and Technology, Advanced Institute of Natural Sciences, Beijing Normal University, Zhuhai, Guangdong 519087, China

3. Keegan RM, Simpkin AJ, Rigden DJ. The success rate of processed predicted models in molecular replacement: implications for experimental phasing in the AlphaFold era. Acta Crystallogr D Struct Biol 2024; 80:766-779. [PMID: 39360967 PMCID: PMC11544426 DOI: 10.1107/s2059798324009380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 09/23/2024] [Indexed: 11/09/2024] Open
Abstract
The availability of highly accurate protein structure predictions from AlphaFold2 (AF2) and similar tools has hugely expanded the applicability of molecular replacement (MR) for crystal structure solution. Many structures can be solved routinely using raw models, structures processed to remove unreliable parts or models split into distinct structural units. There is therefore an open question around how many and which cases still require experimental phasing methods such as single-wavelength anomalous diffraction (SAD). Here, this question is addressed using a large set of PDB depositions that were solved by SAD. A large majority (87%) could be solved using unedited or minimally edited AF2 predictions. A further 18 (4%) yield straightforwardly to MR after splitting of the AF2 prediction using Slice'N'Dice, although different splitting methods succeeded on slightly different sets of cases. It is also found that further unique targets can be solved by alternative modelling approaches such as ESMFold (four cases), alternative MR approaches such as ARCIMBOLDO and AMPLE (two cases each), and multimeric model building with AlphaFold-Multimer or UniFold (three cases). Ultimately, only 12 cases, or 3% of the SAD-phased set, did not yield to any form of MR tested here, offering valuable hints as to the number and the characteristics of cases where experimental phasing remains essential for macromolecular structure solution.
Affiliation(s)
- Ronan M. Keegan
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
  - UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
- Adam J. Simpkin
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Daniel J. Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom

4. Lategan FA, Schreiber C, Patterton HG. SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures. BMC Bioinformatics 2023; 24:373. [PMID: 37789284 PMCID: PMC10546711 DOI: 10.1186/s12859-023-05498-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/25/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND The relationship between the sequence of a protein, its structure, and the resulting connection between its structure and function, is a foundational principle in biological science. Only recently has the computational prediction of protein structure based only on protein sequence been addressed effectively by AlphaFold, a neural network approach that can predict the majority of protein structures with X-ray crystallographic accuracy. A question that is now of acute relevance is the "inverse protein folding problem": predicting the sequence of a protein that folds into a specified structure. This will be of immense value in protein engineering and biotechnology, and will allow the design and expression of recombinant proteins that can, for instance, fold into specified structures as a scaffold for the attachment of recombinant antigens, or enzymes with modified or novel catalytic activities. Here we describe the development of SeqPredNN, a feed-forward neural network trained with X-ray crystallographic structures from the RCSB Protein Data Bank to predict the identity of amino acids in a protein structure using only the relative positions, orientations, and backbone dihedral angles of nearby residues. RESULTS We predict the sequence of a protein expected to fold into a specified structure and assess the accuracy of the prediction using both AlphaFold and RoseTTAFold to computationally generate the fold of the derived sequence. We show that the sequences predicted by SeqPredNN fold into a structure with a median TM-score of 0.638 when compared to the crystal structure according to AlphaFold predictions, yet these sequences are unique and only 28.4% identical to the sequence of the crystallized protein. CONCLUSIONS We propose that SeqPredNN will be a valuable tool to generate proteins of defined structure for the design of novel biomaterials, pharmaceuticals, catalysts, and reporter systems. 
The low sequence identity of its predictions compared to the native sequence could prove useful for developing proteins with modified physical properties, such as water solubility and thermal stability. The speed and ease of use of SeqPredNN offer a significant advantage over physics-based protein design methods.
Affiliation(s)
- F Adriaan Lategan
  - Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa
- Caroline Schreiber
  - Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa
- Hugh G Patterton
  - Center for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, 7600, South Africa

5. Wang H, Zang Y, Kang Y, Zhang J, Zhang L, Zhang S. ETLD: an encoder-transformation layer-decoder architecture for protein contact and mutation effects prediction. Brief Bioinform 2023; 24:bbad290. [PMID: 37598423 DOI: 10.1093/bib/bbad290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/21/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023] Open
Abstract
The latent features extracted from the multiple sequence alignments (MSAs) of homologous protein families are useful for identifying residue-residue contacts, predicting mutation effects, shaping protein evolution, etc. Over the past three decades, a growing body of supervised and unsupervised machine learning methods have been applied to this field, yielding fruitful results. Here, we propose a novel self-supervised model, called encoder-transformation layer-decoder (ETLD) architecture, capable of capturing protein sequence latent features directly from MSAs. Compared to the typical autoencoder model, ETLD introduces a transformation layer with the ability to learn inter-site couplings, which can be used to parse out the two-dimensional residue-residue contacts map after a simple mathematical derivation or an additional supervised neural network. ETLD retains the process of encoding and decoding sequences, and the predicted probabilities of amino acids at each site can be further used to construct the mutation landscapes for mutation effects prediction, outperforming advanced models such as GEMME, DeepSequence and EVmutation in general. Overall, ETLD is a highly interpretable unsupervised model with great potential for improvement and can be further combined with supervised methods for more extensive and accurate predictions.
Affiliation(s)
- He Wang
  - MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
- Yongjian Zang
  - MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
- Ying Kang
  - MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
- Jianwen Zhang
  - MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
- Lei Zhang
  - MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
- Shengli Zhang
  - MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China

6. Cicconardi F, Milanetti E, Pinheiro de Castro EC, Mazo-Vargas A, Van Belleghem SM, Ruggieri AA, Rastas P, Hanly J, Evans E, Jiggins CD, Owen McMillan W, Papa R, Di Marino D, Martin A, Montgomery SH. Evolutionary dynamics of genome size and content during the adaptive radiation of Heliconiini butterflies. Nat Commun 2023; 14:5620. [PMID: 37699868 PMCID: PMC10497600 DOI: 10.1038/s41467-023-41412-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/25/2022] [Accepted: 08/30/2023] [Indexed: 09/14/2023] Open
Abstract
Heliconius butterflies, a speciose genus of Müllerian mimics, represent a classic example of an adaptive radiation that includes a range of derived dietary, life history, physiological and neural traits. However, key lineages within the genus, and across the broader Heliconiini tribe, lack genomic resources, limiting our understanding of how adaptive and neutral processes shaped genome evolution during their radiation. Here, we generate highly contiguous genome assemblies for nine Heliconiini, 29 additional reference-assembled genomes, and improve 10 existing assemblies. Altogether, we provide a dataset of annotated genomes for a total of 63 species, including 58 species within the Heliconiini tribe. We use this extensive dataset to generate a robust and dated heliconiine phylogeny, describe major patterns of introgression, explore the evolution of genome architecture, and the genomic basis of key innovations in this enigmatic group, including an assessment of the evolution of putative regulatory regions at the Heliconius stem. Our work illustrates how the increased resolution provided by such dense genomic sampling improves our power to generate and test gene-phenotype hypotheses, and precisely characterize how genomes evolve.
Affiliation(s)
- Francesco Cicconardi
  - School of Biological Sciences, Bristol University, Bristol, United Kingdom
  - Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- Edoardo Milanetti
  - Department of Physics, Sapienza University, Piazzale Aldo Moro 5, 00185, Rome, Italy
  - Center for Life Nano- & Neuro-Science, Italian Institute of Technology, Viale Regina Elena 291, 00161, Rome, Italy
- Anyi Mazo-Vargas
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853, USA
- Steven M Van Belleghem
  - Department of Biology, University of Puerto Rico, Rio Piedras, PR, Puerto Rico
  - Ecology, Evolution and Conservation Biology, Biology Department, KU Leuven, Leuven, Belgium
- Pasi Rastas
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Joseph Hanly
  - Department of Biological Sciences, The George Washington University, Washington, DC, 20052, USA
  - Smithsonian Tropical Research Institute, Panama City, Panama
- Elizabeth Evans
  - Department of Biology, University of Puerto Rico, Rio Piedras, PR, Puerto Rico
- Chris D Jiggins
  - Department of Zoology, University of Cambridge, Cambridge, United Kingdom
- W Owen McMillan
  - Smithsonian Tropical Research Institute, Panama City, Panama
- Riccardo Papa
  - Department of Biology, University of Puerto Rico, Rio Piedras, PR, Puerto Rico
  - Molecular Sciences and Research Center, University of Puerto Rico, San Juan, PR, Puerto Rico
  - Comprehensive Cancer Center, University of Puerto Rico, San Juan, PR, Puerto Rico
- Daniele Di Marino
  - Department of Life and Environmental Sciences, New York-Marche Structural Biology Center (NY-MaSBiC), Polytechnic University of Marche, Via Brecce Bianche, 60131, Ancona, Italy
  - Neuronal Death and Neuroprotection Unit, Department of Neuroscience, Mario Negri Institute for Pharmacological Research-IRCCS, Via Mario Negri 2, 20156, Milano, Italy
  - National Biodiversity Future Center (NBFC), Palermo, Italy
- Arnaud Martin
  - Department of Biological Sciences, The George Washington University, Washington, DC, 20052, USA
- Stephen H Montgomery
  - School of Biological Sciences, Bristol University, Bristol, United Kingdom
  - Smithsonian Tropical Research Institute, Panama City, Panama

7. Sun J, Kulandaisamy A, Liu J, Hu K, Gromiha MM, Zhang Y. Machine learning in computational modelling of membrane protein sequences and structures: From methodologies to applications. Comput Struct Biotechnol J 2023; 21:1205-1226. [PMID: 36817959 PMCID: PMC9932300 DOI: 10.1016/j.csbj.2023.01.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/16/2022] [Revised: 01/16/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Membrane proteins mediate a wide spectrum of biological processes, such as signal transduction and cell communication. Due to the arduous and costly nature inherent to the experimental process, membrane proteins have long been devoid of well-resolved atomic-level tertiary structures and, consequently, the understanding of their functional roles underlying a multitude of life activities has been hampered. Currently, computational tools dedicated to furthering the structure-function understanding are primarily focused on utilizing intelligent algorithms to address a variety of site-wise prediction problems (e.g., topology and interaction sites), but are scattered across different computing sources. Moreover, the recent advent of deep learning techniques has immensely expedited the development of computational tools for membrane protein-related prediction problems. Given the growing number of applications optimized particularly by manifold deep neural networks, we herein provide a review on the current status of computational strategies mainly in membrane protein type classification, topology identification, interaction site detection, and pathogenic effect prediction. Meanwhile, we provide an overview of how the entire prediction process proceeds, including database collection, data pre-processing, feature extraction, and method selection. This review is expected to be useful for developing more extendable computational tools specific to membrane proteins.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Headington, Oxford OX3 7LD, UK
- Arulsamy Kulandaisamy
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Jacklyn Liu
  - UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Kai Hu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- M. Michael Gromiha (corresponding author)
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Yuan Zhang (corresponding author)
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China

8. Sánchez Rodríguez F, Chojnowski G, Keegan RM, Rigden DJ. Using deep-learning predictions of inter-residue distances for model validation. Acta Crystallogr D Struct Biol 2022; 78:1412-1427. [PMID: 36458613 PMCID: PMC9716559 DOI: 10.1107/s2059798322010415] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/11/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open
Abstract
Determination of protein structures typically entails building a model that satisfies the collected experimental observations and its deposition in the Protein Data Bank. Experimental limitations can lead to unavoidable uncertainties during the process of model building, which result in the introduction of errors into the deposited model. Many metrics are available for model validation, but most are limited to consideration of the physico-chemical aspects of the model or its match to the experimental data. The latest advances in the field of deep learning have enabled the increasingly accurate prediction of inter-residue distances, an advance which has played a pivotal role in the recent improvements observed in the field of protein ab initio modelling. Here, new validation methods are presented based on the use of these precise inter-residue distance predictions, which are compared with the distances observed in the protein model. Sequence-register errors are particularly clearly detected and the register shifts required for their correction can be reliably determined. The method is available in the ConKit package (https://www.conkit.org).
Affiliation(s)
- Filomeno Sánchez Rodríguez
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
  - Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Didcot OX11 0DE, United Kingdom
- Grzegorz Chojnowski
  - European Molecular Biology Laboratory, Hamburg Unit, Notkestrasse 85, 22607 Hamburg, Germany
- Ronan M. Keegan
  - UKRI–STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, United Kingdom
- Daniel J. Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom

9. Roche R, Bhattacharya S, Shuvo MH, Bhattacharya D. rrQNet: Protein contact map quality estimation by deep evolutionary reconciliation. Proteins 2022; 90:2023-2034. [PMID: 35751651 PMCID: PMC9633355 DOI: 10.1002/prot.26394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 05/31/2022] [Accepted: 06/21/2022] [Indexed: 11/10/2022]
Abstract
Protein contact maps have proven to be a valuable tool in the deep learning revolution of protein structure prediction, ushering in the recent breakthrough by AlphaFold2. However, self-assessment of the quality of predicted structures is typically performed at the granularity of three-dimensional coordinates as opposed to directly exploiting the rotation- and translation-invariant two-dimensional (2D) contact maps. Here, we present rrQNet, a deep learning method for self-assessment in 2D by contact map quality estimation. Our approach is based on the intuition that for a contact map to be of high quality, the residue pairs predicted to be in contact should be mutually consistent with the evolutionary context of the protein. The deep neural network architecture of rrQNet implements this intuition by cascading two deep modules, one encoding the evolutionary context and the other performing evolutionary reconciliation. The penultimate stage of rrQNet estimates the quality scores at the interacting residue-pair level, which are then aggregated for estimating the quality of a contact map. This design choice offers versatility at varied resolutions from individual residue pairs to full-fledged contact maps. Trained on multiple complementary sources of contact predictors, rrQNet facilitates generalizability across various contact maps. By rigorously testing using publicly available datasets and comparing against several in-house baseline approaches, we show that rrQNet accurately reproduces the true quality score of a predicted contact map and successfully distinguishes between accurate and inaccurate contact maps predicted by a wide variety of contact predictors. The open-source rrQNet software package is freely available at https://github.com/Bhattacharya-Lab/rrQNet.
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Sutanu Bhattacharya
  - Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA

10. rMSA: a sequence search and alignment algorithm to improve RNA structure modeling. J Mol Biol 2022. [DOI: 10.1016/j.jmb.2022.167904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]

11. Mosalaganti S, Obarska-Kosinska A, Siggel M, Taniguchi R, Turoňová B, Zimmerli CE, Buczak K, Schmidt FH, Margiotta E, Mackmull MT, Hagen WJH, Hummer G, Kosinski J, Beck M. AI-based structure prediction empowers integrative structural analysis of human nuclear pores. Science 2022; 376:eabm9506. [PMID: 35679397 DOI: 10.1126/science.abm9506] [Citation(s) in RCA: 176] [Impact Index Per Article: 58.7] [Indexed: 12/22/2022]
Abstract
INTRODUCTION The eukaryotic nucleus protects the genome and is enclosed by the two membranes of the nuclear envelope. Nuclear pore complexes (NPCs) perforate the nuclear envelope to facilitate nucleocytoplasmic transport. With a molecular weight of ~120 MDa, the human NPC is one of the largest protein complexes. Its ~1000 proteins are taken in multiple copies from a set of about 30 distinct nucleoporins (NUPs). They can be roughly categorized into two classes. Scaffold NUPs contain folded domains and form a cylindrical scaffold architecture around a central channel. Intrinsically disordered NUPs line the scaffold and extend into the central channel, where they interact with cargo complexes. The NPC architecture is highly dynamic. It responds to changes in nuclear envelope tension with conformational breathing that manifests in dilation and constriction movements. Elucidating the scaffold architecture, ultimately at atomic resolution, will be important for gaining a more precise understanding of NPC function and dynamics but imposes a substantial challenge for structural biologists.
RATIONALE Considerable progress has been made toward this goal by a joint effort in the field. A synergistic combination of complementary approaches has turned out to be critical. In situ structural biology techniques were used to reveal the overall layout of the NPC scaffold that defines the spatial reference for molecular modeling. High-resolution structures of many NUPs were determined in vitro. Proteomic analysis and extensive biochemical work unraveled the interaction network of NUPs. Integrative modeling has been used to combine the different types of data, resulting in a rough outline of the NPC scaffold. Previous structural models of the human NPC, however, were patchy and limited in accuracy owing to several challenges: (i) Many of the high-resolution structures of individual NUPs have been solved from distantly related species and, consequently, do not comprehensively cover their human counterparts. (ii) The scaffold is interconnected by a set of intrinsically disordered linker NUPs that are not straightforwardly accessible to common structural biology techniques. (iii) The NPC scaffold intimately embraces the fused inner and outer nuclear membranes in a distinctive topology and cannot be studied in isolation. (iv) The conformational dynamics of scaffold NUPs limits the resolution achievable in structure determination.
RESULTS In this study, we used artificial intelligence (AI)-based prediction to generate an extensive repertoire of structural models of human NUPs and their subcomplexes. The resulting models cover various domains and interfaces that so far remained structurally uncharacterized. Benchmarking against previous and unpublished x-ray and cryo-electron microscopy structures revealed unprecedented accuracy. We obtained well-resolved cryo-electron tomographic maps of both the constricted and dilated conformational states of the human NPC. Using integrative modeling, we fitted the structural models of individual NUPs into the cryo-electron microscopy maps. We explicitly included several linker NUPs and traced their trajectory through the NPC scaffold. We elucidated in great detail how membrane-associated and transmembrane NUPs are distributed across the fusion topology of both nuclear membranes. The resulting architectural model increases the structural coverage of the human NPC scaffold by about twofold. We extensively validated our model against both earlier and new experimental data. The completeness of our model has enabled microsecond-long coarse-grained molecular dynamics simulations of the NPC scaffold within an explicit membrane environment and solvent. These simulations reveal that the NPC scaffold prevents the constriction of the otherwise stable double-membrane fusion pore to small diameters in the absence of membrane tension.
CONCLUSION Our 70-MDa atomically resolved model covers >90% of the human NPC scaffold. It captures conformational changes that occur during dilation and constriction. It also reveals the precise anchoring sites for intrinsically disordered NUPs, the identification of which is a prerequisite for a complete and dynamic model of the NPC. Our study exemplifies how AI-based structure prediction may accelerate the elucidation of subcellular architecture at atomic resolution. [Figure: see text].
Collapse
Affiliation(s)
- Shyamal Mosalaganti
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Life Sciences Institute and Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Agnieszka Obarska-Kosinska
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; European Molecular Biology Laboratory Hamburg, 22607 Hamburg, Germany
| | - Marc Siggel
- European Molecular Biology Laboratory Hamburg, 22607 Hamburg, Germany; Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Centre for Structural Systems Biology, 22607 Hamburg, Germany
| | - Reiya Taniguchi
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Beata Turoňová
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Christian E Zimmerli
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Katarzyna Buczak
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Florian H Schmidt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Erica Margiotta
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Marie-Therese Mackmull
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Wim J H Hagen
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Gerhard Hummer
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Institute of Biophysics, Goethe University Frankfurt, 60438 Frankfurt am Main, Germany
| | - Jan Kosinski
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; European Molecular Biology Laboratory Hamburg, 22607 Hamburg, Germany; Centre for Structural Systems Biology, 22607 Hamburg, Germany
| | - Martin Beck
- Department of Molecular Sociology, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| |
|
12
|
A database of calculated solution parameters for the AlphaFold predicted protein structures. Sci Rep 2022; 12:7349. [PMID: 35513443 PMCID: PMC9072687 DOI: 10.1038/s41598-022-10607-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2021] [Accepted: 04/07/2022] [Indexed: 12/22/2022]
Abstract
Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
|
13
|
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. Proteins 2021; 89:1607-1617. [PMID: 34533838 PMCID: PMC8726744 DOI: 10.1002/prot.26237] [Citation(s) in RCA: 273] [Impact Index Per Article: 68.3] [Received: 07/23/2021] [Accepted: 07/28/2021] [Indexed: 01/14/2023]
Abstract
Critical assessment of structure prediction (CASP) is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14), deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein-folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here, we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
| | - Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Maya Topf
- Centre for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Krzysztof Fidelis
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
| | - John Moult
- Institute for Bioscience and Biotechnology Research, 9600 Gudelsky Drive, Rockville, MD 20850, USA, Department of Cell Biology and Molecular Genetics, University of Maryland
| |
|