1
Single-particle Cryo-EM and molecular dynamics simulations: A perfect match. Curr Opin Struct Biol 2024; 86:102825. [PMID: 38723560 DOI: 10.1016/j.sbi.2024.102825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 05/19/2024]
Abstract
Knowledge of the structure and dynamics of biomolecules is key to understanding the mechanisms underlying their biological functions. Single-particle cryo-electron microscopy (cryo-EM) is a powerful structural biology technique to characterize complex biomolecular systems. Here, we review recent advances in how molecular dynamics (MD) simulations are being used to enhance the information extracted from cryo-EM experiments. We focus in particular on the physics underlying these experiments, on how MD facilitates structure refinement, especially for maps with heterogeneous and non-isotropic resolution, and on how thermodynamic and kinetic information can be extracted from cryo-EM data.
2
Direct Imaging of Radiation-Sensitive Organic Polymer-Based Nanocrystals at Sub-Ångström Resolution. Nanomaterials (Basel) 2024; 14:872. [PMID: 38786829 DOI: 10.3390/nano14100872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
Seeing the atomic configuration of single organic nanoparticles at a sub-Å spatial resolution by transmission electron microscopy has been so far prevented by the high sensitivity of soft matter to radiation damage. This difficulty is related to the need to irradiate the particle with a total dose of a few electrons/Å², not compatible with the electron beam density necessary to search for the low-contrast nanoparticle, to control its drift, finely adjust the electron-optical conditions and particle orientation, and finally acquire an effective atomic-resolution image. On the other hand, the capability to study individual pristine nanoparticles, such as proteins, active pharmaceutical ingredients, and polymers, with peculiar sensitivity to the variation in the local structure, defects, and strain, would provide advancements in many fields, including materials science, medicine, biology, and pharmacology. Here, we report the direct sub-ångström-resolution imaging at room temperature of pristine unstained crystalline polymer-based nanoparticles. This result is obtained by combining low-dose in-line electron holography and phase-contrast imaging on state-of-the-art equipment, providing an effective tool for the quantitative sub-ångström imaging of soft matter.
3
Uncovering Protein Ensembles: Automated Multiconformer Model Building for X-ray Crystallography and Cryo-EM. bioRxiv 2024:2023.06.28.546963. [PMID: 37425870 PMCID: PMC10327213 DOI: 10.1101/2023.06.28.546963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
In their folded state, biomolecules exchange between multiple conformational states that are crucial for their function. Traditional structural biology methods, such as X-ray crystallography and cryogenic electron microscopy (cryo-EM), produce density maps that are ensemble averages, reflecting molecules in various conformations. Yet, most models derived from these maps explicitly represent only a single conformation, overlooking the complexity of biomolecular structures. To accurately reflect the diversity of biomolecular forms, there is a pressing need to shift towards modeling structural ensembles that mirror the experimental data. However, the challenge of distinguishing signal from noise complicates manual efforts to create these models. In response, we introduce the latest enhancements to qFit, an automated computational strategy designed to incorporate protein conformational heterogeneity into models built into density maps. These algorithmic improvements in qFit are substantiated by superior Rfree and geometry metrics across a wide range of proteins. Importantly, unlike more complex multicopy ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot) and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster). By reducing the barrier of creating multiconformer models, qFit can foster the development of new hypotheses about the relationship between macromolecular conformational dynamics and function.
4
Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024; 29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]
Abstract
Each of the ∼20,000 proteins in the human proteome is a potential target for compounds that bind to it and modify its function. The 3D structures of most of these proteins are now available. Here, we discuss the prospects for using these structures to perform proteome-wide virtual HTS (VHTS). We compare physics-based (docking) and AI VHTS approaches, some of which are now being applied with large databases of compounds to thousands of targets. Although preliminary proteome-wide screens are now within our grasp, further methodological developments are expected to improve the accuracy of the results.
5
Community recommendations on cryoEM data archiving and validation. IUCrJ 2024; 11:140-151. [PMID: 38358351 PMCID: PMC10916293 DOI: 10.1107/s2052252524001246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/06/2024] [Indexed: 02/16/2024]
Abstract
In January 2020, a workshop was held at EMBL-EBI (Hinxton, UK) to discuss data requirements for the deposition and validation of cryoEM structures, with a focus on single-particle analysis. The meeting was attended by 47 experts in data processing, model building and refinement, validation, and archiving of such structures. This report describes the workshop's motivation and history, the topics discussed, and the resulting consensus recommendations. Some challenges for future methods-development efforts in this area are also highlighted, as is the implementation to date of some of the recommendations.
6
Predictive modeling and cryo-EM: A synergistic approach to modeling macromolecular structure. Biophys J 2024; 123:435-450. [PMID: 38268190 PMCID: PMC10912932 DOI: 10.1016/j.bpj.2024.01.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 01/09/2024] [Accepted: 01/18/2024] [Indexed: 01/26/2024]
Abstract
Over the last 15 years, structural biology has seen unprecedented development and improvement in two areas: electron cryo-microscopy (cryo-EM) and predictive modeling. Once relegated to low resolutions, single-particle cryo-EM is now capable of achieving near-atomic resolutions of a wide variety of macromolecular complexes. Ushered in by AlphaFold, machine learning has powered the current generation of predictive modeling tools, which can accurately and reliably predict models for proteins and some complexes directly from the sequence alone. Although they offer new opportunities individually, there is an inherent synergy between these techniques, allowing for the construction of large, complex macromolecular models. Here, we give a brief overview of these approaches in addition to illustrating works that combine these techniques for model building. These examples provide insight into model building, assessment, and limitations when integrating predictive modeling with cryo-EM density maps. Together, these approaches offer the potential to greatly accelerate the generation of macromolecular structural insights, particularly when coupled with experimental data.
7
Community recommendations on cryoEM data archiving and validation: Outcomes of a wwPDB/EMDB workshop on cryoEM data management, deposition and validation. arXiv 2024:arXiv:2311.17640v3. [PMID: 38076521 PMCID: PMC10705588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
In January 2020, a workshop was held at EMBL-EBI (Hinxton, UK) to discuss data requirements for deposition and validation of cryoEM structures, with a focus on single-particle analysis. The meeting was attended by 47 experts in data processing, model building and refinement, validation, and archiving of such structures. This report describes the workshop's motivation and history, the topics discussed, and consensus recommendations resulting from the workshop. Some challenges for future methods-development efforts in this area are also highlighted, as is the implementation to date of some of the recommendations.
8
Outcomes of the EMDataResource Cryo-EM Ligand Modeling Challenge. Research Square 2024:rs.3.rs-3864137. [PMID: 38343795 PMCID: PMC10854310 DOI: 10.21203/rs.3.rs-3864137/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2024]
Abstract
The EMDataResource Ligand Model Challenge aimed to assess the reliability and reproducibility of modeling ligands bound to protein and protein/nucleic-acid complexes in cryogenic electron microscopy (cryo-EM) maps determined at near-atomic (1.9-2.5 Å) resolution. Three published maps were selected as targets: E. coli beta-galactosidase with inhibitor, SARS-CoV-2 RNA-dependent RNA polymerase with covalently bound nucleotide analog, and SARS-CoV-2 ion channel ORF3a with bound lipid. Sixty-one models were submitted from 17 independent research groups, each with supporting workflow details. We found that (1) the quality of submitted ligand models and surrounding atoms varied, as judged by visual inspection and quantification of local map quality, model-to-map fit, geometry, energetics, and contact scores, and (2) a composite rather than a single score was needed to assess macromolecule+ligand model quality. These observations lead us to recommend best practices for assessing cryo-EM structures of liganded macromolecules reported at near-atomic resolution.
9
Cryo-EM structure and B-factor refinement with ensemble representation. Nat Commun 2024; 15:444. [PMID: 38200043 PMCID: PMC10781738 DOI: 10.1038/s41467-023-44593-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 12/20/2023] [Indexed: 01/12/2024]
Abstract
Cryo-EM experiments produce images of macromolecular assemblies that are combined to produce three-dimensional density maps. Typically, atomic models of the constituent molecules are fitted into these maps, followed by a density-guided refinement. We introduce TEMPy-ReFF, a method for atomic structure refinement in cryo-EM density maps. Our method represents atomic positions as components of a Gaussian mixture model, utilising their variances as B-factors, which are used to derive an ensemble description. Extensively tested on a substantial dataset of 229 cryo-EM maps from EMDB ranging in resolution from 2.1-4.9 Å with corresponding PDB and CERES atomic models, our results demonstrate that TEMPy-ReFF ensembles provide a superior representation of cryo-EM maps. On a single-model basis, it performs similarly to the CERES re-refinement protocol, although there are cases where it provides a better fit to the map. Furthermore, our method enables the creation of composite maps free of boundary artefacts. TEMPy-ReFF is useful for better interpretation of flexible structures, such as those involving RNA, DNA or ligands.
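The link between a Gaussian's variance and a B-factor can be illustrated with the standard isotropic relation B = 8π²σ², where σ² is the per-axis positional variance in Å². The sketch below is only that textbook conversion; the function names are hypothetical, and the abstract does not say whether TEMPy-ReFF uses exactly this convention.

```python
import math


def b_factor_from_sigma(sigma_angstrom: float) -> float:
    """Isotropic B-factor (A^2) for a Gaussian with per-axis std dev sigma (A).

    Uses the standard relation B = 8 * pi^2 * sigma^2 (an assumption here,
    not a statement about TEMPy-ReFF internals).
    """
    if sigma_angstrom < 0:
        raise ValueError("sigma must be non-negative")
    return 8.0 * math.pi ** 2 * sigma_angstrom ** 2


def sigma_from_b_factor(b_factor: float) -> float:
    """Inverse conversion: per-axis std dev (A) implied by a B-factor (A^2)."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


if __name__ == "__main__":
    # Compare a tightly localized atom with a more mobile one.
    for sigma in (0.3, 0.5, 1.0):
        print(f"sigma = {sigma:.1f} A  ->  B = {b_factor_from_sigma(sigma):.1f} A^2")
```

Under this convention a wider Gaussian component maps directly to a larger B-factor, which is the sense in which variances can stand in for B-factors.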
10
The bad and the good of trends in model building and refinement for sparse-data regions: pernicious forms of overfitting versus good new tools and predictions. Acta Crystallogr D Struct Biol 2023; 79:1071-1078. [PMID: 37921807 PMCID: PMC10833350 DOI: 10.1107/s2059798323008847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 10/09/2023] [Indexed: 11/04/2023]
Abstract
Model building and refinement, and the validation of their correctness, are very effective and reliable at local resolutions better than about 2.5 Å for both crystallography and cryo-EM. However, at local resolutions worse than 2.5 Å both the procedures and their validation break down and do not ensure reliably correct models. This is because in the broad density at lower resolution, critical features such as protein backbone carbonyl O atoms are not just less accurate but are not seen at all, and so peptide orientations are frequently wrongly fitted by 90-180°. This puts both backbone and side chains into the wrong local energy minimum, and they are then worsened rather than improved by further refinement into a valid but incorrect rotamer or Ramachandran region. On the positive side, new tools are being developed to locate this type of pernicious error in PDB depositions, such as CaBLAM, EMRinger, Pperp diagnosis of ribose puckers, and peptide flips in PDB-REDO, while interactive modeling in Coot or ISOLDE can help to fix many of them. Another positive trend is that artificial intelligence predictions such as those made by AlphaFold2 contribute additional evidence from large multiple sequence alignments, and in high-confidence parts they provide quite good starting models for loops, termini or whole domains with otherwise ambiguous density.
11
Advancing cryo-electron microscopy data analysis through accelerated simulation-based flexible fitting approaches. Curr Opin Struct Biol 2023; 82:102653. [PMID: 37451233 DOI: 10.1016/j.sbi.2023.102653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Revised: 05/30/2023] [Accepted: 06/19/2023] [Indexed: 07/18/2023]
Abstract
Flexible fitting based on molecular dynamics simulation is a technique for structure modeling from cryo-EM data. It has been utilized for nearly two decades, and while cryo-EM resolution has improved significantly, it remains a powerful approach that can provide structural and dynamical insights that are not directly accessible from experimental data alone. Molecular dynamics simulations provide a means to extract atomistic details of conformational changes that are encoded in cryo-EM data and can also assist in improving the quality of structural models. Additionally, molecular dynamics simulations enable the characterization of conformational heterogeneity in cryo-EM data. We will summarize the advancements made in these techniques and highlight recent developments in this field.
12
Adaptive Ensemble Refinement of Protein Structures in High Resolution Electron Microscopy Density Maps with Radical Augmented Molecular Dynamics Flexible Fitting. J Chem Inf Model 2023; 63:5834-5846. [PMID: 37661856 DOI: 10.1021/acs.jcim.3c00350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of the cellular machinery. The density maps derived from cryo-EM experiments are often integrated with manual, knowledge-driven or artificial intelligence-driven and physics-guided computational methods to build, fit, and refine molecular structures. Going beyond a single stationary-structure determination scheme, it is becoming more common to interpret the experimental data with an ensemble of models that contributes to an average observation. Hence, there is a need to decide on the quality of an ensemble of protein structures on-the-fly while refining them against the density maps. We introduce such an adaptive decision-making scheme during the molecular dynamics flexible fitting (MDFF) of biomolecules. Using RADICAL-Cybertools, the new RADICAL augmented MDFF implementation (R-MDFF) is examined in high-performance computing environments for refinement of two prototypical protein systems, adenylate kinase and carbon monoxide dehydrogenase. For these test cases, use of multiple replicas in flexible fitting with adaptive decision making in R-MDFF improves the overall correlation to the density by 40% relative to the refinements of the brute-force MDFF. The improvements are particularly significant at high, 2-3 Å map resolutions. More importantly, the ensemble model captures key features of biologically relevant molecular dynamics that are inaccessible to a single-model interpretation. Finally, the pipeline is applicable to systems of growing sizes, which is demonstrated using ensemble refinement of capsid proteins from the chimpanzee adenovirus. The overhead for decision making remains low and robust to computing environments. The software is publicly available on GitHub and includes a short user guide to install R-MDFF on different computing environments, from local Linux-based workstations to high-performance computing environments.
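The "correlation to the density" used to judge replicas can be pictured as a Pearson correlation between a model-derived map and the experimental map, with the best-scoring replica carried forward. The following is a minimal sketch under that assumption: maps are flattened to plain lists of voxel values, and `map_correlation` and `best_replica` are hypothetical helpers, not part of R-MDFF or RADICAL-Cybertools, whose actual decision scheme is not described in the abstract.

```python
import math


def map_correlation(simulated, experimental):
    """Pearson correlation between two equally sized lists of voxel densities."""
    if len(simulated) != len(experimental) or not simulated:
        raise ValueError("maps must be non-empty and of equal length")
    n = len(simulated)
    mean_s = sum(simulated) / n
    mean_e = sum(experimental) / n
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(simulated, experimental))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_s == 0 or var_e == 0:
        return 0.0  # a flat map carries no correlation information
    return cov / math.sqrt(var_s * var_e)


def best_replica(replica_maps, experimental):
    """Return (index, score) of the replica map that best matches the experiment."""
    scores = [map_correlation(m, experimental) for m in replica_maps]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    target = [0.0, 1.0, 2.0, 3.0]
    replicas = [[3.0, 2.0, 1.0, 0.0], [0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 2.0, 1.0]]
    print(best_replica(replicas, target))
```

In an adaptive scheme of this kind, the selected replica would seed the next round of fitting rather than every replica being run to completion.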
13
DoubleHelix: nucleic acid sequence identification, assignment and validation tool for cryo-EM and crystal structure models. Nucleic Acids Res 2023; 51:8255-8269. [PMID: 37395405 PMCID: PMC10450167 DOI: 10.1093/nar/gkad553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 05/29/2023] [Accepted: 06/16/2023] [Indexed: 07/04/2023]
Abstract
Sequence assignment is a key step of the model building process in both cryogenic electron microscopy (cryo-EM) and macromolecular crystallography (MX). If the assignment fails, it can result in difficult-to-identify errors affecting the interpretation of a model. There are many model validation strategies that help experimentalists in this step of protein model building, but they are virtually non-existent for nucleic acids. Here, I present doubleHelix, a comprehensive method for assignment, identification, and validation of nucleic acid sequences in structures determined using cryo-EM and MX. The method combines a neural network classifier of nucleobase identities and a sequence-independent secondary structure assignment approach. I show that the presented method can successfully assist the sequence-assignment step in nucleic-acid model building at lower resolutions, where visual map interpretation is very difficult. Moreover, I present examples of sequence assignment errors detected using doubleHelix in cryo-EM and MX structures of ribosomes deposited in the Protein Data Bank, which escaped the scrutiny of available model-validation approaches. The doubleHelix program source code is available under BSD-3 license at https://gitlab.com/gchojnowski/doublehelix.
14
Diagnosing and treating issues in cryo-EM map-derived models. Structure 2023; 31:759-761. [PMID: 37419099 DOI: 10.1016/j.str.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 06/05/2023] [Accepted: 06/08/2023] [Indexed: 07/09/2023]
Abstract
In this issue of Structure, Reggiano et al.1 take the evaluation of cryo-EM models to the next level, combining several metrics into one. The new method, MEDIC, evaluates models at the residue level, helping to guide improvements and interpretation of models derived from cryo-EM maps.
15
Residue-level error detection in cryoelectron microscopy models. Structure 2023; 31:860-869.e4. [PMID: 37253357 PMCID: PMC10330749 DOI: 10.1016/j.str.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/16/2023] [Accepted: 05/03/2023] [Indexed: 06/01/2023]
Abstract
Building accurate protein models into moderate resolution (3-5 Å) cryoelectron microscopy (cryo-EM) maps is challenging and error prone. We have developed MEDIC (Model Error Detection in Cryo-EM), a robust statistical model that identifies local backbone errors in protein structures built into cryo-EM maps by combining local fit-to-density with deep-learning-derived structural information. MEDIC is validated on a set of 28 structures that were subsequently solved to higher resolutions, where we identify the differences between low- and high-resolution structures with 68% precision and 60% recall. We additionally use this model to fix over 100 errors in 12 deposited structures and to identify errors in 4 refined AlphaFold predictions with 80% precision and 60% recall. As modelers more frequently use deep learning predictions as a starting point for refinement and rebuilding, MEDIC's ability to handle errors in structures derived from hand-building and machine learning methods makes it a powerful tool for structural biologists.
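Precision and recall figures of this kind are easy to misread, so a small worked example may help: precision is the fraction of flagged residues that are genuinely wrong, and recall is the fraction of genuinely wrong residues that were flagged. The sketch below uses made-up residue numbers and a hypothetical `precision_recall` helper; it does not reproduce MEDIC's own evaluation code.

```python
def precision_recall(flagged_residues, true_error_residues):
    """Return (precision, recall) for a set of flagged residues.

    precision = true positives / all flagged residues
    recall    = true positives / all residues that are really in error
    """
    flagged = set(flagged_residues)
    truth = set(true_error_residues)
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative residue numbers only, not data from the paper.
    flagged = [10, 11, 12, 40]
    truth = [10, 11, 12, 13, 14]
    p, r = precision_recall(flagged, truth)
    print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Here three of the four flagged residues are real errors, and three of the five real errors were found.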
16
Improvement of cryo-EM maps by simultaneous local and non-local deep learning. Nat Commun 2023; 14:3217. [PMID: 37270635 DOI: 10.1038/s41467-023-39031-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 08/03/2022] [Accepted: 05/25/2023] [Indexed: 06/05/2023]
Abstract
Cryo-EM has emerged as the most important technique for structure determination of macromolecular complexes. However, raw cryo-EM maps often exhibit loss of contrast at high resolution and heterogeneity over the entire map. As such, various post-processing methods have been proposed to improve cryo-EM maps. Nevertheless, it is still challenging to improve both the quality and interpretability of EM maps. Addressing the challenge, we present a three-dimensional Swin-Conv-UNet-based deep learning framework to improve cryo-EM maps, named EMReady, by not only implementing both local and non-local modeling modules in a multiscale UNet architecture but also simultaneously minimizing the local smooth L1 distance and maximizing the non-local structural similarity between processed experimental and simulated target maps in the loss function. EMReady was extensively evaluated on diverse test sets of 110 primary cryo-EM maps and 25 pairs of half-maps at 3.0-6.0 Å resolutions, and compared with five state-of-the-art map post-processing methods. It is shown that EMReady can not only robustly enhance the quality of cryo-EM maps in terms of map-model correlations, but also improve the interpretability of the maps in automatic de novo model building.
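A loss that combines a local smooth L1 term with a non-local structural-similarity term can be sketched in a few lines. The version below is an assumption-laden illustration: maps are flattened lists of voxel values, SSIM is computed once over the whole map rather than in sliding windows, and the equal weighting, the stabilizing constants `c1` and `c2`, and all function names are hypothetical rather than taken from EMReady.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 distance: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Structural similarity over the whole map (no sliding window)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def combined_loss(pred, target, ssim_weight=1.0):
    """Minimize the local distance while maximizing the non-local similarity."""
    return smooth_l1(pred, target) + ssim_weight * (1.0 - global_ssim(pred, target))


if __name__ == "__main__":
    processed = [0.1, 0.4, 0.9, 0.3]
    simulated = [0.0, 0.5, 1.0, 0.2]
    print(f"loss = {combined_loss(processed, simulated):.4f}")
```

Writing the similarity term as `1 - SSIM` turns "maximize similarity" into something that can be minimized alongside the distance term.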
17
DAQ-Score Database: assessment of map-model compatibility for protein structure models from cryo-EM maps. Nat Methods 2023; 20:775-776. [PMID: 37161061 PMCID: PMC10560587 DOI: 10.1038/s41592-023-01876-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]
18
Stacked binding of a PET ligand to Alzheimer's tau paired helical filaments. Nat Commun 2023; 14:3048. [PMID: 37236970 PMCID: PMC10220082 DOI: 10.1038/s41467-023-38537-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/22/2022] [Accepted: 05/05/2023] [Indexed: 05/28/2023]
Abstract
Accumulation of filamentous aggregates of tau protein in the brain is a pathological hallmark of Alzheimer's disease (AD) and many other neurodegenerative tauopathies. The filaments adopt disease-specific cross-β amyloid conformations that self-propagate and are implicated in neuronal loss. Development of molecular diagnostics and therapeutics is of critical importance. However, mechanisms of small molecule binding to the amyloid core are poorly understood. We used cryo-electron microscopy to determine a 2.7 Å structure of AD patient-derived tau paired-helical filaments bound to the PET ligand GTP-1. The compound is bound stoichiometrically at a single site along an exposed cleft of each protofilament in a stacked arrangement matching the fibril symmetry. Multiscale modeling reveals pi-pi aromatic interactions that pair favorably with the small molecule-protein contacts, supporting high specificity and affinity for the AD tau conformation. This binding mode offers critical insight into designing compounds to target different amyloid folds found across neurodegenerative diseases.
19
Tertiary structure of single-instant RNA molecule reveals folding landscape. bioRxiv 2023:2023.05.19.541511. [PMID: 37292713 PMCID: PMC10245749 DOI: 10.1101/2023.05.19.541511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The folding of RNA and protein molecules during their synthesis is a crucial self-assembly process that nature employs to convert genetic information into the complex molecular machinery that supports life. Misfolding events are the cause of several diseases, and the folding pathway of central biomolecules, such as the ribosome, is strictly regulated by programmed maturation processes and folding chaperones. However, the dynamic folding processes are challenging to study because current structure determination methods heavily rely on averaging, and existing computational methods do not efficiently simulate non-equilibrium dynamics. Here we utilize individual-particle cryo-electron tomography (IPET) to investigate the folding landscape of a rationally designed RNA origami 6-helix bundle that undergoes slow maturation from a "young" to "mature" conformation. By optimizing the IPET imaging and electron dose conditions, we obtain 3D reconstructions of 120 individual particles at resolutions ranging from 23-35 Å, enabling us, for the first time, to observe individual RNA helices and tertiary structures without averaging. Statistical analysis of 120 tertiary structures confirms the two main conformations and suggests a possible folding pathway driven by helix-helix compaction. Studies of the full conformational landscape reveal trapped states, misfolded states, intermediate states, and fully compacted states. The study provides novel insight into RNA folding pathways and paves the way for future studies of the energy landscape of molecular machines and self-assembly processes.
20
CryoFold 2.0: Cryo-EM Structure Determination with MELD. J Phys Chem A 2023; 127:3906-3913. [PMID: 37084537 DOI: 10.1021/acs.jpca.3c01731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2023]
Abstract
Cryo-electron microscopy data are becoming more prevalent and accessible at higher resolution levels, leading to the development of new computational tools to determine the atomic structure of macromolecules. However, while existing tools adapted from X-ray crystallography are suitable for the highest-resolution maps, new tools are needed for lower-resolution levels and to account for map heterogeneity. In this article, we introduce CryoFold 2.0, an integrative physics-based approach that combines Bayesian inference and the ability to handle multiple data sources with the molecular dynamics flexible fitting (MDFF) approach to determine the structures of macromolecules by using cryo-EM data. CryoFold 2.0 is incorporated into the MELD (modeling employing limited data) plugin, resulting in a pipeline that is more computationally efficient and accurate than running MELD or MDFF alone. The approach requires fewer computational resources and shorter simulation times than the original CryoFold, and it minimizes manual intervention. We demonstrate the effectiveness of the approach on eight different systems, highlighting its various benefits.
21
Fast and automated protein-DNA/RNA macromolecular complex modeling from cryo-EM maps. Brief Bioinform 2023; 24:bbac632. [PMID: 36682003 PMCID: PMC10399284 DOI: 10.1093/bib/bbac632] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/12/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 01/23/2023]
Abstract
Cryo-electron microscopy (cryo-EM) allows a macromolecular structure such as a protein-DNA/RNA complex to be reconstructed as a three-dimensional Coulomb potential map. The structural information of these macromolecular complexes forms the foundation for understanding molecular mechanisms, including those of many human diseases. However, the model building of large macromolecular complexes is often difficult and time-consuming. We recently developed DeepTracer-2.0, an artificial-intelligence-based pipeline that can build amino acid and nucleic acid backbones from a single cryo-EM map, and even predict the best-fitting residues according to the density of side chains. The experiments showed improved accuracy and efficiency when benchmarking the performance on independent experimental maps of protein-DNA/RNA complexes and demonstrated the promising future of macromolecular modeling from cryo-EM maps. Our method and pipeline could benefit researchers worldwide who work in molecular biomedicine and drug discovery, and substantially increase the throughput of cryo-EM model building. The pipeline has been integrated into the web portal https://deeptracer.uw.edu/.
22
Smart de novo Macromolecular Structure Modeling from Cryo-EM Maps. J Mol Biol 2023; 435:167967. [PMID: 36681181 DOI: 10.1016/j.jmb.2023.167967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/04/2023] [Accepted: 01/12/2023] [Indexed: 01/20/2023]
Abstract
The study of macromolecular structures has expanded our understanding of the amazing cell machinery, and such knowledge has changed how the pharmaceutical industry develops new vaccines in recent years. Traditionally, X-ray crystallography has been the main method for structure determination; however, cryogenic electron microscopy (cryo-EM) has become increasingly popular due to recent advancements in hardware and software. The number of cryo-EM maps deposited in the EMDataResource (formerly EMDatabase) since 2002 has been dramatically increasing and it continues to do so. De novo macromolecular complex modeling is a labor-intensive process; therefore, it is highly desirable to develop software that can automate this process. Here we discuss our automated, data-driven, and artificial intelligence approaches including map processing, feature extraction, model building, and target identification. Recently, we have enabled DNA/RNA modeling in our deep learning-based prediction tool, DeepTracer. We have also developed DeepTracer-ID, a tool that can identify proteins solely based on the cryo-EM map. In this paper, we will present our accumulated experiences in developing deep learning-based methods surrounding macromolecule modeling applications.
|
23
|
Near-Atomic Resolution Cryo-EM Image Reconstruction of RNA. Methods Mol Biol 2023; 2568:179-192. [PMID: 36227569 DOI: 10.1007/978-1-0716-2687-0_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The rapid development of cryogenic electron microscopy (cryo-EM) enables the structure determination of macromolecules without the need for crystallization. Protein, protein-lipid, and protein-nucleic acid complexes can now be routinely resolved by cryo-EM single-particle analysis (SPA) to near-atomic or atomic resolution. Here we describe the structure determination of pure RNAs by SPA, from cryo-specimen preparation to data collection and 3D reconstruction. This protocol is useful to yield many cryo-EM structures of RNA, here exemplified by the Tetrahymena L-21 ScaI ribozyme at 3.1-Å resolution.
|
24
|
Protein model refinement for cryo-EM maps using AlphaFold2 and the DAQ score. Acta Crystallogr D Struct Biol 2023; 79:10-21. [PMID: 36601803 PMCID: PMC9815095 DOI: 10.1107/s2059798322011676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
As more protein structure models have been determined from cryogenic electron microscopy (cryo-EM) density maps, establishing how to evaluate the model accuracy and how to correct models in cases where they contain errors is becoming crucial to ensure the quality of the structural models deposited in the public database, the PDB. Here, a new protocol is presented for evaluating a protein model built from a cryo-EM map and applying local structure refinement in the case where the model has potential errors. Firstly, model evaluation is performed using a deep-learning-based model-local map assessment score, DAQ, that has recently been developed. The subsequent local refinement is performed by a modified AlphaFold2 procedure, in which a trimmed template model and a trimmed multiple sequence alignment are provided as input to control which structure regions to refine while leaving other more confident regions of the model intact. A benchmark study showed that this protocol, DAQ-refine, consistently improves low-quality regions of the initial models. Among 18 refined models generated for an initial structure, DAQ shows a high correlation with model quality and can identify the most accurate model for most of the tested cases. The improvements obtained by DAQ-refine were on average larger than those obtained by other existing methods.
|
25
|
Electron microscopy holdings of the Protein Data Bank: the impact of the resolution revolution, new validation tools, and implications for the future. Biophys Rev 2022; 14:1281-1301. [PMID: 36474933 PMCID: PMC9715422 DOI: 10.1007/s12551-022-01013-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Accepted: 11/06/2022] [Indexed: 12/04/2022] Open
Abstract
As a discipline, structural biology has been transformed by the three-dimensional electron microscopy (3DEM) "Resolution Revolution" made possible by convergence of robust cryo-preservation of vitrified biological materials, sample handling systems, and measurement stages operating at liquid nitrogen temperature, improvements in electron optics that preserve phase information at the atomic level, direct electron detectors (DEDs), high-speed computing with graphics processing units, and rapid advances in data acquisition and processing software. 3DEM structure information (atomic coordinates and related metadata) is archived in the open-access Protein Data Bank (PDB), which currently holds more than 11,000 3DEM structures of proteins and nucleic acids, and their complexes with one another and small-molecule ligands (~6% of the archive). Underlying experimental data (3DEM density maps and related metadata) are stored in the Electron Microscopy Data Bank (EMDB), which currently holds more than 21,000 3DEM density maps. After describing the history of the PDB and the Worldwide Protein Data Bank (wwPDB) partnership, which jointly manages both the PDB and EMDB archives, this review examines the origins of the resolution revolution and analyzes its impact on structural biology viewed through the lens of PDB holdings. Six areas of focus exemplifying the impact of 3DEM across the biosciences are discussed in detail (icosahedral viruses, ribosomes, integral membrane proteins, SARS-CoV-2 spike proteins, cryogenic electron tomography, and integrative structure determination combining 3DEM with complementary biophysical measurement techniques), followed by a review of 3DEM structure validation by the wwPDB that underscores the importance of community engagement.
|
26
|
Using deep-learning predictions of inter-residue distances for model validation. Acta Crystallogr D Struct Biol 2022; 78:1412-1427. [PMID: 36458613 PMCID: PMC9716559 DOI: 10.1107/s2059798322010415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open
Abstract
Determination of protein structures typically entails building a model that satisfies the collected experimental observations and its deposition in the Protein Data Bank. Experimental limitations can lead to unavoidable uncertainties during the process of model building, which result in the introduction of errors into the deposited model. Many metrics are available for model validation, but most are limited to consideration of the physico-chemical aspects of the model or its match to the experimental data. The latest advances in the field of deep learning have enabled the increasingly accurate prediction of inter-residue distances, an advance which has played a pivotal role in the recent improvements observed in the field of protein ab initio modelling. Here, new validation methods are presented based on the use of these precise inter-residue distance predictions, which are compared with the distances observed in the protein model. Sequence-register errors are particularly clearly detected and the register shifts required for their correction can be reliably determined. The method is available in the ConKit package (https://www.conkit.org).
|
27
|
Electron diffraction of 1,4-dichlorobenzene embedded in superfluid helium droplets. Phys Chem Chem Phys 2022; 24:27722-27730. [PMID: 36377553 PMCID: PMC9731815 DOI: 10.1039/d2cp04492g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2023]
Abstract
We perform electron diffraction of 1,4-dichlorobenzene (C6H4Cl2, referred to as 2ClB) embedded in superfluid helium droplets to investigate the structure evolution of cluster growth. Multivariable linear regression fittings are used to determine the concentration and the best model structures of the clusters. At a droplet source temperature of 22 K with droplets containing on average 5000 He atoms, the fitting results agree with the doping statistics modeled using the Poisson distribution: the largest molecular clusters are tetramers, while the abundances of monomers and dimers are the highest and are similar. Molecular dimers of 2ClB are determined to have a parallel structure with a 60° rotation for the Cl-Cl molecular axes. However, a better agreement between experiment and fitting is obtained by reducing the interlayer distance that had been calculated using the density functional theory for dimers. Further calculations using the highest level quantum mechanical calculations prove that the reduction in interlayer distance does not significantly increase the energy of the dimer. Cluster trimers adopt a dimer structure with the additional monomer slanted against the dimer, and tetramers take on a stacked structure. The structure evolution with cluster size is extraordinary, because from trimer to tetramer, one monomer needs to be rearranged, and neither the trimer nor the tetramer adopts the corresponding global minimum structure obtained using high level coupled-cluster theory calculations. This phenomenon may be related to the fast cooling process in superfluid helium droplets during cluster formation.
|
28
|
Integrating model simulation tools and cryo-electron microscopy. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
29
|
Image processing tools for the validation of CryoEM maps. Faraday Discuss 2022; 240:210-227. [PMID: 35861059 DOI: 10.1039/d2fd00059h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
The number of maps deposited in public databases (Electron Microscopy Data Bank, EMDB) determined by cryo-electron microscopy has quickly grown in recent years. With this rapid growth, it is critical to guarantee their quality. So far, map validation has primarily focused on the agreement between maps and models. From the image processing perspective, the validation has been mostly restricted to using two half-maps and the measurement of their internal consistency. In this article, we suggest that map validation can be taken much further from the point of view of image processing if 2D classes, particles, angles, coordinates, defoci, and micrographs are also provided. We present a progressive validation scheme that qualifies a result validation status from 0 to 5 and offers three optional qualifiers (A, W, and O) that can be added. The simplest validation state is 0, while the most complete would be 5AWO. This scheme has been implemented in a website https://biocomp.cnb.csic.es/EMValidationService/ to which reconstructed maps and their ESI can be uploaded.
|
30
|
Overview and applications of map and model validation tools in the CCP-EM software suite. Faraday Discuss 2022; 240:196-209. [PMID: 35916020 PMCID: PMC9642004 DOI: 10.1039/d2fd00103a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Cryogenic electron microscopy (cryo-EM) has recently been established as a powerful technique for solving macromolecular structures. Although the best resolutions achievable are improving, a significant majority of data are still resolved at resolutions worse than 3 Å, where it is non-trivial to build or fit atomic models. The map reconstructions and atomic models derived from the maps are also prone to errors accumulated through the different stages of data processing. Here, we highlight the need to evaluate both model geometry and fit to data at different resolutions. Assessment of cryo-EM structures from SARS-CoV-2 highlights a bias towards optimising the model geometry to agree with the most common conformations, compared to the agreement with data. We present the CoVal web service which provides multiple validation metrics to reflect the quality of atomic models derived from cryo-EM data of structures from SARS-CoV-2. We demonstrate that further refinement can lead to improvement of the agreement with data without the loss of geometric quality. We also discuss the recent CCP-EM developments aimed at addressing some of the current shortcomings.
|
31
|
Towards rational computational peptide design. Frontiers in Bioinformatics 2022; 2:1046493. [PMID: 36338806 PMCID: PMC9634169 DOI: 10.3389/fbinf.2022.1046493] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/16/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022] Open
Abstract
Peptides are prevalent in biology, mediating as many as 40% of protein-protein interactions, and are involved in other cellular functions such as transport and signaling. Their ability to bind with high specificity makes them promising therapeutic agents with properties intermediate between those of small molecules and large biologics. Beyond their biological role, peptides can be programmed to self-assemble, and they are already being used for functions as diverse as oligonucleotide delivery, tissue regeneration, or as drugs. However, the transient nature of their interactions has limited the number of structures and the knowledge of binding affinities available, and their flexible nature has limited the success of computational pipelines that predict the structures and affinities of these molecules. Fortunately, recent advances in experimental and computational pipelines are creating new opportunities for this field. We are starting to see promising predictions of complex structures and of thermodynamic and kinetic properties. We believe that in the coming years this will lead to robust rational peptide design pipelines with success similar to those applied for small molecule drug discovery.
|
32
|
Residue-wise local quality estimation for protein models from cryo-EM maps. Nat Methods 2022; 19:1116-1125. [PMID: 35953671 PMCID: PMC10024464 DOI: 10.1038/s41592-022-01574-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/20/2021] [Accepted: 07/11/2022] [Indexed: 01/31/2023]
Abstract
An increasing number of protein structures are being determined by cryogenic electron microscopy (cryo-EM). Although the resolution of determined cryo-EM density maps is improving in general, there are still many cases where amino acids of a protein are assigned with different levels of confidence. Here we developed a method that identifies potential misassignment of residues in the map, including residue shifts along an otherwise correct main-chain trace. The score, named DAQ, computes the likelihood that the local density corresponds to different amino acids, atoms, and secondary structures, estimated via deep learning, and assesses the consistency of the amino acid assignment in the protein structure model with that likelihood. When DAQ was applied to different versions of model structures in the Protein Data Bank that were derived from the same density maps, a clear improvement in the DAQ score was observed in the newer versions of the models. DAQ also found potential misassignment errors in a substantial number of deposited protein structure models built into cryo-EM maps.
|
33
|
AlphaFold2 and CryoEM: Revisiting CryoEM modeling in near-atomic resolution density maps. iScience 2022; 25:104496. [PMID: 35733789 PMCID: PMC9207676 DOI: 10.1016/j.isci.2022.104496] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/04/2022] [Revised: 04/07/2022] [Accepted: 05/24/2022] [Indexed: 11/27/2022] Open
Abstract
With the advent of new artificial intelligence and machine learning algorithms, predictive modeling can, in some cases, produce structures on par with experimental methods. The combination of predictive modeling and experimental structure determination by electron cryomicroscopy (cryoEM) offers a tantalizing approach for producing robust atomic models of macromolecular assemblies. Here, we apply AlphaFold2 to a set of community standard data sets and compare the results with the corresponding reference maps and models. Moreover, we present three unique case studies from previously determined cryoEM density maps of viruses. Our results show that AlphaFold2 can not only produce reasonably accurate models for analysis and additional hypotheses testing, but can also potentially yield incorrect structures if not properly validated with experimental data. Whereas we outline numerous shortcomings and potential pitfalls of predictive modeling, the obvious synergy between predictive modeling and cryoEM will undoubtedly result in new computational modeling tools.
|
34
|
Model building of protein complexes from intermediate-resolution cryo-EM maps with deep learning-guided automatic assembly. Nat Commun 2022; 13:4066. [PMID: 35831370 PMCID: PMC9279371 DOI: 10.1038/s41467-022-31748-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/11/2022] [Accepted: 06/30/2022] [Indexed: 12/29/2022] Open
Abstract
Advances in microscopy instruments and image processing algorithms have led to an increasing number of cryo-electron microscopy (cryo-EM) maps. However, building accurate models into intermediate-resolution EM maps remains challenging and labor-intensive. Here, we propose an automatic model building method of multi-chain protein complexes from intermediate-resolution cryo-EM maps, named EMBuild, by integrating AlphaFold structure prediction, FFT-based global fitting, domain-based semi-flexible refinement, and graph-based iterative assembling on the main-chain probability map predicted by a deep convolutional network. EMBuild is extensively evaluated on diverse test sets of 47 single-particle EM maps at 4.0-8.0 Å resolution and 16 subtomogram averaging maps of cryo-ET data at 3.7-9.3 Å resolution, and compared with state-of-the-art approaches. We demonstrate that EMBuild is able to build high-quality complex structures that are comparably accurate to the manually built PDB structures from the cryo-EM maps. These results demonstrate the accuracy and reliability of EMBuild in automatic model building.
|
35
|
Abstract
Cryo-electron microscopy (cryo-EM) continues its remarkable growth as a method for visualizing biological objects, which has been driven by advances across the entire pipeline. Developments in both single-particle analysis and in situ tomography have enabled more structures to be imaged and determined to better resolutions, at faster speeds, and with more scientists having improved access. This review highlights recent advances at each stage of the cryo-EM pipeline and provides examples of how these techniques have been used to investigate real-world problems, including antibody development against the SARS-CoV-2 spike during the recent COVID-19 pandemic.
|
36
|
Sequence-assignment validation in cryo-EM models with checkMySequence. Acta Crystallogr D Struct Biol 2022; 78:806-816. [PMID: 35775980 PMCID: PMC9248842 DOI: 10.1107/s2059798322005009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 05/10/2022] [Indexed: 01/18/2023]
Abstract
A new method, checkMySequence, for the fast and automated detection of register errors in protein models built into cryo-EM reconstructions is presented. The availability of new artificial intelligence-based protein-structure-prediction tools has radically changed the way that cryo-EM maps are interpreted, but it has not eliminated the challenges of map interpretation faced by a microscopist. Models will continue to be locally rebuilt and refined using interactive tools. This inevitably results in occasional errors, among which register shifts remain one of the most difficult to identify and correct. Here, checkMySequence, a fast, fully automated and parameter-free method for detecting register shifts in protein models built into cryo-EM maps, is introduced. It is shown that the method can assist model building in cases where poorer map resolution hinders visual interpretation. It is also shown that checkMySequence could have helped to avoid a widely discussed sequence-register error in a model of SARS-CoV-2 RNA-dependent RNA polymerase that was originally detected thanks to a visual residue-by-residue inspection by members of the structural biology community. The software is freely available at https://gitlab.com/gchojnowski/checkmysequence.
|
37
|
Beyond the Backbone: The Next Generation of Pathwalking Utilities for Model Building in CryoEM Density Maps. Biomolecules 2022; 12:773. [PMID: 35740898 PMCID: PMC9220806 DOI: 10.3390/biom12060773] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Revised: 05/25/2022] [Accepted: 05/30/2022] [Indexed: 01/18/2023] Open
Abstract
Single-particle electron cryomicroscopy (cryoEM) has become an indispensable tool for studying structure and function in macromolecular assemblies. As an integral part of the cryoEM structure determination process, computational tools have been developed to build atomic models directly from a density map without structural templates. Nearly a decade ago, we created Pathwalking, a tool for de novo modeling of protein structure in near-atomic resolution cryoEM density maps. Here, we present the latest developments in Pathwalking, including the addition of probabilistic models, as well as a companion tool for modeling waters and ligands. This software was evaluated on the 2021 CryoEM Ligand Challenge density maps, in addition to identifying ligands in three IP3R1 density maps at ~3 Å to 4.1 Å resolution. The results clearly demonstrate that the Pathwalking de novo modeling pipeline can construct accurate protein structures and reliably localize and identify ligand density directly from a near-atomic resolution map.
|
38
|
Abstract
Adeno-associated virus (AAV) has a single-stranded DNA genome encapsidated in a small icosahedrally symmetric protein shell with 60 subunits. AAV is the leading delivery vector in emerging gene therapy treatments for inherited disorders, so its structure and molecular interactions with human hosts are of intense interest. A wide array of electron microscopic approaches have been used to visualize the virus and its complexes, depending on the scientific question, technology available, and amenability of the sample. Approaches range from subvolume tomographic analyses of complexes with large and flexible host proteins to detailed analysis of atomic interactions within the virus and with small ligands at resolutions as high as 1.6 Å. Analyses have led to the reclassification of glycan receptors as attachment factors, to structures with a new-found receptor protein, to identification of the epitopes of antibodies, and a new understanding of possible neutralization mechanisms. AAV is now well-enough characterized that it has also become a model system for EM methods development. Heralding a new era, cryo-EM is now also being deployed as an analytic tool in the process development and production quality control of high value pharmaceutical biologics, namely AAV vectors.
|
39
|
Abstract
The Electron Microscopy Data Bank (EMDB) is the central archive of the electron cryo-microscopy (cryo-EM) community for storing and disseminating volume maps and tomograms. With input from the community, EMDB has developed new resources for the validation of cryo-EM structures, focusing on the quality of the volume data alone and that of the fit of any models, themselves archived in the Protein Data Bank (PDB), to the volume data. Based on recommendations from community experts, the validation resources are developed in a three-tiered system. Tier 1 covers an extensive and evolving set of validation metrics, including tried and tested metrics as well as more experimental ones, which are calculated for all EMDB entries and presented in the Validation Analysis (VA) web resource. This system is particularly useful for cryo-EM experts, both to validate individual structures and to assess the utility of new validation metrics. Tier 2 comprises a subset of the validation metrics covered by the VA resource that have been subjected to extensive testing and are considered to be useful for specialists as well as nonspecialists. These metrics are presented on the entry-specific web pages for the entire archive on the EMDB website. As more experience is gained with the metrics included in the VA resource, it is expected that consensus will emerge in the community regarding a subset that is suitable for inclusion in the tier 2 system. Tier 3, finally, consists of the validation reports and servers that are produced by the Worldwide Protein Data Bank (wwPDB) Consortium. Successful metrics from tier 2 will be proposed for inclusion in the wwPDB validation pipeline and reports. The details of the new resource are described, with an emphasis on the tier 1 system. 
The output of all three tiers is publicly available, either through the EMDB website (tiers 1 and 2) or through the wwPDB ftp sites (tier 3), although the content of all three will evolve over time (fastest for tier 1 and slowest for tier 3). It is our hope that these validation resources will help the cryo-EM community to obtain a better understanding of the quality and of the best ways to assess the quality of cryo-EM structures in EMDB and PDB.
|
40
|
Abstract
In-cell structural biology aims at extracting structural information about proteins or nucleic acids in their native, cellular environment. This emerging field holds great promise and is already providing new facts and outlooks of interest at both fundamental and applied levels. NMR spectroscopy makes important contributions on this stage: it brings information on a broad variety of nuclei at the atomic scale, which ensures its great versatility and uniqueness. Here, we detail the methods, the fundamental knowledge, and the applications in biomedical engineering related to in-cell structural biology by NMR. We finally propose a brief overview of the main other techniques in the field (EPR, smFRET, cryo-ET, etc.) to draw some advisable developments for in-cell NMR. In the era of large-scale screenings and deep learning, both accurate and qualitative experimental evidence are as essential as ever to understand the interior life of cells. In-cell structural biology by NMR spectroscopy can generate such knowledge, and it does so at the atomic scale. This review is meant to deliver comprehensive but accessible information, with advanced technical details and reflections on the methods, the nature of the results, and the future of the field.
|
41
|
Electron diffraction as a structure tool for charged and neutral nanoclusters formed in superfluid helium droplets. Phys Chem Chem Phys 2022; 24:6349-6362. [PMID: 35257134 PMCID: PMC10508180 DOI: 10.1039/d2cp00048b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
This perspective presents the current status and future directions in using electron diffraction to determine the structures of clusters formed in superfluid helium droplets. The details of the experimental setup and data treatment procedures are explained, and several examples are illustrated. The ease of forming atomic and molecular clusters has been recognized since the invention of superfluid helium droplet beams. To resolve atomic structures from clusters formed in droplets, substantial efforts have been devoted to minimizing the contribution of helium to diffraction signals. With active background subtraction, we have obtained structures from clusters containing a few to more than 10 monomers, with and without heavy atoms to assist with the diffraction intensity, for both neutral and ionic species. From fittings of the diffraction profiles using model structures, we have observed that some small clusters adopt the structures of the corresponding solid sample, even for dimers such as iodine and pyrene, while others require trimers or tetramers to reach the structural motif of bulk solids, and smaller clusters such as CS2 dimers adopt gas phase structures. Cationic argon clusters contain an Ar3+ core, while pyrene dimers demonstrate a change in the intermolecular distance, from 3.5 Å for neutral dimers to 3.0 Å for cations. Future improvements in reducing the background of helium, and in expanding the information content of electron diffraction such as detection of charge distributions, are also discussed.
|
42
|
Exploring cryo-electron microscopy with molecular dynamics. Biochem Soc Trans 2022; 50:569-581. [PMID: 35212361 DOI: 10.1042/bst20210485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 01/31/2022] [Accepted: 02/02/2022] [Indexed: 11/17/2022]
Abstract
Single particle analysis cryo-electron microscopy (EM) and molecular dynamics (MD) have been complementary methods since cryo-EM was first applied to the field of structural biology. The relationship started by biasing structural models to fit low-resolution cryo-EM maps of large macromolecular complexes not amenable to crystallization. The connection between cryo-EM and MD evolved as cryo-EM maps improved in resolution, allowing advanced sampling algorithms to simultaneously refine backbone and sidechains. Moving beyond a single static snapshot, modern inferencing approaches integrate cryo-EM and MD to generate structural ensembles from cryo-EM map data or directly from the particle images themselves. We summarize the recent history of MD innovations in the area of cryo-EM modeling. The merits of the many MD-based cryo-EM modeling methods are discussed, as well as the discoveries that were made possible by the integration of molecular modeling with cryo-EM. Lastly, current challenges and potential opportunities are reviewed.
|
43
|
Resolving the interlayer distance of cationic pyrene clusters embedded in superfluid helium droplets using electron diffraction. J Chem Phys 2022; 156:051101. [DOI: 10.1063/5.0080365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
|
44
|
Simplified quality assessment for small-molecule ligands in the Protein Data Bank. Structure 2022; 30:252-262.e4. [PMID: 35026162 PMCID: PMC8849442 DOI: 10.1016/j.str.2021.10.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/14/2021] [Revised: 09/14/2021] [Accepted: 10/06/2021] [Indexed: 02/05/2023]
Abstract
More than 70% of the experimentally determined macromolecular structures in the Protein Data Bank (PDB) contain small-molecule ligands. Quality indicators of ∼643,000 ligands present in ∼106,000 PDB X-ray crystal structures have been analyzed. Ligand quality varies greatly with regard to goodness of fit between ligand structure and experimental data, deviations in bond lengths and angles from known chemical structures, and inappropriate interatomic clashes between the ligand and its surroundings. Based on principal component analysis, correlated quality indicators of ligand structure have been aggregated into two largely orthogonal composite indicators measuring goodness of fit to experimental data and deviation from ideal chemical structure. Ranking of the composite quality indicators across the PDB archive enabled construction of a uniformly distributed composite ranking score. This score is implemented at RCSB.org to compare chemically identical ligands in distinct PDB structures with easy-to-interpret two-dimensional ligand quality plots, allowing PDB users to quickly assess ligand structure quality and select the best exemplars.
Collapse
|
45
|
Atomic model validation using the CCP-EM software suite. Acta Crystallogr D Struct Biol 2022; 78:152-161. [PMID: 35102881 PMCID: PMC8805302 DOI: 10.1107/s205979832101278x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/29/2021] [Accepted: 12/01/2021] [Indexed: 12/02/2022] Open
Abstract
Recently, there has been a dramatic improvement in the quality and quantity of data derived using cryogenic electron microscopy (cryo-EM). This is also associated with a large increase in the number of atomic models built. Although the best resolutions that are achievable are improving, often the local resolution is variable, and a significant majority of data are still resolved at resolutions worse than 3 Å. Model building and refinement are often challenging at these resolutions, and hence atomic model validation becomes even more crucial to identify less reliable regions of the model. Here, a graphical user interface for atomic model validation, implemented in the CCP-EM software suite, is presented. The aim is to develop this into a platform where users can access multiple complementary validation metrics that work across a range of resolutions and obtain a summary of evaluations. Based on the validation estimates from atomic models associated with cryo-EM structures from SARS-CoV-2, it was observed that models typically favor adopting the most common conformations over fitting the observations when compared with the model agreement with data. At low resolutions, the stereochemical quality may be favored over data fit, but care should be taken to ensure that the model agrees with the data in terms of resolvable features. It is demonstrated that further re-refinement can lead to improvement of the agreement with data without the loss of geometric quality. This also highlights the need for improved resolution-dependent weight optimization in model refinement and an effective test for overfitting that would help to guide the refinement process.
|
46
|
Low Temperature Plasma for Biology, Hygiene, and Medicine: Perspective and Roadmap. IEEE TRANSACTIONS ON RADIATION AND PLASMA MEDICAL SCIENCES 2022. [DOI: 10.1109/trpms.2021.3135118] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 11/08/2022]
|
47
|
findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM. IUCRJ 2022; 9:86-97. [PMID: 35059213 PMCID: PMC8733886 DOI: 10.1107/s2052252521011088] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 04/09/2021] [Accepted: 10/22/2021] [Indexed: 05/15/2023]
Abstract
Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well-characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method's application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.
|
48
|
Cryo-EM in molecular and cellular biology. Mol Cell 2022; 82:274-284. [DOI: 10.1016/j.molcel.2021.12.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/25/2021] [Revised: 12/13/2021] [Accepted: 12/14/2021] [Indexed: 11/16/2022]
|
49
|
Abstract
Structures of seven CASP14 targets were determined using the cryo-electron microscopy (cryo-EM) technique, with resolutions between 2.1 and 3.8 Å. We provide an evaluation of the submitted models versus the experimental data (cryo-EM density maps) and experimental reference structures built into the maps. The accuracy of models is measured in terms of coordinate-to-density and coordinate-to-coordinate fit. A posteriori refinement of the most accurate models in their corresponding cryo-EM density resulted in structures that are close to the reference structure, including some regions with better fit to the density. Regions that were found to be less "refineable" correlate well with regions of high diversity between the CASP models and low goodness-of-fit to density in the reference structure.
|
50
|
High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS 2021; 2021:46-57. [PMID: 35112110 PMCID: PMC8802329 DOI: 10.1109/mlhpc54614.2021.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
Abstract
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
|