1
Chen Q, Xu Z, Dai H, Shen Y, Zhang J, Liu Z, Pei Y, Yu J. A large-scale curated and filterable dataset for cryo-EM foundation model pre-training. Sci Data 2025; 12:960. [PMID: 40483273 PMCID: PMC12145456 DOI: 10.1038/s41597-025-05179-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Accepted: 05/09/2025] [Indexed: 06/11/2025]
Abstract
Cryo-electron microscopy (cryo-EM) is a transformative imaging technology that enables near-atomic resolution 3D reconstruction of target biomolecules, playing a critical role in structural biology and drug discovery. Cryo-EM faces significant challenges due to its extremely low signal-to-noise ratio (SNR), which makes data processing particularly complex. Foundation models have shown great potential for addressing similar challenges in other biological imaging domains. However, their application in cryo-EM has been limited by the lack of large-scale, high-quality datasets. To fill this gap, we introduce CryoCRAB, the first large-scale dataset for cryo-EM foundation models. CryoCRAB includes 746 proteins, comprising 152,385 sets of raw movie frames (116.8 TB in total). To tackle the high-noise nature of cryo-EM data, each movie is split into odd and even frames to generate paired micrographs for denoising tasks. The dataset is stored in HDF5 chunked format, significantly improving random sampling efficiency and training speed. CryoCRAB offers diverse data support for cryo-EM foundation models, enabling advancements in image denoising and general-purpose feature extraction for downstream tasks.
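The odd/even split mentioned in the abstract can be illustrated with a minimal sketch. It assumes a movie is an in-memory list of equally sized 2D frames and uses plain averaging in place of motion correction. The names `average_frames` and `split_odd_even` are hypothetical and are not part of CryoCRAB.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one movie frame as rows of pixel values


def average_frames(frames: Sequence[Frame]) -> Frame:
    """Pixel-wise mean of equally sized 2D frames."""
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[r][c] for f in frames) / len(frames) for c in range(cols)]
        for r in range(rows)
    ]


def split_odd_even(movie: Sequence[Frame]) -> Tuple[Frame, Frame]:
    """Return (odd, even) micrographs, counting frames from 1.

    Assumption: frames are simply averaged; a real pipeline would align
    (motion-correct) the frames before summing them.
    """
    if len(movie) < 2:
        raise ValueError("need at least two frames to form a pair")
    odd = average_frames(movie[0::2])   # frames 1, 3, 5, ...
    even = average_frames(movie[1::2])  # frames 2, 4, 6, ...
    return odd, even
```

The two outputs share the same underlying signal but are built from disjoint frames, so their noise can be treated as approximately independent. That property is what makes such pairs usable as input and target for Noise2Noise-style denoising.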
Affiliation(s)
- Qihe Chen
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Zhenyang Xu
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Haizhao Dai
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Yingjun Shen
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Jiakai Zhang
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Zhijie Liu
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- Yuan Pei
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- Jingyi Yu
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
2
Neiterman EH, Heimowitz A, Ben-Artzi G. A non-parametric approach to particle picking in all frames. J Struct Biol 2025; 217:108201. [PMID: 40334801 DOI: 10.1016/j.jsb.2025.108201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 04/09/2025] [Accepted: 04/10/2025] [Indexed: 05/09/2025]
Abstract
Single-particle cryo-electron microscopy (cryo-EM) has significantly advanced macromolecular structure reconstruction. However, a key limitation is the conventional reliance on micrographs obtained by motion correction and averaging, which inherently loses the richness of information contained within each frame of the original movie. The future of cryo-EM reconstruction ideally involves harnessing the raw signal from every frame to unlock potentially higher quality structures. In this paper, we present a first essential step toward this paradigm shift: a novel, non-parametric method for detecting tomographic projections across all movie frames using temporal consistency. Our method is inspired by Structure-from-Motion (SfM) and is independent of motion correction, CTF estimation, and initial reconstruction. Our experimental results demonstrate a reduced outlier rate and accurate particle localization, comparable to existing approaches, throughout the entire movie sequence.
Affiliation(s)
- Ayelet Heimowitz
  - Department of Electronics and Electrical Engineering, Ariel University, Ariel, Israel
- Gil Ben-Artzi
  - School of Computer Science, Ariel University, Ariel, Israel
3
Fadeeva M, Klaiman D, Caspy I, Nelson N. CryoEM PSII structure reveals adaptation mechanisms to environmental stress in Chlorella ohadii. bioRxiv: the preprint server for biology 2023:2023.05.04.539358. [PMID: 37205566 PMCID: PMC10187303 DOI: 10.1101/2023.05.04.539358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Performing photosynthesis in the desert is a challenging task since it requires a fast adaptation to extreme illumination and temperature changes. To understand adaptive mechanisms, we purified Photosystem II (PSII) from Chlorella ohadii, a green alga from the desert soil surface, and identified structural elements that might enable the photosystem functioning under harsh conditions. The 2.72 Å cryogenic electron-microscopy (cryoEM) structure of PSII exhibited 64 subunits, encompassing 386 chlorophylls, 86 carotenoids, four plastoquinones, and several structural lipids. At the luminal side of PSII, the oxygen evolving complex was protected by a unique subunit arrangement - PsbO (OEE1), PsbP (OEE2), CP47, and PsbU (plant OEE3 homolog). PsbU interacted with PsbO, CP43, and PsbP, thus stabilising the oxygen evolving shield. Substantial changes were observed on the stromal electron acceptor side - PsbY was identified as a transmembrane helix situated alongside PsbF and PsbE enclosing cytochrome b559, supported by the adjacent C-terminal helix of Psb10. These four transmembrane helices bundled jointly, shielding cytochrome b559 from the solvent. The bulk of Psb10 formed a cap protecting the quinone site and probably contributed to the PSII stacking. So far, the C. ohadii PSII structure is the most complete description of the complex, suggesting numerous future experiments. A protective mechanism that prevented QB from rendering itself fully reduced is proposed.
Affiliation(s)
- Ido Caspy
  - Department of Biochemistry and Molecular Biology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
- Nathan Nelson
  - Department of Biochemistry and Molecular Biology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
4
Bendory T, Lan TY, Marshall NF, Rukshin I, Singer A. Multi-target detection with rotations. Inverse Problems and Imaging (Springfield, Mo.) 2023; 17:362-380. [PMID: 39175756 PMCID: PMC11340853 DOI: 10.3934/ipi.2022046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2024]
Abstract
We consider the multi-target detection problem of estimating a two-dimensional target image from a large noisy measurement image that contains many randomly rotated and translated copies of the target image. Motivated by single-particle cryo-electron microscopy, we focus on the low signal-to-noise regime, where it is difficult to estimate the locations and orientations of the target images in the measurement. Our approach uses autocorrelation analysis to estimate rotationally and translationally invariant features of the target image. We demonstrate that, regardless of the level of noise, our technique can be used to recover the target image when the measurement is sufficiently large.
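To make the idea of translation-invariant autocorrelation features concrete, here is a minimal noise-free sketch. It covers only the translation part of the invariance (rotations are not handled) and is not the authors' estimator. The helper names `autocorrelation_2d` and `place` are hypothetical.

```python
from typing import List

Image = List[List[float]]


def autocorrelation_2d(image: Image, dy: int, dx: int) -> float:
    """Sum of image[r][c] * image[r + dy][c + dx] over in-bounds pixel pairs."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                total += image[r][c] * image[r2][c2]
    return total


def place(target: Image, size: int, top: int, left: int) -> Image:
    """Embed target in an all-zero size x size measurement at (top, left)."""
    canvas = [[0.0] * size for _ in range(size)]
    for r, row in enumerate(target):
        for c, value in enumerate(row):
            canvas[top + r][left + c] = value
    return canvas


target = [[1.0, 2.0], [3.0, 4.0]]
a = place(target, 8, 0, 0)
b = place(target, 8, 5, 3)
# The same shift gives the same value wherever the target sits.
same = autocorrelation_2d(a, 0, 1) == autocorrelation_2d(b, 0, 1)
```

Because the value depends only on the target and not on its position, such statistics can be accumulated without ever locating the individual copies. That is the property the abstract relies on in the low signal-to-noise regime.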
Affiliation(s)
- Tamir Bendory
  - School of Electrical Engineering, Tel Aviv University, Israel
- Ti-Yen Lan
  - Program in Applied and Computational Mathematics, Princeton University, USA
- Iris Rukshin
  - Program in Applied and Computational Mathematics, Princeton University, USA
- Amit Singer
  - Program in Applied and Computational Mathematics and the Department of Mathematics, Princeton University, USA
5
Roth M, Painsky A, Bendory T. Detecting Non-Overlapping Signals with Dynamic Programming. Entropy (Basel, Switzerland) 2023; 25:250. [PMID: 36832618 PMCID: PMC9955077 DOI: 10.3390/e25020250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Revised: 01/23/2023] [Accepted: 01/27/2023] [Indexed: 06/18/2023]
Abstract
This paper studies the classical problem of detecting the locations of signal occurrences in a one-dimensional noisy measurement. Assuming the signal occurrences do not overlap, we formulate the detection task as a constrained likelihood optimization problem and design a computationally efficient dynamic program that attains its optimal solution. Our proposed framework is scalable, simple to implement, and robust to model uncertainties. We show by extensive numerical experiments that our algorithm accurately estimates the locations in dense and noisy environments, and outperforms alternative methods.
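A dynamic program of this general kind can be sketched as follows. This is a simplified stand-in, not the authors' formulation: it assumes a per-position score (for example, a matched-filter response) is already available, that every occurrence has the same known length, and that the number of occurrences is free. The function name `detect_non_overlapping` is hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_non_overlapping(
    scores: Sequence[float], length: int
) -> Tuple[float, List[int]]:
    """Choose start positions with maximal total score and no overlap.

    scores[s] is the score of an occurrence covering samples
    s .. s + length - 1; starts that would run past the end are ignored.
    """
    n = len(scores)
    best = [0.0] * (n + 1)    # best[e]: optimum using samples 0 .. e - 1
    take = [False] * (n + 1)  # take[e]: an occurrence ends exactly at e
    for end in range(1, n + 1):
        best[end] = best[end - 1]  # option 1: sample end - 1 is background
        start = end - length
        if start >= 0:
            candidate = best[start] + scores[start]  # option 2: place a signal
            if candidate > best[end]:
                best[end] = candidate
                take[end] = True
    # Walk back through the decisions to recover the chosen starts.
    starts: List[int] = []
    end = n
    while end > 0:
        if take[end]:
            starts.append(end - length)
            end -= length
        else:
            end -= 1
    starts.reverse()
    return best[n], starts
```

The table is filled in a single left-to-right pass, so the cost is linear in the measurement length. Greedy peak picking, by contrast, can block a better combination of neighbouring occurrences.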
Affiliation(s)
- Mordechai Roth
  - School of Electrical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel
- Amichai Painsky
  - The Industrial Engineering Department, Tel Aviv University, Tel Aviv 6997801, Israel
- Tamir Bendory
  - School of Electrical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel
6
Bendory T, Boumal N, Leeb W, Levin E, Singer A. Toward Single Particle Reconstruction without Particle Picking: Breaking the Detection Limit. SIAM Journal on Imaging Sciences 2023; 16:886-910. [PMID: 39144526 PMCID: PMC11324246 DOI: 10.1137/22m1503828] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/16/2024]
Abstract
Single-particle cryo-electron microscopy (cryo-EM) has recently joined X-ray crystallography and NMR spectroscopy as a high-resolution structural method to resolve biological macromolecules. In a cryo-EM experiment, the microscope produces images called micrographs. Projections of the molecule of interest are embedded in the micrographs at unknown locations, and under unknown viewing directions. Standard imaging techniques first locate these projections (detection) and then reconstruct the 3-D structure from them. Unfortunately, high noise levels hinder detection. When reliable detection is rendered impossible, the standard techniques fail. This is a problem, especially for small molecules. In this paper, we pursue a radically different approach: we contend that the structure could, in principle, be reconstructed directly from the micrographs, without intermediate detection. The aim is to bring small molecules within reach for cryo-EM. To this end, we design an autocorrelation analysis technique that allows one to go directly from the micrographs to the sought structures. This involves only one pass over the micrographs, allowing online, streaming processing for large experiments. We show numerical results and discuss challenges that lie ahead to turn this proof-of-concept into a complementary approach to state-of-the-art algorithms.
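The one-pass property can be illustrated with a small accumulator. This is a sketch under strong simplifications: micrographs are treated as 1D sequences, only second-order autocorrelations are tracked, and no noise-bias correction is applied. The class name `StreamingAutocorrelation` is hypothetical and does not come from the paper.

```python
from typing import List, Sequence


class StreamingAutocorrelation:
    """Running estimate of the second-order autocorrelation for lags 0..max_lag."""

    def __init__(self, max_lag: int) -> None:
        self.max_lag = max_lag
        self.sums = [0.0] * (max_lag + 1)  # running sum of products per lag
        self.count = 0                     # total number of samples seen

    def update(self, micrograph: Sequence[float]) -> None:
        """Fold one micrograph into the running sums; it can then be discarded."""
        n = len(micrograph)
        for lag in range(self.max_lag + 1):
            self.sums[lag] += sum(
                micrograph[i] * micrograph[i + lag] for i in range(n - lag)
            )
        self.count += n

    def estimate(self) -> List[float]:
        """Per-sample autocorrelation estimate from everything seen so far."""
        if self.count == 0:
            raise ValueError("no micrographs have been processed yet")
        return [s / self.count for s in self.sums]
```

Memory use depends only on `max_lag`, not on how many micrographs have been processed, which is what makes online processing of large experiments feasible in this style of analysis.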
Affiliation(s)
- Tamir Bendory
  - The School of Electrical Engineering, Tel Aviv University, Tel Aviv 69978, Israel
- Nicolas Boumal
  - Institute of Mathematics, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- William Leeb
  - School of Mathematics, University of Minnesota, Minneapolis, MN 55455 USA
- Eitan Levin
  - Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA 91125 USA
- Amit Singer
  - The Program in Applied and Computational Mathematics and Department of Mathematics, Princeton University, Princeton, NJ 08544 USA
7
Eldar A, Amos I, Shkolnisky Y. ASOCEM: Automatic Segmentation Of Contaminations in cryo-EM. J Struct Biol 2022; 214:107871. [DOI: 10.1016/j.jsb.2022.107871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 05/10/2022] [Accepted: 05/17/2022] [Indexed: 11/25/2022]
8
Caspy I, Neumann E, Fadeeva M, Liveanu V, Savitsky A, Frank A, Kalisman YL, Shkolnisky Y, Murik O, Treves H, Hartmann V, Nowaczyk MM, Schuhmann W, Rögner M, Willner I, Kaplan A, Schuster G, Nelson N, Lubitz W, Nechushtai R. Cryo-EM photosystem I structure reveals adaptation mechanisms to extreme high light in Chlorella ohadii. Nature Plants 2021; 7:1314-1322. [PMID: 34462576 DOI: 10.1038/s41477-021-00983-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/26/2021] [Accepted: 07/07/2021] [Indexed: 05/10/2023]
Abstract
Photosynthesis in deserts is challenging since it requires fast adaptation to rapid night-to-day changes, that is, from dawn's low light (LL) to extreme high light (HL) intensities during the daytime. To understand these adaptation mechanisms, we purified photosystem I (PSI) from Chlorella ohadii, a green alga that was isolated from a desert soil crust, and identified the essential functional and structural changes that enable the photosystem to perform photosynthesis under extreme high light conditions. The cryo-electron microscopy structures of PSI from cells grown under low light (PSILL) and high light (PSIHL), obtained at 2.70 and 2.71 Å, respectively, show that part of light-harvesting antenna complex I (LHCI) and the core complex subunit (PsaO) are eliminated from PSIHL to minimize the photodamage. An additional change is in the pigment composition and their number in LHCIHL; about 50% of chlorophyll b is replaced by chlorophyll a. This leads to higher electron transfer rates in PSIHL and might enable C. ohadii PSI to act as a natural photosynthesiser in photobiocatalytic systems. PSIHL or PSILL were attached to an electrode and their induced photocurrent was determined. To obtain photocurrents comparable with PSIHL, 25 times the amount of PSILL was required, demonstrating the high efficiency of PSIHL. Hence, we suggest that C. ohadii PSIHL is an ideal candidate for the design of desert artificial photobiocatalytic systems.
Affiliation(s)
- Ido Caspy
  - Department of Biochemistry, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ehud Neumann
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Maria Fadeeva
  - Department of Biochemistry, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Varda Liveanu
  - Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Anton Savitsky
  - Faculty of Physics, Technical University Dortmund, Dortmund, Germany
  - Max Planck Institute for Chemical Energy Conversion, Mülheim an der Ruhr, Germany
- Anna Frank
  - Plant Biochemistry, Faculty of Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany
- Yael Levi Kalisman
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
  - The Centre for Nanoscience and Nanotechnology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yoel Shkolnisky
  - School of Mathematical Sciences, Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel
- Omer Murik
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Haim Treves
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Volker Hartmann
  - Plant Biochemistry, Faculty of Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany
- Marc M Nowaczyk
  - Plant Biochemistry, Faculty of Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany
- Wolfgang Schuhmann
  - Analytical Chemistry-Centre for Electrochemical Sciences (CES), Faculty of Chemistry and Biochemistry, Ruhr University Bochum, Bochum, Germany
- Matthias Rögner
  - Plant Biochemistry, Faculty of Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany
- Itamar Willner
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Aaron Kaplan
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Gadi Schuster
  - Faculty of Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Nathan Nelson
  - Department of Biochemistry, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Wolfgang Lubitz
  - Max Planck Institute for Chemical Energy Conversion, Mülheim an der Ruhr, Germany
- Rachel Nechushtai
  - Institute of Life Science, Faculty of Science and Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel
9
Structure of plant photosystem I-plastocyanin complex reveals strong hydrophobic interactions. Biochem J 2021; 478:2371-2384. [PMID: 34085703 PMCID: PMC8238519 DOI: 10.1042/bcj20210267] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/19/2021] [Revised: 05/28/2021] [Accepted: 06/03/2021] [Indexed: 11/17/2022]
Abstract
Photosystem I is defined as plastocyanin-ferredoxin oxidoreductase. Taking advantage of genetic engineering, kinetic analyses and cryo-EM, our data provide novel mechanistic insights into binding and electron transfer between PSI and Pc. Structural data at 2.74 Å resolution reveal strong hydrophobic interactions in the plant PSI-Pc ternary complex, leading to exclusion of water molecules from the PsaA-PsaB/Pc interface once the PSI-Pc complex forms. Upon oxidation of Pc, a slight tilt of bound oxidized Pc allows water molecules to enter the space between Pc and PSI and drive Pc dissociation. Such a scenario is consistent with the six times larger dissociation constant of oxidized as compared with reduced Pc and mechanistically explains how this molecular machine optimized electron transfer for fast turnover.