1. Chen Q, Xu Z, Dai H, Shen Y, Zhang J, Liu Z, Pei Y, Yu J. A large-scale curated and filterable dataset for cryo-EM foundation model pre-training. Sci Data 2025; 12:960. [PMID: 40483273 PMCID: PMC12145456 DOI: 10.1038/s41597-025-05179-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Accepted: 05/09/2025] [Indexed: 06/11/2025] Open
Abstract
Cryo-electron microscopy (cryo-EM) is a transformative imaging technology that enables near-atomic resolution 3D reconstruction of target biomolecules, playing a critical role in structural biology and drug discovery. Cryo-EM faces significant challenges due to its extremely low signal-to-noise ratio (SNR), which makes data processing particularly complex. Foundation models have shown great potential for addressing such challenges in other biological imaging domains. However, their application in cryo-EM has been limited by the lack of large-scale, high-quality datasets. To fill this gap, we introduce CryoCRAB, the first large-scale dataset for cryo-EM foundation models. CryoCRAB includes 746 proteins, comprising 152,385 sets of raw movie frames (116.8 TB in total). To tackle the high-noise nature of cryo-EM data, each movie is split into odd and even frames to generate paired micrographs for denoising tasks. The dataset is stored in HDF5 chunked format, significantly improving random sampling efficiency and training speed. CryoCRAB offers diverse data support for cryo-EM foundation models, enabling advancements in image denoising and general-purpose feature extraction for downstream tasks.
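The odd/even split described in the abstract can be illustrated with a minimal sketch. It assumes a movie is already available as a NumPy array of shape (frames, height, width) and that frames are simply summed; the function name `split_odd_even` and the synthetic data are hypothetical, and the actual CryoCRAB pipeline (for example, any motion correction applied before summation) may differ.

```python
import numpy as np


def split_odd_even(movie: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Sum the odd- and even-numbered frames of a (frames, H, W) movie.

    The two sums image the same sample with independent noise, which is
    what makes them usable as an input/target pair for denoising.
    """
    if movie.ndim != 3 or movie.shape[0] < 2:
        raise ValueError("expected shape (frames, height, width) with at least 2 frames")
    # Frames are counted from 1, so the "odd" frames sit at indices 0, 2, 4, ...
    odd = movie[0::2].sum(axis=0)
    even = movie[1::2].sum(axis=0)
    return odd, even


# Synthetic stand-in for a real movie: a faint square buried in strong noise.
rng = np.random.default_rng(0)
signal = np.zeros((8, 8))
signal[2:6, 2:6] = 1.0
movie = signal + rng.normal(scale=5.0, size=(10, 8, 8))

odd, even = split_odd_even(movie)
```

Because the two half-sums share the signal but not the noise, a network trained to predict one from the other cannot learn the noise itself, which is the usual motivation for this kind of pairing.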
Affiliation(s)
- Qihe Chen
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Zhenyang Xu
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Haizhao Dai
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Yingjun Shen
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Jiakai Zhang
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - Cellverse, Cellverse Co., Ltd., Shanghai, 201210, China
- Zhijie Liu
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- Yuan Pei
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- Jingyi Yu
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
2. Dhakal A, Gyawali R, Wang L, Cheng J. Artificial intelligence in cryo-EM protein particle picking: recent advances and remaining challenges. Brief Bioinform 2024; 26:bbaf011. [PMID: 39820248 PMCID: PMC11736895 DOI: 10.1093/bib/bbaf011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 11/18/2024] [Accepted: 01/06/2025] [Indexed: 01/19/2025] Open
Abstract
Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by enabling the determination of high-resolution three-dimensional (3D) structures of large biological macromolecules. Protein particle picking, the process of identifying individual protein particles in cryo-EM micrographs for building protein structures, has progressed from manual and template-based methods to sophisticated artificial intelligence (AI)-driven approaches in recent years. This review critically examines the evolution and current state of cryo-EM particle picking methods, with an emphasis on the impact of AI. We conducted a comparative evaluation of popular AI-based particle picking methods, using both general machine learning metrics and specific cryo-EM structure determination metrics. This analysis involved constructing the 3D density map from the picked protein particles and assessing the obtained resolution and particle orientation diversity, underscoring the significant impact of AI on cryo-EM particle picking. Despite these advancements, we also identified key obstacles, such as handling complex micrographs with small proteins. The analysis provides insights into the future development of more sophisticated and fully automated AI methods in cryo-EM particle recognition.
Affiliation(s)
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, Columbia, MO 65211, United States
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, Columbia, MO 65211, United States
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY 11973, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, Columbia, MO 65211, United States
3. Cameron CJF, Seager SJH, Sigworth FJ, Tagare HD, Gerstein MB. REliable PIcking by Consensus (REPIC): a consensus methodology for harnessing multiple cryo-EM particle pickers. Commun Biol 2024; 7:1421. [PMID: 39482410 PMCID: PMC11528043 DOI: 10.1038/s42003-024-07045-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Accepted: 10/10/2024] [Indexed: 11/03/2024] Open
Abstract
Cryo-EM particle identification from micrographs ("picking") is challenging due to the low signal-to-noise ratio and lack of ground truth for particle locations. State-of-the-art computational algorithms ("pickers") identify different particle sets, complicating the selection of the best-suited picker for a protein of interest. Here, we present REliable PIcking by Consensus (REPIC), a computational approach to identifying particles common to the output of multiple pickers. We frame consensus particle picking as a graph problem, which REPIC solves using integer linear programming. REPIC picks high-quality particles even when the best picker is not known a priori or a protein is difficult to pick (e.g., NOMPC ion channel). Reconstructions using consensus particles without particle filtering achieve resolutions comparable to those from particles picked by experts. Our results show that REPIC requires minimal (often no) manual intervention, and considerably reduces the burden on cryo-EM users for picker selection and particle picking. Availability: https://github.com/ccameron/REPIC.
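The idea of consensus picking as a graph problem can be sketched roughly as follows. This is not REPIC's implementation: it swaps the integer linear program for a simple greedy selection, and the overlap score (Jaccard index of equally sized square boxes), the threshold, and the names `iou` and `consensus` are all assumptions made for illustration. It needs at least two pickers.

```python
from itertools import combinations, product


def iou(a, b, size):
    """Jaccard index of two axis-aligned square boxes with centres a and b."""
    dx = max(0.0, size - abs(a[0] - b[0]))
    dy = max(0.0, size - abs(a[1] - b[1]))
    inter = dx * dy
    return inter / (2 * size * size - inter)


def consensus(picks, size, threshold=0.3):
    """Greedy stand-in for the ILP: choose non-conflicting groups of
    mutually overlapping particles, one particle from every picker."""
    candidates = []
    # Each candidate group takes exactly one particle index per picker.
    for idx in product(*(range(len(p)) for p in picks)):
        boxes = [picks[k][i] for k, i in enumerate(idx)]
        # Score a group by its weakest pairwise overlap.
        weight = min(iou(a, b, size) for a, b in combinations(boxes, 2))
        if weight > threshold:
            candidates.append((weight, idx))
    candidates.sort(reverse=True)

    used = [set() for _ in picks]
    chosen = []
    for weight, idx in candidates:
        # A particle may belong to at most one consensus group.
        if all(i not in used[k] for k, i in enumerate(idx)):
            chosen.append(idx)
            for k, i in enumerate(idx):
                used[k].add(i)
    return chosen


# Three hypothetical pickers; only the particle near (10, 10) is found by all.
picks = [
    [(10, 10), (50, 50)],
    [(11, 10), (80, 80)],
    [(10, 12), (51, 49), (200, 200)],
]
groups = consensus(picks, size=20)
```

Enumerating every combination is only workable for toy inputs; an exact optimizer such as the ILP named in the abstract would replace the greedy loop in any realistic setting.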
Affiliation(s)
- Christopher J F Cameron
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT, USA
- Sebastian J H Seager
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Fred J Sigworth
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Department of Cellular and Molecular Physiology, Yale University, New Haven, CT, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
- Hemant D Tagare
  - Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT, USA
  - Department of Biomedical Engineering, Yale University, New Haven, CT, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA
- Mark B Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Statistics and Data Science, Yale University, New Haven, CT, USA
  - Department of Computer Science, Yale University, New Haven, CT, USA
  - Department of Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA
4. Vargas J, Modrego A, Canabal H, Martin-Benito J. Semantic segmentation-based detection algorithm for challenging cryo-electron microscopy RNP samples. Front Mol Biosci 2024; 11:1473609. [PMID: 39411403 PMCID: PMC11473350 DOI: 10.3389/fmolb.2024.1473609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Accepted: 09/17/2024] [Indexed: 10/19/2024] Open
Abstract
In this study, we present a novel and robust methodology for the automatic detection of influenza A virus ribonucleoproteins (RNPs) in single-particle cryo-electron microscopy (cryo-EM) images. Utilizing a U-net architecture, a type of convolutional neural network renowned for its efficiency in biomedical image segmentation, our approach is based on a pretraining phase with a dataset annotated through visual inspection. This dataset facilitates the precise identification of filamentous RNPs, including the localization of the filaments and their terminal coordinates. A key feature of our method is the application of semantic segmentation techniques, enabling the automated classification of micrograph pixels as particle or background. This deep learning strategy allows robust detection of these intricate particles, a crucial step in achieving high-resolution reconstructions in cryo-EM studies. To encourage collaborative advancements in the field, we have made our routines, the pretrained U-net model, and the training dataset publicly accessible. The reproducibility and accessibility of these resources aim to facilitate further research and validation in the realm of cryo-EM image analysis.
Affiliation(s)
- J. Vargas
  - Departamento de Óptica, Universidad Complutense de Madrid, Madrid, Spain
- A. Modrego
  - Department of Macromolecular Structure, National Centre for Biotechnology, Madrid, Spain
- H. Canabal
  - Departamento de Óptica, Universidad Complutense de Madrid, Madrid, Spain
- J. Martin-Benito
  - Department of Macromolecular Structure, National Centre for Biotechnology, Madrid, Spain
5. Galaz-Montoya JG. The advent of preventive high-resolution structural histopathology by artificial-intelligence-powered cryogenic electron tomography. Front Mol Biosci 2024; 11:1390858. [PMID: 38868297 PMCID: PMC11167099 DOI: 10.3389/fmolb.2024.1390858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Accepted: 05/08/2024] [Indexed: 06/14/2024] Open
Abstract
Advances in cryogenic electron microscopy (cryoEM) single particle analysis have revolutionized structural biology by facilitating the in vitro determination of atomic- and near-atomic-resolution structures for fully hydrated macromolecular complexes exhibiting compositional and conformational heterogeneity across a wide range of sizes. Cryogenic electron tomography (cryoET) and subtomogram averaging are rapidly progressing toward delivering similar insights for macromolecular complexes in situ, without requiring tags or harsh biochemical purification. Furthermore, cryoET enables the visualization of cellular and tissue phenotypes directly at molecular, nanometric resolution without chemical fixation or staining artifacts. This forward-looking review covers recent developments in cryoEM/ET and related technologies such as cryogenic focused ion beam milling scanning electron microscopy and correlative light microscopy, increasingly enhanced and supported by artificial intelligence algorithms. Their potential application to emerging concepts is discussed, primarily the prospect of complementing medical histopathology analysis. Machine learning solutions are poised to address current challenges posed by "big data" in cryoET of tissues, cells, and macromolecules, offering the promise of enabling novel, quantitative insights into disease processes, which may translate into the clinic and lead to improved diagnostics and targeted therapeutics.
Affiliation(s)
- Jesús G. Galaz-Montoya
  - Department of Bioengineering, James H. Clark Center, Stanford University, Stanford, CA, United States
6. Gyawali R, Dhakal A, Wang L, Cheng J. CryoSegNet: accurate cryo-EM protein particle picking by integrating the foundational AI image segmentation model and attention-gated U-Net. Brief Bioinform 2024; 25:bbae282. [PMID: 38860738 PMCID: PMC11165428 DOI: 10.1093/bib/bbae282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 05/15/2024] [Accepted: 05/29/2024] [Indexed: 06/12/2024] Open
Abstract
Picking protein particles in cryo-electron microscopy (cryo-EM) micrographs is a crucial step in cryo-EM-based structure determination. However, existing methods trained on a limited amount of cryo-EM data still cannot accurately pick protein particles from noisy cryo-EM images. General foundational artificial intelligence-based image segmentation models such as Meta's Segment Anything Model (SAM) cannot segment protein particles well because their training data do not include cryo-EM images. Here, we present a novel approach (CryoSegNet) that integrates the SAM with an attention-gated U-shape network (U-Net) specially designed and trained for cryo-EM particle picking. The U-Net is first trained on a large cryo-EM image dataset and then used to generate input from original cryo-EM images for SAM to pick particles. CryoSegNet shows both high precision and recall in segmenting protein particles from cryo-EM micrographs, irrespective of protein type, shape and size. On several independent datasets of various protein types, CryoSegNet outperforms two top machine learning particle pickers, crYOLO and Topaz, as well as SAM itself. The average resolution of density maps reconstructed from the particles picked by CryoSegNet is 3.33 Å, 7% better than 3.58 Å of Topaz and 14% better than 3.87 Å of crYOLO. It is publicly available at https://github.com/jianlin-cheng/CryoSegNet.
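The quoted percentages are consistent with a relative reduction in resolution value measured against each baseline (a lower value in Å is better). The short check below assumes that definition; the helper name `relative_improvement` is hypothetical, and the abstract does not state how the percentages were computed.

```python
def relative_improvement(baseline: float, value: float) -> float:
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


# Average resolutions in angstroms, as quoted in the abstract.
cryosegnet, topaz, cryolo = 3.33, 3.58, 3.87

vs_topaz = round(100 * relative_improvement(topaz, cryosegnet))
vs_cryolo = round(100 * relative_improvement(cryolo, cryosegnet))
print(vs_topaz, vs_cryolo)  # prints: 7 14
```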
Affiliation(s)
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY 11973, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States
7. Zhao C, Lu D, Zhao Q, Ren C, Zhang H, Zhai J, Gou J, Zhu S, Zhang Y, Gong X. Computational methods for in situ structural studies with cryogenic electron tomography. Front Cell Infect Microbiol 2023; 13:1135013. [PMID: 37868346 PMCID: PMC10586593 DOI: 10.3389/fcimb.2023.1135013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 08/29/2023] [Indexed: 10/24/2023] Open
Abstract
Cryo-electron tomography (cryo-ET) plays a critical role in imaging microorganisms in situ, for purposes such as further analyzing the working mechanisms of viruses and drug exploitation. A data processing workflow for cryo-ET has been developed to reconstruct three-dimensional density maps and further build atomic models from a tilt series of two-dimensional projections. Low signal-to-noise ratio (SNR) and the missing wedge are two major factors that make the reconstruction procedure challenging. Because only a few near-atomic resolution structures have been reconstructed in cryo-ET, there is still much room to design new approaches to improve universal reconstruction resolutions. This review summarizes classical mathematical models and deep learning methods across the general reconstruction steps. Moreover, we also discuss current limitations and prospects. This review can provide software and methods for each step of the entire procedure from tilt series by cryo-ET to 3D atomic structures. In addition, it can also help more experts in various fields comprehend a recent research trend in cryo-ET. Furthermore, we hope that more researchers can collaborate in developing computational methods and mathematical models for high-resolution three-dimensional structures from cryo-ET datasets.
Affiliation(s)
- Cuicui Zhao
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Da Lu
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Qian Zhao
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Chongjiao Ren
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Huangtao Zhang
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaqi Zhai
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaxin Gou
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Shilin Zhu
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Yaqi Zhang
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Xinqi Gong
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
  - Beijing Academy of Intelligence, Beijing, China
8. Dhakal A, Gyawali R, Wang L, Cheng J. A large expert-curated cryo-EM image dataset for machine learning protein particle picking. Sci Data 2023; 10:392. [PMID: 37349345 PMCID: PMC10287764 DOI: 10.1038/s41597-023-02280-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/01/2023] [Accepted: 05/30/2023] [Indexed: 06/24/2023] Open
Abstract
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking.
Affiliation(s)
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY, 11973, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
9. Kim PT, Noble AJ, Cheng A, Bepler T. Learning to automate cryo-electron microscopy data collection with Ptolemy. IUCrJ 2023; 10:90-102. [PMID: 36598505 PMCID: PMC9812219 DOI: 10.1107/s2052252522010612] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/17/2022] [Accepted: 11/03/2022] [Indexed: 06/17/2023]
Abstract
Over the past decade, cryo-electron microscopy (cryoEM) has emerged as an important method for determining near-native, near-atomic resolution 3D structures of biological macromolecules. To meet the increasing demand for cryoEM, automated methods that improve throughput and efficiency of microscope operation are needed. Currently, the targeting algorithms provided by most data-collection software require time-consuming manual tuning of parameters for each grid, and, in some cases, operators must select targets completely manually. However, the development of fully automated targeting algorithms is non-trivial, because images often have low signal-to-noise ratios and optimal targeting strategies depend on a range of experimental parameters and macromolecule behaviors that vary between projects and collection sessions. To address this, Ptolemy provides a pipeline to automate low- and medium-magnification targeting using a suite of purpose-built computer vision and machine-learning algorithms, including mixture models, convolutional neural networks and U-Nets. Learned models in this pipeline are trained on a large set of images from real-world cryoEM data-collection sessions, labeled with locations selected by human operators. These models accurately detect and classify regions of interest in low- and medium-magnification images, and generalize to unseen sessions, as well as to images collected on different microscopes at another facility. This open-source, modular pipeline can be integrated with existing microscope control software to enable automation of cryoEM data collection and can serve as a foundation for future cryoEM automation software.
Affiliation(s)
- Paul T. Kim
  - Simons Machine Learning Center, Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY USA
- Alex J. Noble
  - Simons Machine Learning Center, Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY USA
- Anchi Cheng
  - Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY USA
- Tristan Bepler
  - Simons Machine Learning Center, Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY USA
10. Weighted average ensemble-based semantic segmentation in biological electron microscopy images. Histochem Cell Biol 2022; 158:447-462. [DOI: 10.1007/s00418-022-02148-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/08/2022] [Indexed: 12/16/2022]
Abstract
Semantic segmentation of electron microscopy images using deep learning methods is a valuable tool for the detailed analysis of organelles and cell structures. However, these methods require a large amount of labeled ground truth data that is often unavailable. To address this limitation, we present a weighted average ensemble model that can automatically segment biological structures in electron microscopy images when trained with only a small dataset. Thus, we exploit the fact that a combination of diverse base-learners is able to outperform one single segmentation model. Our experiments with seven different biological electron microscopy datasets demonstrate quantitative and qualitative improvements. We show that the Grad-CAM method can be used to interpret and verify the prediction of our model. Compared with a standard U-Net, the performance of our method is superior for all tested datasets. Furthermore, our model leverages a limited number of labeled training data to segment the electron microscopy images and therefore has a high potential for automated biological applications.
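A weighted average ensemble of segmentation models can be sketched in a few lines. The example assumes each base-learner outputs a per-pixel foreground probability map of the same shape; the weights, the 0.5 threshold, and the name `weighted_ensemble` are hypothetical, and the abstract does not say how the authors choose their weights.

```python
import numpy as np


def weighted_ensemble(prob_maps, weights):
    """Combine per-model probability maps of shape (models, H, W)
    into one (H, W) map using normalized weights."""
    probs = np.asarray(prob_maps, dtype=float)
    w = np.asarray(weights, dtype=float)
    if probs.shape[0] != w.shape[0]:
        raise ValueError("need exactly one weight per model")
    w = w / w.sum()  # normalize so the result stays in [0, 1]
    # Contract the model axis: sum over m of w[m] * probs[m].
    return np.tensordot(w, probs, axes=1)


# Toy 2x2 foreground probabilities from three hypothetical base-learners.
maps = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.4], [0.3, 0.2]],
    [[0.7, 0.9], [0.2, 0.0]],
]
combined = weighted_ensemble(maps, weights=[0.5, 0.3, 0.2])
mask = combined >= 0.5  # binary segmentation from the averaged map
```

In this toy case the third model alone would label the top-right pixel as foreground, but its low weight lets the other two models outvote it, which is the behaviour an ensemble of diverse base-learners relies on.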
11. Treder KP, Huang C, Kim JS, Kirkland AI. Applications of deep learning in electron microscopy. Microscopy (Oxf) 2022; 71:i100-i115. [DOI: 10.1093/jmicro/dfab043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/12/2021] [Revised: 08/30/2021] [Accepted: 11/08/2021] [Indexed: 12/25/2022] Open
Abstract
We review the growing use of machine learning in electron microscopy (EM) driven in part by the availability of fast detectors operating at kiloHertz frame rates leading to large data sets that cannot be processed using manually implemented algorithms. We summarize the various network architectures and error metrics that have been applied to a range of EM-related problems including denoising and inpainting. We then provide a review of the application of these in both physical and life sciences, highlighting how conventional networks and training data have been specifically modified for EM.
Affiliation(s)
- Kevin P Treder
  - Department of Materials, University of Oxford, Oxford, Oxfordshire OX1 3PH, UK
- Chen Huang
  - Rosalind Franklin Institute, Harwell Research Campus, Didcot, Oxfordshire OX11 0FA, UK
- Judy S Kim
  - Department of Materials, University of Oxford, Oxford, Oxfordshire OX1 3PH, UK
  - Rosalind Franklin Institute, Harwell Research Campus, Didcot, Oxfordshire OX11 0FA, UK
- Angus I Kirkland
  - Department of Materials, University of Oxford, Oxford, Oxfordshire OX1 3PH, UK
  - Rosalind Franklin Institute, Harwell Research Campus, Didcot, Oxfordshire OX11 0FA, UK
12. Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021; 22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022] Open
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed many advances due to Deep Learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.
Affiliation(s)
- Subash C. Pakhrin
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Bikash Shrestha
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Badri Adhikari
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Dukka B. KC
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
13. An overview of the recent advances in cryo-electron microscopy for life sciences. Emerg Top Life Sci 2021; 5:151-168. [PMID: 33760078 DOI: 10.1042/etls20200295] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 12/14/2020] [Revised: 02/26/2021] [Accepted: 03/09/2021] [Indexed: 01/18/2023]
Abstract
Cryo-electron microscopy (CryoEM) has superseded X-ray crystallography and NMR to emerge as a popular and effective tool for structure determination in recent times. It has become indispensable for the characterization of large macromolecular assemblies, membrane proteins, or samples that are limited, conformationally heterogeneous, and recalcitrant to crystallization. Besides, it is the only tool capable of elucidating high-resolution structures of macromolecules and biological assemblies in situ. A state-of-the-art electron microscope operable at cryo-temperature helps preserve high-resolution details of the biological sample. The structures can be determined, either in isolation via single-particle analysis (SPA) or helical reconstruction, electron diffraction (ED) or within the cellular environment via cryo-electron tomography (cryoET). All the three streams of SPA, ED, and cryoET (along with subtomogram averaging) have undergone significant advancements in recent times. This has resulted in breaking the boundaries with respect to both the size of the macromolecules/assemblies whose structures could be determined along with the visualization of atomic details at resolutions unprecedented for cryoEM. In addition, the collection of larger datasets combined with the ability to sort and process multiple conformational states from the same sample are providing the much-needed link between the protein structures and their functions. In overview, these developments are helping scientists decipher the molecular mechanism of critical cellular processes, solve structures of macromolecules that were challenging targets for structure determination until now, propelling forward the fields of biology and biomedicine. Here, we summarize recent advances and key contributions of the three cryo-electron microscopy streams of SPA, ED, and cryoET.