1
Dhakal A, Gyawali R, Wang L, Cheng J. CryoTransformer: a transformer model for picking protein particles from cryo-EM micrographs. Bioinformatics 2024; 40:btae109. [PMID: 38407301 PMCID: PMC10937899 DOI: 10.1093/bioinformatics/btae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/28/2024] [Accepted: 02/22/2024] [Indexed: 02/27/2024] Open
Abstract
MOTIVATION Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of large protein complexes. Picking single protein particles from cryo-EM micrographs (images) is a crucial step in reconstructing protein structures from them. However, the widely used template-based particle picking process requires some manual particle picking and is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) can potentially automate particle picking, the current AI methods pick particles with low precision or low recall. The erroneously picked particles can severely reduce the quality of reconstructed protein structures, especially for the micrographs with low signal-to-noise ratio. RESULTS To address these shortcomings, we devised CryoTransformer based on transformers, residual networks, and image processing techniques to accurately pick protein particles from cryo-EM micrographs. CryoTransformer was trained and tested on the largest labeled cryo-EM protein particle dataset, CryoPPP. It outperforms the current state-of-the-art machine learning methods of particle picking in terms of the resolution of 3D density maps reconstructed from the picked particles as well as F1-score, and is poised to facilitate the automation of the cryo-EM protein particle picking. AVAILABILITY AND IMPLEMENTATION The source code and data for CryoTransformer are openly available at: https://github.com/jianlin-cheng/CryoTransformer.
Affiliation(s)
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY 11973, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, United States
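Note on the metric: the CryoTransformer abstract reports picking quality as F1-score, which combines precision and recall. As a rough illustration only, the sketch below scores picked particle centres against labelled centres. The greedy nearest-neighbour matching, the `max_dist` pixel threshold, and the function names are assumptions made for this example; the abstract does not describe the paper's own evaluation protocol.

```python
import math


def match_picks(picked, truth, max_dist):
    """Count picks that fall within max_dist pixels of an unmatched label.

    Greedy matching (an assumption for this sketch): each pick claims the
    closest still-unmatched labelled centre, if one is close enough.
    """
    unmatched = list(truth)
    true_positives = 0
    for px, py in picked:
        best_index, best_dist = None, max_dist
        for i, (tx, ty) in enumerate(unmatched):
            dist = math.hypot(px - tx, py - ty)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is not None:
            unmatched.pop(best_index)  # a label can be matched only once
            true_positives += 1
    return true_positives


def precision_recall_f1(picked, truth, max_dist=10.0):
    """Return (precision, recall, F1) for one micrograph."""
    tp = match_picks(picked, truth, max_dist)
    precision = tp / len(picked) if picked else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up coordinates: two of three picks are near a labelled particle,
# and two of four labelled particles are found.
picked = [(10, 10), (50, 52), (200, 200)]
truth = [(12, 11), (50, 50), (90, 90), (130, 130)]
print(precision_recall_f1(picked, truth))
```

A method with low precision adds false picks (the third pick here), while low recall leaves labelled particles unfound (the last two labels); F1 penalizes both.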
2
Dhakal A, Gyawali R, Wang L, Cheng J. CryoTransformer: A Transformer Model for Picking Protein Particles from Cryo-EM Micrographs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.19.563155. [PMID: 37961171 PMCID: PMC10634673 DOI: 10.1101/2023.10.19.563155] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]
Abstract
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of large protein complexes. Picking single protein particles from cryo-EM micrographs (images) is a crucial step in reconstructing protein structures from them. However, the widely used template-based particle picking process requires some manual particle picking and is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) can potentially automate particle picking, the current AI methods pick particles with low precision or low recall. The erroneously picked particles can severely reduce the quality of reconstructed protein structures, especially for the micrographs with low signal-to-noise (SNR) ratios. To address these shortcomings, we devised CryoTransformer based on transformers, residual networks, and image processing techniques to accurately pick protein particles from cryo-EM micrographs. CryoTransformer was trained and tested on the largest labelled cryo-EM protein particle dataset - CryoPPP. It outperforms the current state-of-the-art machine learning methods of particle picking in terms of the resolution of 3D density maps reconstructed from the picked particles as well as F1-score and is poised to facilitate the automation of the cryo-EM protein particle picking.
Affiliation(s)
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY 11973, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
3
Dhakal A, Gyawali R, Wang L, Cheng J. A large expert-curated cryo-EM image dataset for machine learning protein particle picking. Sci Data 2023; 10:392. [PMID: 37349345 PMCID: PMC10287764 DOI: 10.1038/s41597-023-02280-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/01/2023] [Accepted: 05/30/2023] [Indexed: 06/24/2023] Open
Abstract
Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structures of biological macromolecular complexes. Picking single-protein particles from cryo-EM micrographs is a crucial step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though machine learning and artificial intelligence (AI) based particle picking can potentially automate the process, its development is hindered by lack of large, high-quality labelled training data. To address this bottleneck, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for protein particle picking and analysis. It consists of labelled cryo-EM micrographs (images) of 34 representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). The dataset is 2.6 terabytes and includes 9,893 high-resolution micrographs with labelled protein particle coordinates. The labelling process was rigorously validated through 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of both AI and classical methods for automated cryo-EM protein particle picking.
Affiliation(s)
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY, 11973, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO, 65211, USA
4
Dhakal A, Gyawali R, Wang L, Cheng J. CryoPPP: A Large Expert-Labelled Cryo-EM Image Dataset for Machine Learning Protein Particle Picking. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.21.529443. [PMID: 36865277 PMCID: PMC9980126 DOI: 10.1101/2023.02.21.529443] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/25/2023]
Abstract
Cryo-electron microscopy (cryo-EM) is currently the most powerful technique for determining the structures of large protein complexes and assemblies. Picking single-protein particles from cryo-EM micrographs (images) is a key step in reconstructing protein structures. However, the widely used template-based particle picking process is labor-intensive and time-consuming. Though the emerging machine learning-based particle picking can potentially automate the process, its development is severely hindered by lack of large, high-quality, manually labelled training data. Here, we present CryoPPP, a large, diverse, expert-curated cryo-EM image dataset for single protein particle picking and analysis to address this bottleneck. It consists of manually labelled cryo-EM micrographs of 32 non-redundant, representative protein datasets selected from the Electron Microscopy Public Image Archive (EMPIAR). It includes 9,089 diverse, high-resolution micrographs (∼300 cryo-EM images per EMPIAR dataset) in which the coordinates of protein particles were labelled by human experts. The protein particle labelling process was rigorously validated by both 2D particle class validation and 3D density map validation with the gold standard. The dataset is expected to greatly facilitate the development of machine learning and artificial intelligence methods for automated cryo-EM protein particle picking. The dataset and data processing scripts are available at https://github.com/BioinfoMachineLearning/cryoppp.
Affiliation(s)
- Ashwin Dhakal
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA. Fax: 573-882-8318
- Rajan Gyawali
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA. Fax: 573-882-8318
- Liguo Wang
  - Laboratory for BioMolecular Structure (LBMS), Brookhaven National Laboratory, Upton, NY 11973, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA. Fax: 573-882-8318
5
Wu JG, Yan Y, Zhang DX, Liu BW, Zheng QB, Xie XL, Liu SQ, Ge SX, Hou ZG, Xia NS. Machine Learning for Structure Determination in Single-Particle Cryo-Electron Microscopy: A Systematic Review. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:452-472. [PMID: 34932487 DOI: 10.1109/tnnls.2021.3131325] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Recently, single-particle cryo-electron microscopy (cryo-EM) has become an indispensable method for determining macromolecular structures at high resolution to deeply explore the relevant molecular mechanism. Its recent breakthrough is mainly because of the rapid advances in hardware and image processing algorithms, especially machine learning. As an essential support of single-particle cryo-EM, machine learning has powered many aspects of structure determination and greatly promoted its development. In this article, we provide a systematic review of the applications of machine learning in this field. Our review begins with a brief introduction of single-particle cryo-EM, followed by the specific tasks and challenges of its image processing. Then, focusing on the workflow of structure determination, we describe relevant machine learning algorithms and applications at different steps, including particle picking, 2-D clustering, 3-D reconstruction, and other steps. As different tasks exhibit distinct characteristics, we introduce the evaluation metrics for each task and summarize their dynamics of technology development. Finally, we discuss the open issues and potential trends in this promising field.
6
Al-Azzawi A, Ouadou A, Max H, Duan Y, Tanner JJ, Cheng J. DeepCryoPicker: fully automated deep neural network for single protein particle picking in cryo-EM. BMC Bioinformatics 2020; 21:509. [PMID: 33167860 PMCID: PMC7653784 DOI: 10.1186/s12859-020-03809-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/25/2020] [Accepted: 10/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cryo-electron microscopy (Cryo-EM) is widely used in the determination of the three-dimensional (3D) structures of macromolecules. Particle picking from 2D micrographs remains a challenging early step in the Cryo-EM pipeline due to the diversity of particle shapes and the extremely low signal-to-noise ratio of micrographs. Because of these issues, significant human intervention is often required to generate a high-quality set of particles for input to the downstream structure determination steps. RESULTS Here we propose a fully automated approach (DeepCryoPicker) for single particle picking based on deep learning. It first uses automated unsupervised learning to generate particle training datasets. Then it trains a deep neural network to classify particles automatically. Results indicate that the DeepCryoPicker compares favorably with semi-automated methods such as DeepEM, DeepPicker, and RELION, with the significant advantage of not requiring human intervention. CONCLUSIONS Our framework combining supervised deep learning classification with automated un-supervised clustering for generating training data provides an effective approach to pick particles in cryo-EM images automatically and accurately.
Affiliation(s)
- Adil Al-Azzawi
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211 USA
- Anes Ouadou
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211 USA
- Highsmith Max
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211 USA
- Ye Duan
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211 USA
- John J. Tanner
  - Departments of Biochemistry and Chemistry, University of Missouri, Columbia, MO 65211-2060 USA
- Jianlin Cheng
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO 65211 USA
  - Informatics Institute, University of Missouri, Columbia, MO 65211 USA
7
Shi J, Zeng X, Jiang R, Jiang T, Xu M. A simulated annealing approach for resolution guided homogeneous cryo-electron microscopy image selection. QUANTITATIVE BIOLOGY 2020; 8:51-63. [PMID: 32477613 PMCID: PMC7259590 DOI: 10.1007/s40484-019-0191-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2019] [Revised: 09/10/2019] [Accepted: 11/08/2019] [Indexed: 10/24/2022]
Abstract
BACKGROUND Cryo-electron microscopy (Cryo-EM) and tomography (Cryo-ET) have emerged as important imaging techniques for studying structures of macromolecular complexes. In 3D reconstruction of large macromolecular complexes, many 2D projection images of macromolecular complex particles are usually acquired with low signal-to-noise ratio. Therefore, it is meaningful to select multiple images containing the same structure with identical orientation. The selected images are averaged to produce a higher-quality representation of the underlying structure with improved resolution. Existing approaches of selecting such images have limited accuracy and speed. METHODS We propose a simulated annealing-based algorithm (SA) to pick the homogeneous image set with best average. Its performance is compared with two baseline methods based on both 2D and 3D datasets. When tested on simulated and experimental 3D Cryo-ET images of Ribosome complex, SA sometimes stopped at a local optimal solution. Restarting is applied to settle this difficulty and significantly improved the performance of SA on 3D datasets. RESULTS Experimented on simulated and experimental 2D Cryo-EM images of Ribosome complex datasets respectively with SNR = 10 and SNR = 0.5, our method achieved better accuracy in terms of F-measure, resolution score, and time cost than two baseline methods. Additionally, SA shows its superiority when the proportion of homogeneous images decreases. CONCLUSIONS SA is introduced for homogeneous image selection to realize higher accuracy with faster processing speed. Experiments on both simulated and real 2D Cryo-EM and 3D Cryo-ET images demonstrated that SA achieved expressively better performance. This approach serves as an important step for improving the resolution of structural recovery of macromolecular complexes captured by Cryo-EM and Cryo-ET.
Affiliation(s)
- Jie Shi
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Rui Jiang
  - Department of Automation, Tsinghua University, Beijing 100084, China
- Tao Jiang
  - Department of Computer Science and Engineering, University of California-Riverside, Riverside, CA 92521, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
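Note on the method: the Shi et al. abstract describes simulated annealing with restarts for choosing a homogeneous image set. The sketch below shows only that general search pattern on a toy problem. The swap move, the geometric cooling schedule, the parameter defaults, and the variance-based `homogeneity` score are all assumptions for illustration; the paper's resolution-guided objective is not reproduced here.

```python
import math
import random
import statistics


def anneal_subset(score, n, k, steps=2000, t0=1.0, cooling=0.995,
                  restarts=3, seed=0):
    """Search for a size-k subset of range(n) that maximises score(subset).

    Each restart begins from a fresh random subset, which gives the search
    another chance when an earlier run stalls in a local optimum.
    """
    rng = random.Random(seed)
    best_set, best_val = None, -math.inf
    for _ in range(restarts):
        current = set(rng.sample(range(n), k))
        cur_val = score(current)
        temp = t0
        for _ in range(steps):
            if cur_val > best_val:
                best_set, best_val = set(current), cur_val
            # Propose swapping one selected item for one unselected item.
            out_item = rng.choice(sorted(current))
            in_item = rng.choice([i for i in range(n) if i not in current])
            candidate = (current - {out_item}) | {in_item}
            cand_val = score(candidate)
            delta = cand_val - cur_val
            # Metropolis rule: always accept improvements, sometimes accept
            # worse moves while the temperature is still high.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, cur_val = candidate, cand_val
            temp *= cooling
        if cur_val > best_val:
            best_set, best_val = set(current), cur_val
    return best_set, best_val


# Toy stand-in for images: one number per "image". Five values cluster
# near zero (the homogeneous group); three are outliers.
values = [0.0, 0.1, 5.0, 0.05, 9.0, -0.1, 7.0, 0.02]


def homogeneity(subset):
    """Higher is better: negative population variance of the selection."""
    return -statistics.pvariance([values[i] for i in subset])


best, val = anneal_subset(homogeneity, n=len(values), k=4)
print(sorted(best), val)
```

Keeping the best subset seen across all restarts means a poor restart cannot discard an earlier good result.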
8
Wang WL, Yu Z, Castillo-Menendez LR, Sodroski J, Mao Y. Robustness of signal detection in cryo-electron microscopy via a bi-objective-function approach. BMC Bioinformatics 2019; 20:169. [PMID: 30943890 PMCID: PMC6446299 DOI: 10.1186/s12859-019-2714-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/06/2017] [Accepted: 03/04/2019] [Indexed: 12/22/2022] Open
Abstract
Background The detection of weak signals and selection of single particles from low-contrast micrographs of frozen hydrated biomolecules by cryo-electron microscopy (cryo-EM) represents a major practical bottleneck in cryo-EM data analysis. Template-based particle picking by an objective function using fast local correlation (FLC) allows computational extraction of a large number of candidate particles from micrographs. Another independent objective function based on maximum likelihood estimates (MLE) can be used to align the images and verify the presence of a signal in the selected particles. Despite the widespread applications of the two objective functions, an optimal combination of their utilities has not been exploited. Here we propose a bi-objective function (BOF) approach that combines both FLC and MLE and explore the potential advantages and limitations of BOF in signal detection from cryo-EM data. Results The robustness of the BOF strategy in particle selection and verification was systematically examined with both simulated and experimental cryo-EM data. We investigated how the performance of the BOF approach is quantitatively affected by the signal-to-noise ratio (SNR) of cryo-EM data and by the choice of initialization for FLC and MLE. We quantitatively pinpointed the critical SNR (~ 0.005), at which the BOF approach starts losing its ability to select and verify particles reliably. We found that the use of a Gaussian model to initialize the MLE suppresses the adverse effects of reference dependency in the FLC function used for template-matching. Conclusion The BOF approach, which combines two distinct objective functions, provides a sensitive way to verify particles for downstream cryo-EM structure analysis. Importantly, reference dependency of the FLC does not necessarily transfer to the MLE, enabling the robust detection of weak signals. 
Our insights into the numerical behavior of the BOF approach can be used to improve automation efficiency in the cryo-EM data processing pipeline for high-resolution structural determination.
Affiliation(s)
- Wei Li Wang
  - Intel® Parallel Computing Center for Structural Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Department of Microbiology, Harvard Medical School, Boston, MA, 02115, USA
  - State Key Laboratory of Artificial Microstructures and Mesoscopic Physics, School of Physics, Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Zhou Yu
  - Graduate School of Arts and Sciences, Department of Cellular and Molecular Biology, Harvard University, Cambridge, MA, 02138, USA
- Luis R Castillo-Menendez
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Department of Microbiology, Harvard Medical School, Boston, MA, 02115, USA
- Joseph Sodroski
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Department of Microbiology, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Youdong Mao
  - Intel® Parallel Computing Center for Structural Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Cancer Immunology and Virology, Dana-Farber Cancer Institute, Department of Microbiology, Harvard Medical School, Boston, MA, 02115, USA
  - State Key Laboratory of Artificial Microstructures and Mesoscopic Physics, School of Physics, Center for Quantitative Biology, Peking University, Beijing, 100871, China
9
Ali RA, Mehdi AM, Rothnagel R, Hamilton NA, Gerle C, Landsberg MJ, Hankamer B. RAZA: A Rapid 3D z-crossings algorithm to segment electron tomograms and extract organelles and macromolecules. J Struct Biol 2017; 200:73-86. [PMID: 29032142 DOI: 10.1016/j.jsb.2017.10.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/12/2017] [Revised: 10/06/2017] [Accepted: 10/09/2017] [Indexed: 11/30/2022]
Abstract
Resolving the 3D architecture of cells to atomic resolution is one of the most ambitious challenges of cellular and structural biology. Central to this process is the ability to automate tomogram segmentation to identify sub-cellular components, facilitate molecular docking and annotate detected objects with associated metadata. Here we demonstrate that RAZA (Rapid 3D z-crossings algorithm) provides a robust, accurate, intuitive, fast, and generally applicable segmentation algorithm capable of detecting organelles, membranes, macromolecular assemblies and extrinsic membrane protein domains. RAZA defines each continuous contour within a tomogram as a discrete object and extracts a set of 3D structural fingerprints (major, middle and minor axes, surface area and volume), enabling selective, semi-automated segmentation and object extraction. RAZA takes advantage of the fact that the underlying algorithm is a true 3D edge detector, allowing the axes of a detected object to be defined, independent of its random orientation within a cellular tomogram. The selectivity of object segmentation and extraction can be controlled by specifying a user-defined detection tolerance threshold for each fingerprint parameter, within which segmented objects must fall and/or by altering the number of search parameters, to define morphologically similar structures. We demonstrate the capability of RAZA to selectively extract subgroups of organelles (mitochondria) and macromolecular assemblies (ribosomes) from cellular tomograms. Furthermore, the ability of RAZA to define objects and their contours, provides a basis for molecular docking and rapid tomogram annotation.
Affiliation(s)
- Rubbiya A Ali
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Ahmed M Mehdi
  - Translational Research Institute, University of Queensland Diamantina Institute, Brisbane, QLD, Australia
  - Department of Electrical Engineering, University of Engineering and Technology, Lahore, Punjab, Pakistan
- Rosalba Rothnagel
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Nicholas A Hamilton
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Christoph Gerle
  - Picobiology Institute, Department of Life Science, Graduate School of Life Science, University of Hyogo, Kamigori, Japan
  - Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, Kawaguchi, Japan
- Michael J Landsberg
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Ben Hankamer
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
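Note on the method: the RAZA abstract describes selecting objects by a user-defined tolerance on each structural fingerprint parameter, and widening or narrowing the search by changing how many parameters are used. A minimal sketch of that filtering idea follows. The dictionary keys, the relative-tolerance rule, the sample numbers, and the function names are hypothetical and chosen for illustration; this is not RAZA's code or interface.

```python
def within_tolerance(fingerprint, reference, tolerances):
    """Check a fingerprint against a reference, one parameter at a time.

    Only parameters named in `tolerances` are tested, so passing fewer
    entries loosens the search and passing more tightens it.
    Each tolerance is relative (0.1 means within 10% of the reference).
    """
    return all(
        abs(fingerprint[name] - reference[name]) <= tol * abs(reference[name])
        for name, tol in tolerances.items()
    )


def select_objects(objects, reference, tolerances):
    """Keep the segmented objects whose fingerprints match the reference."""
    return [obj for obj in objects if within_tolerance(obj, reference, tolerances)]


# Made-up fingerprints: axis lengths, surface area and volume.
reference = {"major": 30.0, "middle": 25.0, "minor": 20.0,
             "area": 2000.0, "volume": 8000.0}
candidate_a = {"major": 31.0, "middle": 24.0, "minor": 21.0,
               "area": 2100.0, "volume": 8200.0}
candidate_b = {"major": 120.0, "middle": 40.0, "minor": 35.0,
               "area": 9000.0, "volume": 60000.0}

axes_only = {"major": 0.1, "middle": 0.1, "minor": 0.1}
print(select_objects([candidate_a, candidate_b], reference, axes_only))
```

Adding a strict volume tolerance to `axes_only` would reject `candidate_a` as well, which shows how the number of search parameters controls selectivity.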
10
Zhu Y, Ouyang Q, Mao Y. A deep convolutional neural network approach to single-particle recognition in cryo-electron microscopy. BMC Bioinformatics 2017; 18:348. [PMID: 28732461 PMCID: PMC5521087 DOI: 10.1186/s12859-017-1757-y] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Received: 11/16/2016] [Accepted: 07/13/2017] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Single-particle cryo-electron microscopy (cryo-EM) has become a mainstream tool for the structural determination of biological macromolecular complexes. However, high-resolution cryo-EM reconstruction often requires hundreds of thousands of single-particle images. Particle extraction from experimental micrographs thus can be laborious and presents a major practical bottleneck in cryo-EM structural determination. Existing computational methods for particle picking often use low-resolution templates for particle matching, making them susceptible to reference-dependent bias. It is critical to develop a highly efficient template-free method for the automatic recognition of particle images from cryo-EM micrographs. RESULTS We developed a deep learning-based algorithmic framework, DeepEM, for single-particle recognition from noisy cryo-EM micrographs, enabling automated particle picking, selection and verification in an integrated fashion. The kernel of DeepEM is built upon a convolutional neural network (CNN) composed of eight layers, which can be recursively trained to be highly "knowledgeable". Our approach exhibits an improved performance and accuracy when tested on the standard KLH dataset. Application of DeepEM to several challenging experimental cryo-EM datasets demonstrated its ability to avoid the selection of un-wanted particles and non-particles even when true particles contain fewer features. CONCLUSIONS The DeepEM methodology, derived from a deep CNN, allows automated particle extraction from raw cryo-EM micrographs in the absence of a template. It demonstrates an improved performance, objectivity and accuracy. Application of this novel method is expected to free the labor involved in single-particle verification, significantly improving the efficiency of cryo-EM data processing.
Affiliation(s)
- Yanan Zhu
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Qi Ouyang
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
  - State Key Laboratory for Artificial Microstructure and Mesoscopic Physics, Peking University, Institute of Condensed Matter Physics, School of Physics, Beijing, 100871, China
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Youdong Mao
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
  - State Key Laboratory for Artificial Microstructure and Mesoscopic Physics, Peking University, Institute of Condensed Matter Physics, School of Physics, Beijing, 100871, China
  - Intel Parallel Computing Center for Structural Biology, Department of Microbiology and Immunobiology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02115, USA
11
Zhang F, Chen Y, Ren F, Wang X, Liu Z, Wan X. A Two-Phase Improved Correlation Method for Automatic Particle Selection in Cryo-EM. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:316-325. [PMID: 28368809 DOI: 10.1109/tcbb.2015.2415787] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
Particle selection from cryo-electron microscopy (Cryo-EM) images is very important for high-resolution reconstruction of macromolecular structure. The methods of particle selection can be roughly grouped into two classes, template-matching methods and feature-based methods. In general, template-matching methods usually generate better results than feature-based methods. However, the accuracy of template-matching methods is restricted by the noise and low contrast of Cryo-EM images. Moreover, the processing speed of template-matching methods, restricted by the random orientation of particles, further limits their practical applications. In this paper, combining the advantages of feature-based methods and template-matching methods, we present a two-phase improved correlation method for automatic, fast particle selection. In Phase I, we generate a preliminary particle set using rotation-invariant features of particles. In Phase II, we filter the preliminary particle set using a correlation method to reduce the interference of the high noise background and improve the precision of particle selection. We apply several optimization strategies, including a modified adaboost algorithm, Divide and Conquer technique, cascade strategy and graphics processing unit parallel technique, to improve feature recognition ability and reduce processing time. In addition, we developed two correlation score functions for different correlation situations. Experimental results on the benchmark of Cryo-EM images show that our method can improve the accuracy and processing speed of particle selection significantly.
12
Transfer Learning for the Recognition of Immunogold Particles in TEM Imaging. ADVANCES IN COMPUTATIONAL INTELLIGENCE 2015. [DOI: 10.1007/978-3-319-19258-1_32] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/05/2022]
13
Hoang TV, Cavin X, Schultz P, Ritchie DW. gEMpicker: a highly parallel GPU-accelerated particle picking tool for cryo-electron microscopy. BMC STRUCTURAL BIOLOGY 2013; 13:25. [PMID: 24144335 PMCID: PMC3942177 DOI: 10.1186/1472-6807-13-25] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 06/13/2013] [Accepted: 10/14/2013] [Indexed: 11/25/2022]
Abstract
Background Picking images of particles in cryo-electron micrographs is an important step in solving the 3D structures of large macromolecular assemblies. However, in order to achieve sub-nanometre resolution it is often necessary to capture and process many thousands or even several millions of 2D particle images. Thus, a computational bottleneck in reaching high resolution is the accurate and automatic picking of particles from raw cryo-electron micrographs. Results We have developed “gEMpicker”, a highly parallel correlation-based particle picking tool. To our knowledge, gEMpicker is the first particle picking program to use multiple graphics processor units (GPUs) to accelerate the calculation. When tested on the publicly available keyhole limpet hemocyanin dataset, we find that gEMpicker gives similar results to the FindEM program. However, compared to calculating correlations on one core of a contemporary central processor unit (CPU), running gEMpicker on a modern GPU gives a speed-up of about 27 ×. To achieve even higher processing speeds, the basic correlation calculations are accelerated considerably by using a hierarchy of parallel programming techniques to distribute the calculation over multiple GPUs and CPU cores attached to multiple nodes of a computer cluster. By using a theoretically optimal reduction algorithm to collect and combine the cluster calculation results, the speed of the overall calculation scales almost linearly with the number of cluster nodes available. Conclusions The very high picking throughput that is now possible using GPU-powered workstations or computer clusters will help experimentalists to achieve higher resolution 3D reconstructions more rapidly than before.
Affiliation(s)
- Thai V Hoang
- Inria Nancy - Grand Est, 615 rue du Jardin Botanique, 54600 Villers-lès-Nancy, France.
|
14
|
Abrishami V, Zaldívar-Peraza A, de la Rosa-Trevín JM, Vargas J, Otón J, Marabini R, Shkolnisky Y, Carazo JM, Sorzano COS. A pattern matching approach to the automatic selection of particles from low-contrast electron micrographs. Bioinformatics 2013; 29:2460-8. [DOI: 10.1093/bioinformatics/btt429] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 11/13/2022] Open
|
15
|
Particle quality assessment and sorting for automatic and semiautomatic particle-picking techniques. J Struct Biol 2013; 183:342-353. [PMID: 23933392 DOI: 10.1016/j.jsb.2013.07.015] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 05/07/2013] [Revised: 07/10/2013] [Accepted: 07/31/2013] [Indexed: 11/22/2022]
Abstract
Three-dimensional reconstruction of biological specimens using electron microscopy by single particle methodologies requires the identification and extraction of the imaged particles from the acquired micrographs. Automatic and semiautomatic particle selection approaches can localize these particles, minimizing the user interaction, but at the cost of selecting a non-negligible number of incorrect particles, which can corrupt the final three-dimensional reconstruction. In this work, we present a novel particle quality assessment and sorting method that can separate most erroneously picked particles from correct ones. The proposed method is based on multivariate statistical analysis of a particle set that has been picked previously using any automatic or manual approach. The new method uses different sets of particle descriptors, which are morphology-based, histogram-based and signal to noise analysis based. We have tested our proposed algorithm with experimental data obtaining very satisfactory results. The algorithm is freely available as a part of the Xmipp 3.0 package [http://xmipp.cnb.csic.es].
|
16
|
Automatic post-picking using MAPPOS improves particle image detection from cryo-EM micrographs. J Struct Biol 2013; 182:59-66. [DOI: 10.1016/j.jsb.2013.02.008] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 01/04/2012] [Revised: 01/22/2013] [Accepted: 02/11/2013] [Indexed: 11/24/2022]
|
17
|
Zhao J, Brubaker MA, Rubinstein JL. TMaCS: a hybrid template matching and classification system for partially-automated particle selection. J Struct Biol 2013; 181:234-42. [PMID: 23333657 DOI: 10.1016/j.jsb.2012.12.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 08/29/2012] [Revised: 12/04/2012] [Accepted: 12/21/2012] [Indexed: 10/27/2022]
Abstract
Selection of particle images from electron micrographs presents a bottleneck in determining the structures of macromolecular assemblies by single particle electron cryomicroscopy (cryo-EM). The problem is particularly important when an experimentalist wants to improve the resolution of a 3D map by increasing by tens or hundreds of thousands of images the size of the dataset used for calculating the map. Although several existing methods for automatic particle image selection work well for large protein complexes that produce high-contrast images, it is well known in the cryo-EM community that small complexes that give low-contrast images are often refractory to existing automated particle image selection schemes. Here we develop a method for partially-automated particle image selection when an initial 3D map of the protein under investigation is already available. Candidate particle images are selected from micrographs by template matching with template images derived from projections of the existing 3D map. The candidate particle images are then used to train a support vector machine, which classifies the candidates as particle images or non-particle images. In a final step in the analysis, the selected particle images are subjected to projection matching against the initial 3D map, with the correlation coefficient between the particle image and the best matching map projection used to assess the reliability of the particle image. We show that this approach is able to rapidly select particle images from micrographs of a rotary ATPase, a type of membrane protein complex involved in many aspects of biology.
Affiliation(s)
- Jianhua Zhao
- Molecular Structure and Function Program, The Hospital for Sick Children Research Institute, Toronto, Ontario, Canada
|
18
|
Ali RA, Landsberg MJ, Knauth E, Morgan GP, Marsh BJ, Hankamer B. A 3D image filter for parameter-free segmentation of macromolecular structures from electron tomograms. PLoS One 2012; 7:e33697. [PMID: 22479430 PMCID: PMC3315577 DOI: 10.1371/journal.pone.0033697] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 01/22/2012] [Accepted: 02/16/2012] [Indexed: 11/18/2022] Open
Abstract
3D image reconstruction of large cellular volumes by electron tomography (ET) at high (≤ 5 nm) resolution can now routinely resolve organellar and compartmental membrane structures, protein coats, cytoskeletal filaments, and macromolecules. However, current image analysis methods for identifying in situ macromolecular structures within the crowded 3D ultrastructural landscape of a cell remain labor-intensive, time-consuming, and prone to user-bias and/or error. This paper demonstrates the development and application of a parameter-free, 3D implementation of the bilateral edge-detection (BLE) algorithm for the rapid and accurate segmentation of cellular tomograms. The performance of the 3D BLE filter has been tested on a range of synthetic and real biological data sets and validated against current leading filters, namely the pseudo 3D recursive and Canny filters. The performance of the 3D BLE filter was found to be comparable to or better than that of both the 3D recursive and Canny filters while offering the significant advantage that it requires no parameter input or optimisation. Edge widths as little as 2 pixels are reproducibly detected with signal intensity and grey scale values as low as 0.72% above the mean of the background noise. The 3D BLE thus provides an efficient method for the automated segmentation of complex cellular structures across multiple scales for further downstream processing, such as cellular annotation and sub-tomogram averaging, and provides a valuable tool for the accurate and high-throughput identification and annotation of 3D structural complexity at the subcellular level, as well as for mapping the spatial and temporal rearrangement of macromolecular assemblies in situ within cellular tomograms.
Affiliation(s)
- Ben Hankamer
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, Australia
|
19
|
Langlois R, Pallesen J, Frank J. Reference-free particle selection enhanced with semi-supervised machine learning for cryo-electron microscopy. J Struct Biol 2011; 175:353-61. [PMID: 21708269 DOI: 10.1016/j.jsb.2011.06.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 02/08/2011] [Revised: 06/03/2011] [Accepted: 06/11/2011] [Indexed: 10/18/2022]
Abstract
Reference-based methods have dominated the approaches to the particle selection problem, proving fast and accurate on even the most challenging micrographs. A reference volume, however, is not always available, and compiling a set of reference projections from the micrographs themselves requires significant effort to attain the same level of accuracy. We propose a reference-free method to quickly extract particles from the micrograph. The method is augmented with a new semi-supervised machine-learning algorithm to accurately discriminate particles from contaminants and noise.
Affiliation(s)
- Robert Langlois
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
|
20
|
A clarification of the terms used in comparing semi-automated particle selection algorithms in cryo-EM. J Struct Biol 2011; 175:348-52. [PMID: 21420497 DOI: 10.1016/j.jsb.2011.03.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 02/06/2011] [Revised: 03/02/2011] [Accepted: 03/12/2011] [Indexed: 11/22/2022]
Abstract
Many cryo-EM datasets are heterogeneous, stemming from molecules undergoing conformational changes. The need to characterize each of the substrates with sufficient resolution entails a large increase in the data flow and motivates the development of more effective automated particle selection algorithms. Concepts and procedures from the machine-learning field are increasingly employed toward this end. However, a review of recent literature has revealed a discrepancy in terminology of the performance scores used to compare particle selection algorithms, and this has subsequently led to ambiguities in the meaning of claimed performance. In an attempt to curtail the perpetuation of this confusion and to disentangle past mistakes, we review the performance of published particle selection efforts with a set of explicitly defined performance scores using the terminology established and accepted within the field of machine learning.
|
21
|
Lyumkis D, Moeller A, Cheng A, Herold A, Hou E, Irving C, Jacovetty EL, Lau PW, Mulder AM, Pulokas J, Quispe JD, Voss NR, Potter CS, Carragher B. Automation in single-particle electron microscopy: connecting the pieces. Methods Enzymol 2010; 483:291-338. [PMID: 20888480 DOI: 10.1016/s0076-6879(10)83015-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 01/26/2023]
Abstract
Throughout the history of single-particle electron microscopy (EM), automated technologies have seen varying degrees of emphasis and development, usually depending upon the contemporary demands of the field. We are currently faced with increasingly sophisticated devices for specimen preparation, vast increases in the size of collected data sets, comprehensive algorithms for image processing, sophisticated tools for quality assessment, and an influx of interested scientists from outside the field who might lack the skills of experienced microscopists. This situation places automated techniques in high demand. In this chapter, we provide a generic definition of and discuss some of the most important advances in automated approaches to specimen preparation, grid handling, robotic screening, microscope calibrations, data acquisition, image processing, and computational infrastructure. Each section describes the general problem and then provides examples of how that problem has been addressed through automation, highlighting available processing packages, and sometimes describing the particular approach at the National Resource for Automated Molecular Microscopy (NRAMM). We contrast the more familiar manual procedures with automated approaches, emphasizing breakthroughs as well as current limitations. Finally, we speculate on future directions and improvements in automated technologies. Our overall goal is to present automation as more than simply a tool to save time. Rather, we aim to illustrate that automation is a comprehensive and versatile strategy that can deliver biological information on an unprecedented scale beyond the scope available with classical manual approaches.
Affiliation(s)
- Dmitry Lyumkis
- National Resource for Automated Molecular Microscopy, Department of Cell Biology, The Scripps Research Institute, La Jolla, California, USA
|
22
|
DoG Picker and TiltPicker: software tools to facilitate particle selection in single particle electron microscopy. J Struct Biol 2009; 166:205-13. [PMID: 19374019 DOI: 10.1016/j.jsb.2009.01.004] [Citation(s) in RCA: 443] [Impact Index Per Article: 29.5] [Indexed: 11/24/2022]
Abstract
Solving the structure of macromolecular complexes using transmission electron microscopy can be an arduous task. Many of the steps in this process rely strongly on the aid of pre-existing structural knowledge, and are greatly complicated when this information is unavailable. Here, we present two software tools meant to facilitate particle picking, an early stage in the single-particle processing of unknown macromolecules. The first tool, DoG Picker, is an efficient and reasonably general particle picker based on the Difference of Gaussians (DoG) image transform. It can function alone, as a reference-free particle picker with the unique ability to sort particles based on size, or it can be used as a way to bootstrap the creation of templates or training datasets for other particle pickers. The second tool is TiltPicker, an interactive graphical interface application designed to streamline the selection of particle pairs from tilted-pair datasets. In many respects, TiltPicker is a re-implementation of the SPIDER WEB tilted-particle picker, but built on modern computer frameworks, making it easier to deploy and maintain. The TiltPicker program also includes several useful new features beyond those of its predecessor.
|
23
|
Sorzano COS, Recarte E, Alcorlo M, Bilbao-Castro JR, San-Martín C, Marabini R, Carazo JM. Automatic particle selection from electron micrographs using machine learning techniques. J Struct Biol 2009; 167:252-60. [PMID: 19555764 DOI: 10.1016/j.jsb.2009.06.011] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 03/16/2009] [Revised: 06/15/2009] [Accepted: 06/16/2009] [Indexed: 10/20/2022]
Abstract
The 3D reconstruction of biological specimens using Electron Microscopy is currently capable of achieving subnanometer resolution. Unfortunately, this goal requires gathering tens of thousands of projection images that are frequently selected manually from micrographs. In this paper we introduce a new automatic particle selection that learns from the user which particles are of interest. The training phase is semi-supervised so that the user can correct the algorithm during picking and specifically identify incorrectly picked particles. By treating such errors specially, the algorithm attempts to minimize the number of false positives. We show that our algorithm is able to produce datasets with fewer wrongly selected particles than previously reported methods. Another advantage is that we avoid the need for an initial reference volume from which to generate picking projections by instead learning which particles to pick from the user. This package has been made publicly available in the open-source package Xmipp.
Affiliation(s)
- C O S Sorzano
- Unidad de Biocomputación, Centro Nacional de Biotecnología (CSIC), Campus Universidad Autónoma s/n, 28049 Cantoblanco, Madrid, Spain.
|
24
|
Cheng A, Leung A, Fellmann D, Quispe J, Suloway C, Pulokas J, Abeyrathne PD, Lam JS, Carragher B, Potter CS. Towards automated screening of two-dimensional crystals. J Struct Biol 2007; 160:324-31. [PMID: 17977016 DOI: 10.1016/j.jsb.2007.09.012] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 05/30/2007] [Revised: 09/10/2007] [Accepted: 09/11/2007] [Indexed: 12/01/2022]
Abstract
Screening trials to determine the presence of two-dimensional (2D) protein crystals suitable for three-dimensional structure determination using electron crystallography is a very labor-intensive process. Methods compatible with fully automated screening have been developed for the process of crystal production by dialysis and for producing negatively stained grids of the resulting trials. Further automation via robotic handling of the EM grids, and semi-automated transmission electron microscopic imaging and evaluation of the trial grids is also possible. We, and others, have developed working prototypes for several of these tools and tested and evaluated them in a simple screen of 24 crystallization conditions. While further development of these tools is certainly required for a turn-key system, the goal of fully automated screening appears to be within reach.
Affiliation(s)
- Anchi Cheng
- The National Resource for Automated Molecular Microscopy, Department of Cell Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, CB-129, La Jolla, CA 92037, USA.
|
25
|
Pantelic RS, Ericksson G, Hamilton N, Hankamer B. Bilateral edge filter: photometrically weighted, discontinuity based edge detection. J Struct Biol 2007; 160:93-102. [PMID: 17822922 DOI: 10.1016/j.jsb.2007.07.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 11/09/2006] [Revised: 07/12/2007] [Accepted: 07/17/2007] [Indexed: 11/28/2022]
Abstract
Edge-detection algorithms have the potential to play an increasingly important role both in single particle analysis (for the detection of randomly oriented particles), and in tomography (for the segmentation of 3D volumes). However, the majority of traditional linear filters are significantly affected by noise as well as artefacts, and offer limited selectivity. The Bilateral edge filter presented here is an adaptation of the Bilateral filter [Jiang, W., Baker, M.L., Wu, Q., Bajaj, C., Chiu, W., 2003. Applications of a bilateral denoising filter in biological electron microscopy. J. Struct. Biol. 144, 114-122] designed for enhanced edge detection. It uses photometric weighting to identify significant discontinuities (representing edges), minimizing artefacts and noise. Compared with common edge-detectors (LoG, Marr-Hildreth) the Bilateral edge filter yielded significantly better results. Indeed, the data were of a similar quality to that of the Canny edge-detector, which is considered a leading standard in edge detection [Basu, M., 2002. Gaussian-based edge-detection methods: a survey. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 32, 252-260]. Compared to the Canny edge-detector, the Bilateral edge-detector has the advantages that it only requires the adjustment of a single parameter, is theoretically faster for reasonably sized images, and can be used in selective contrast enhancement of images. The simplicity and speed of the filter for single particle and tomographic analysis are discussed.
Affiliation(s)
- Radosav S Pantelic
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Qld 4072, Australia
|