1
Zhan X, Zeng X, Uddin MR, Xu M. AITom: AI-guided cryo-electron tomography image analyses toolkit. J Struct Biol 2025; 217:108207. [PMID: 40378936 DOI: 10.1016/j.jsb.2025.108207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Revised: 04/20/2025] [Accepted: 04/28/2025] [Indexed: 05/19/2025]
Abstract
Cryo-electron tomography (cryo-ET) is an essential tool in structural biology, uniquely capable of visualizing three-dimensional macromolecular complexes within their native cellular environments, thereby providing profound molecular-level insights. Despite its significant promise, cryo-ET faces persistent challenges in the systematic localization, identification, segmentation, and structural recovery of three-dimensional subcellular components, necessitating the development of efficient and accurate large-scale image analysis methods. In response to these complexities, this paper introduces AITom, an open-source artificial intelligence platform tailored for cryo-ET researchers. AITom integrates a comprehensive suite of public and proprietary algorithms, supporting both traditional template-based and template-free approaches, alongside state-of-the-art deep learning methodologies for cryo-ET data analysis. By incorporating diverse computational strategies, AITom enables researchers to more effectively tackle the complexities inherent in cryo-ET, facilitating precise analysis and interpretation of complex biological structures. Furthermore, AITom provides extensive tutorials for each analysis module, offering valuable guidance to users in utilizing its comprehensive functionalities.
Affiliation(s)
- Xueying Zhan
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Mostofa Rafid Uddin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States.
2
Zhao C, Lu D, Zhao Q, Ren C, Zhang H, Zhai J, Gou J, Zhu S, Zhang Y, Gong X. Computational methods for in situ structural studies with cryogenic electron tomography. Front Cell Infect Microbiol 2023; 13:1135013. [PMID: 37868346 PMCID: PMC10586593 DOI: 10.3389/fcimb.2023.1135013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 08/29/2023] [Indexed: 10/24/2023] [Open access]
Abstract
Cryo-electron tomography (cryo-ET) plays a critical role in imaging microorganisms in situ in terms of further analyzing the working mechanisms of viruses and drug exploitation, among others. A data processing workflow for cryo-ET has been developed to reconstruct three-dimensional density maps and further build atomic models from a tilt series of two-dimensional projections. Low signal-to-noise ratio (SNR) and missing wedge are two major factors that make the reconstruction procedure challenging. Because only few near-atomic resolution structures have been reconstructed in cryo-ET, there is still much room to design new approaches to improve universal reconstruction resolutions. This review summarizes classical mathematical models and deep learning methods among general reconstruction steps. Moreover, we also discuss current limitations and prospects. This review can provide software and methods for each step of the entire procedure from tilt series by cryo-ET to 3D atomic structures. In addition, it can also help more experts in various fields comprehend a recent research trend in cryo-ET. Furthermore, we hope that more researchers can collaborate in developing computational methods and mathematical models for high-resolution three-dimensional structures from cryo-ET datasets.
Affiliation(s)
- Cuicui Zhao
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Da Lu
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Qian Zhao
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Chongjiao Ren
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Huangtao Zhang
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaqi Zhai
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaxin Gou
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Shilin Zhu
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Yaqi Zhang
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Xinqi Gong
- Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Beijing Academy of Intelligence, Beijing, China
3
Kim HHS, Uddin MR, Xu M, Chang YW. Computational Methods Toward Unbiased Pattern Mining and Structure Determination in Cryo-Electron Tomography Data. J Mol Biol 2023; 435:168068. [PMID: 37003470 PMCID: PMC10164694 DOI: 10.1016/j.jmb.2023.168068] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/19/2022] [Revised: 02/19/2023] [Accepted: 03/26/2023] [Indexed: 04/03/2023]
Abstract
Cryo-electron tomography can uniquely probe the native cellular environment for macromolecular structures. Tomograms feature complex data with densities of diverse, densely crowded macromolecular complexes, low signal-to-noise, and artifacts such as the missing wedge effect. Post-processing of this data generally involves isolating regions or particles of interest from tomograms, organizing them into related groups, and rendering final structures through subtomogram averaging. Template-matching and reference-based structure determination are popular analysis methods but are vulnerable to biases and can often require significant user input. Most importantly, these approaches cannot identify novel complexes that reside within the imaged cellular environment. To reliably extract and resolve structures of interest, efficient and unbiased approaches are therefore of great value. This review highlights notable computational software and discusses how they contribute to making automated structural pattern discovery a possibility. Perspectives emphasizing the importance of features for user-friendliness and accessibility are also presented.
Affiliation(s)
- Hannah Hyun-Sook Kim
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. https://twitter.com/hannahinthelab
- Mostofa Rafid Uddin
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. https://twitter.com/duran_rafid
- Min Xu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
- Yi-Wei Chang
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
4
Gupta T, He X, Uddin MR, Zeng X, Zhou A, Zhang J, Freyberg Z, Xu M. Self-supervised learning for macromolecular structure classification based on cryo-electron tomograms. Front Physiol 2022; 13:957484. [PMID: 36111160 PMCID: PMC9468634 DOI: 10.3389/fphys.2022.957484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 08/02/2022] [Indexed: 11/21/2022] [Open access]
Abstract
Macromolecular structure classification from cryo-electron tomography (cryo-ET) data is important for understanding macro-molecular dynamics. It has a wide range of applications and is essential in enhancing our knowledge of the sub-cellular environment. However, a major limitation has been insufficient labelled cryo-ET data. In this work, we use Contrastive Self-supervised Learning (CSSL) to improve the previous approaches for macromolecular structure classification from cryo-ET data with limited labels. We first pretrain an encoder with unlabelled data using CSSL and then fine-tune the pretrained weights on the downstream classification task. To this end, we design a cryo-ET domain-specific data-augmentation pipeline. The benefit of augmenting cryo-ET datasets is most prominent when the original dataset is limited in size. Overall, extensive experiments performed on real and simulated cryo-ET data in the semi-supervised learning setting demonstrate the effectiveness of our approach in macromolecular labeling and classification.
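The pretraining step described in this abstract relies on a contrastive objective: two augmented views of the same subtomogram should be embedded close together, and views of different subtomograms far apart. The abstract does not say which loss is used, so the following is only a minimal sketch that assumes the widely used InfoNCE form on cosine similarities. The names `cosine` and `info_nce`, the temperature value, and the toy 2D embeddings are illustrative assumptions, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive.

    Assumed form of the contrastive loss; the temperature is a hypothetical choice.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Toy embeddings: the anchor and its augmented view point in almost the same direction.
anchor = [1.0, 0.0]
augmented_view = [0.9, 0.1]
other_particles = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce(anchor, augmented_view, other_particles)
# Swapping roles (an unrelated embedding treated as the positive) gives a larger loss.
loss_mismatched = info_nce(anchor, [0.0, 1.0], [augmented_view, [-1.0, 0.0]])
```

In a real pipeline the embeddings would come from the encoder being pretrained, and the loss would be averaged over a batch of augmented pairs before fine-tuning on the labelled subset.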
Affiliation(s)
- Tarun Gupta
- Department of Computer Science and Engineering, Indian Institute of Technology, Indore, India
- Xuehai He
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, United States
- Mostofa Rafid Uddin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Andrew Zhou
- Irvington High School, Irvington, NY, United States
- Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, United States
- Zachary Freyberg
- Departments of Psychiatry and Cell Biology, University of Pittsburgh, Pittsburgh, PA, United States
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- *Correspondence: Min Xu,
5
Singla J, White KL, Stevens RC, Alber F. Assessment of scoring functions to rank the quality of 3D subtomogram clusters from cryo-electron tomography. J Struct Biol 2021; 213:107727. [PMID: 33753204 DOI: 10.1016/j.jsb.2021.107727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2020] [Revised: 03/12/2021] [Accepted: 03/17/2021] [Indexed: 11/17/2022]
Abstract
Cryo-electron tomography provides the opportunity for unsupervised discovery of endogenous complexes in situ. This process usually requires particle picking, clustering and alignment of subtomograms to produce an average structure of the complex. When applied to heterogeneous samples, template-free clustering and alignment of subtomograms can potentially lead to the discovery of structures for unknown endogenous complexes. However, such methods require scoring functions to measure and accurately rank the quality of aligned subtomogram clusters, which can be compromised by contaminations from misclassified complexes and alignment errors. Here, we provide the first study to assess the effectiveness of more than 15 scoring functions for evaluating the quality of subtomogram clusters, which differ in the amount of structural misalignments and contaminations due to misclassified complexes. We assessed both experimental and simulated subtomograms as ground truth data sets. Our analysis showed that the robustness of scoring functions varies largely. Most scores were sensitive to the signal-to-noise ratio of subtomograms and often required Gaussian filtering as preprocessing for improved performance. Two scoring functions, Spectral SNR-based Fourier Shell Correlation and Pearson Correlation in the Fourier domain with missing wedge correction, showed a robust ranking of subtomogram clusters without any preprocessing and irrespective of SNR levels of subtomograms. Of these two scoring functions, Spectral SNR-based Fourier Shell Correlation was fastest to compute and is a better choice for handling large numbers of subtomograms. Our results provide a guidance for choosing an accurate scoring function for template-free approaches to detect complexes from heterogeneous samples.
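To make the idea of a Fourier-domain scoring function concrete, here is a minimal sketch of the standard Fourier Shell Correlation (FSC) between two volumes, written with NumPy. It is the plain textbook FSC, not the Spectral SNR-based variant assessed in the paper, and it applies no missing wedge correction. The function name `fourier_shell_correlation` and the integer-radius shell binning are assumptions made for illustration.

```python
import numpy as np

def fourier_shell_correlation(vol_a, vol_b):
    """Plain FSC per integer-radius shell for two volumes of identical shape."""
    if vol_a.shape != vol_b.shape:
        raise ValueError("volumes must have the same shape")
    # Centered 3D spectra: zero frequency sits at index n // 2 along each axis.
    fa = np.fft.fftshift(np.fft.fftn(vol_a))
    fb = np.fft.fftshift(np.fft.fftn(vol_b))
    # Integer shell index (rounded radius) of every Fourier voxel.
    axes = [np.arange(n) - n // 2 for n in vol_a.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    radius = np.rint(np.sqrt(sum(g.astype(float) ** 2 for g in grids)))
    radius = radius.astype(int).ravel()
    n_shells = min(vol_a.shape) // 2
    keep = radius < n_shells  # ignore corners beyond the largest full shell

    def shell_sum(values):
        return np.bincount(radius[keep], weights=values.ravel()[keep],
                           minlength=n_shells)

    cross = shell_sum((fa * np.conj(fb)).real)
    power_a = shell_sum(np.abs(fa) ** 2)
    power_b = shell_sum(np.abs(fb) ** 2)
    denom = np.sqrt(power_a * power_b)
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, cross / safe, 0.0)

rng = np.random.default_rng(0)
volume = rng.normal(size=(16, 16, 16))
other = rng.normal(size=(16, 16, 16))
fsc_same = fourier_shell_correlation(volume, volume)
fsc_diff = fourier_shell_correlation(volume, other)
```

A cluster score in this spirit would compare, for example, averages computed from two halves of a cluster; shells where the curve stays near 1 indicate consistent signal, while values near 0 indicate noise or misalignment.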
Affiliation(s)
- Jitin Singla
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, 520 Boyer Hall, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA; Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Kate L White
- Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Raymond C Stevens
- Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, 520 Boyer Hall, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA.
6
Lü Y, Zeng X, Zhao X, Li S, Li H, Gao X, Xu M. Fine-grained alignment of cryo-electron subtomograms based on MPI parallel optimization. BMC Bioinformatics 2019; 20:443. [PMID: 31455212 PMCID: PMC6712796 DOI: 10.1186/s12859-019-3003-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/21/2018] [Accepted: 07/19/2019] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND: Cryo-electron tomography (Cryo-ET) is an imaging technique used to generate three-dimensional structures of cellular macromolecule complexes in their native environment. Due to developing cryo-electron microscopy technology, the image quality of three-dimensional reconstruction of cryo-electron tomography has greatly improved. However, cryo-ET images are characterized by low resolution, partial data loss and low signal-to-noise ratio (SNR). In order to tackle these challenges and improve resolution, a large number of subtomograms containing the same structure needs to be aligned and averaged. Existing methods for refining and aligning subtomograms are still highly time-consuming, requiring many computationally intensive processing steps (i.e. the rotations and translations of subtomograms in three-dimensional space).
RESULTS: In this article, we propose a Stochastic Average Gradient (SAG) fine-grained alignment method for optimizing the sum of dissimilarity measure in real space. We introduce a Message Passing Interface (MPI) parallel programming model in order to explore further speedup.
CONCLUSIONS: We compare our stochastic average gradient fine-grained alignment algorithm with two baseline methods, high-precision alignment and fast alignment. Our SAG fine-grained alignment algorithm is much faster than the two baseline methods. Results on simulated data of GroEL from the Protein Data Bank (PDB ID:1KP8) showed that our parallel SAG-based fine-grained alignment method could achieve close-to-optimal rigid transformations with higher precision than both high-precision alignment and fast alignment at a low SNR (SNR=0.003) with tilt angle range ±60° or ±40°. For the experimental subtomograms data structures of GroEL and GroEL/GroES complexes, our parallel SAG-based fine-grained alignment can achieve higher precision and fewer iterations to converge than the two baseline methods.
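The Stochastic Average Gradient idea named in this abstract can be illustrated on a toy problem. The sketch below is a generic SAG loop for minimizing an average of per-item dissimilarity terms, applied to a one-dimensional quadratic whose minimizer is the mean of the data. It is not the authors' alignment code: there are no rigid transformations and no MPI here, and `sag_minimize`, the step size, and the toy data are all illustrative assumptions.

```python
import random

def sag_minimize(grad_i, n_terms, x0, step, n_iters, seed=0):
    """Generic SAG: keep the last gradient of every term and step along their average."""
    rng = random.Random(seed)
    x = x0
    stored = [0.0] * n_terms  # most recent gradient seen for each term
    total = 0.0               # running sum of the stored gradients
    for _ in range(n_iters):
        i = rng.randrange(n_terms)   # sample a single term
        g = grad_i(i, x)             # only this term's gradient is recomputed
        total += g - stored[i]       # refresh the running sum in O(1)
        stored[i] = g
        x -= step * total / n_terms  # move along the averaged gradient
    return x

# Toy dissimilarity: f_i(x) = 0.5 * (x - d_i)^2, so the optimum is the mean of d.
shifts = [1.0, 2.0, 3.0, 4.0, 10.0]
estimate = sag_minimize(lambda i, x: x - shifts[i], len(shifts),
                        x0=0.0, step=1.0 / 16.0, n_iters=2000)
```

Each iteration touches one term only, which is what makes the scheme attractive when a single dissimilarity evaluation (here trivial, in alignment a full volume comparison) is expensive.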
Affiliation(s)
- Yongchun Lü
- University of Chinese Academy of Sciences, Beijing, China
- Institute of Computing Technology of the Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, CAS, Beijing, China
- Xiangrui Zeng
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
- Xiaofang Zhao
- University of Chinese Academy of Sciences, Beijing, China
- Institute of Computing Technology of the Chinese Academy of Sciences, Beijing, China
- Shirui Li
- Institute of Computing Technology of the Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, CAS, Beijing, China
- Hua Li
- University of Chinese Academy of Sciences, Beijing, China
- Institute of Computing Technology of the Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, CAS, Beijing, China
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
- Min Xu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
7
Zhao Y, Zeng X, Guo Q, Xu M. An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification. Bioinformatics 2019; 34:i227-i236. [PMID: 29949977 PMCID: PMC6022576 DOI: 10.1093/bioinformatics/bty267] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] [Open access]
Abstract
Motivation: Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at sub-molecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation.
Results: Built on existing works, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for maximum-likelihood update of subtomogram averages through expectation–maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases of computation cost. Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT.
Availability and implementation: http://www.cs.cmu.edu/mxu1
Affiliation(s)
- Yixiu Zhao
- Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiangrui Zeng
- Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Qiang Guo
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Min Xu
- Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
8
Abstract
Cryo-electron tomography (cryo-ET) allows three-dimensional (3D) visualization of frozen-hydrated biological samples, such as protein complexes and cell organelles, in near-native environments at nanometer scale. Protein complexes that are present in multiple copies in a set of tomograms can be extracted, mutually aligned, and averaged to yield a signal-enhanced 3D structure up to sub-nanometer or even near-atomic resolution. This technique, called subtomogram averaging (StA), is powered by improvements in EM hardware and image processing software. Importantly, StA provides unique biological insights into the structure and function of cellular machinery in close-to-native contexts. In this chapter, we describe the principles and key steps of StA. We briefly cover sample preparation and data collection with an emphasis on image processing procedures related to tomographic reconstruction, subtomogram alignment, averaging, and classification. We conclude by summarizing current limitations and future directions of this technique with a focus on high-resolution StA.
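The signal enhancement that subtomogram averaging provides can be shown with a small numerical sketch. It assumes the copies are already perfectly aligned and corrupted by independent Gaussian noise, so it skips the alignment and classification steps the chapter describes; the one-dimensional "volume", the noise level, and the helper names `average_volumes` and `residual_noise` are illustrative assumptions.

```python
import random
import statistics

def average_volumes(volumes):
    """Voxel-wise mean of equally sized, already aligned volumes (flat lists)."""
    n = len(volumes)
    return [sum(voxels) / n for voxels in zip(*volumes)]

def residual_noise(volume, signal):
    """Standard deviation of the difference between a volume and the true signal."""
    return statistics.pstdev([v - s for v, s in zip(volume, signal)])

rng = random.Random(0)
# Toy "particle": a block of density in an otherwise empty 64-voxel volume.
signal = [1.0 if 20 <= i < 44 else 0.0 for i in range(64)]
noise_sigma = 5.0  # noise much stronger than the signal, mimicking low SNR
copies = [[s + rng.gauss(0.0, noise_sigma) for s in signal] for _ in range(400)]

single = residual_noise(copies[0], signal)
averaged = residual_noise(average_volumes(copies), signal)
```

With independent noise, averaging N copies lowers the noise standard deviation by roughly a factor of the square root of N, so `averaged` should be on the order of `single / 20` for the 400 copies used here. Misaligned or misclassified copies break this assumption, which is why alignment and classification matter so much in practice.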
9
Xu M, Singla J, Tocheva EI, Chang YW, Stevens RC, Jensen GJ, Alber F. De Novo Structural Pattern Mining in Cellular Electron Cryotomograms. Structure 2019; 27:679-691.e14. [PMID: 30744995 DOI: 10.1016/j.str.2019.01.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 02/07/2018] [Revised: 07/27/2018] [Accepted: 01/14/2019] [Indexed: 11/16/2022]
Abstract
Electron cryotomography enables 3D visualization of cells in a near-native state at molecular resolution. The produced cellular tomograms contain detailed information about a plethora of macromolecular complexes, their structures, abundances, and specific spatial locations in the cell. However, extracting this information in a systematic way is very challenging, and current methods usually rely on individual templates of known structures. Here, we propose a framework called "Multi-Pattern Pursuit" for de novo discovery of different complexes from highly heterogeneous sets of particles extracted from entire cellular tomograms without using information of known structures. These initially detected structures can then serve as input for more targeted refinement efforts. Our tests on simulated and experimental tomograms show that our automated method is a promising tool for supporting large-scale template-free visual proteomics analysis.
Affiliation(s)
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
- Jitin Singla
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Elitza I Tocheva
- Department of Microbiology and Immunology, Life Sciences Institute, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Yi-Wei Chang
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond C Stevens
- Department of Biological Sciences and Department of Chemistry, Bridge Institute, University of Southern California, Los Angeles, CA 90089, USA
- Grant J Jensen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Howard Hughes Medical Institute, Pasadena, CA 91125, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
10
Zeng X, Leung MR, Zeev-Ben-Mordehai T, Xu M. A convolutional autoencoder approach for mining features in cellular electron cryo-tomograms and weakly supervised coarse segmentation. J Struct Biol 2018; 202:150-160. [PMID: 29289599 PMCID: PMC6661905 DOI: 10.1016/j.jsb.2017.12.015] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/10/2017] [Revised: 12/24/2017] [Accepted: 12/27/2017] [Indexed: 01/08/2023]
Abstract
Cellular electron cryo-tomography enables the 3D visualization of cellular organization in the near-native state and at submolecular resolution. However, the contents of cellular tomograms are often complex, making it difficult to automatically isolate different in situ cellular components. In this paper, we propose a convolutional autoencoder-based unsupervised approach to provide a coarse grouping of 3D small subvolumes extracted from tomograms. We demonstrate that the autoencoder can be used for efficient and coarse characterization of features of macromolecular complexes and surfaces, such as membranes. In addition, the autoencoder can be used to detect non-cellular features related to sample preparation and data collection, such as carbon edges from the grid and tomogram boundaries. The autoencoder is also able to detect patterns that may indicate spatial interactions between cellular components. Furthermore, we demonstrate that our autoencoder can be used for weakly supervised semantic segmentation of cellular components, requiring a very small amount of manual annotation.
Affiliation(s)
- Xiangrui Zeng
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh 15213, USA
- Miguel Ricardo Leung
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Cryo-electron Microscopy, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, Netherlands
- Tzviya Zeev-Ben-Mordehai
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Cryo-electron Microscopy, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, Netherlands
- Min Xu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh 15213, USA.
11
Xu M, Chai X, Muthakana H, Liang X, Yang G, Zeev-Ben-Mordehai T, Xing EP. Deep learning-based subdivision approach for large scale macromolecules structure recovery from electron cryo tomograms. Bioinformatics 2018; 33:i13-i22. [PMID: 28881965 PMCID: PMC5946875 DOI: 10.1093/bioinformatics/btx230] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 11/24/2022] [Open access]
Abstract
Motivation: Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization at near-native state and in sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organizations inside single cells. However, high degree of structural complexity together with practical imaging limitations makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although it is no longer difficult to acquire CECT data containing such amount of subtomograms due to advances in data acquisition automation, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing such amount of data.
Results: To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computation intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing unsupervised rotation invariant feature and pose-normalization based approaches, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in training data.
Availability and Implementation: Source code freely available at http://www.cs.cmu.edu/~mxu1/software.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaoqi Chai
- Biomedical Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Hariank Muthakana
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiaodan Liang
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Ge Yang
- Biomedical Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Tzviya Zeev-Ben-Mordehai
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Eric P Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA