1. Xu C, Zhan X, Xu M. CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders. arXiv 2024:arXiv:2404.10178v1. [PMID: 38699171 PMCID: PMC11065045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Cryo-electron microscopy (cryo-EM) emerges as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Traditional particle picking, a key step in cryo-EM, struggles with manual effort and automated methods' sensitivity to low signal-to-noise ratio (SNR) and varied particle orientations. Furthermore, existing neural network (NN)-based approaches often require extensive labeled datasets, limiting their practicality. To overcome these obstacles, we introduce cryoMAE, a novel approach based on few-shot learning that harnesses the capabilities of Masked Autoencoders (MAE) to enable efficient selection of single particles in cryo-EM images. Contrary to conventional NN-based techniques, cryoMAE requires only a minimal set of positive particle images for training yet demonstrates high performance in particle detection. Furthermore, the implementation of a self-cross similarity loss ensures distinct features for particle and background regions, thereby enhancing the discrimination capability of cryoMAE. Experiments on large-scale cryo-EM datasets show that cryoMAE outperforms existing state-of-the-art (SOTA) methods, improving 3D reconstruction resolution by up to 22.4%.
Affiliation(s)
- Chentianye Xu
  - Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Xueying Zhan
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2. Zeng X, Ding Y, Zhang Y, Uddin MR, Dabouei A, Xu M. DUAL: deep unsupervised simultaneous simulation and denoising for cryo-electron tomography. bioRxiv: the preprint server for biology 2024:2024.03.02.583135. [PMID: 38496657 PMCID: PMC10942334 DOI: 10.1101/2024.03.02.583135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Recent biotechnological developments in cryo-electron tomography allow direct visualization of native sub-cellular structures with unprecedented details and provide essential information on protein functions/dysfunctions. Denoising can enhance the visualization of protein structures and distributions. Automatic annotation via data simulation can ameliorate the time-consuming manual labeling of large-scale datasets. Here, we combine the two major cryo-ET tasks together in DUAL, by a specific cyclic generative adversarial network with novel noise disentanglement. This enables end-to-end unsupervised learning that requires no labeled data for training. The denoising branch outperforms existing works and substantially improves downstream particle picking accuracy on benchmark datasets. The simulation branch provides learning-based cryo-ET simulation for the first time and generates synthetic tomograms indistinguishable from experimental ones. Through comprehensive evaluations, we showcase the effectiveness of DUAL in detecting macromolecular complexes across a wide range of molecular weights in experimental datasets. The versatility of DUAL is expected to empower cryo-ET researchers by improving visual interpretability, enhancing structural detection accuracy, expediting annotation processes, facilitating cross-domain model adaptability, and compensating for missing wedge artifacts. Our work represents a significant advancement in the unsupervised mining of protein structures in cryo-ET, offering a multifaceted tool that facilitates cryo-ET research.
Affiliation(s)
- Xiangrui Zeng
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Yizhe Ding
  - Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA
- Yueqian Zhang
  - School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Mostofa Rafid Uddin
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Ali Dabouei
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Min Xu
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
3. Zhao C, Lu D, Zhao Q, Ren C, Zhang H, Zhai J, Gou J, Zhu S, Zhang Y, Gong X. Computational methods for in situ structural studies with cryogenic electron tomography. Front Cell Infect Microbiol 2023; 13:1135013. [PMID: 37868346 PMCID: PMC10586593 DOI: 10.3389/fcimb.2023.1135013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 08/29/2023] [Indexed: 10/24/2023]
Abstract
Cryo-electron tomography (cryo-ET) plays a critical role in imaging microorganisms in situ, supporting further analysis of the working mechanisms of viruses and drug development, among other applications. A data processing workflow for cryo-ET has been developed to reconstruct three-dimensional density maps and further build atomic models from a tilt series of two-dimensional projections. Low signal-to-noise ratio (SNR) and the missing wedge are two major factors that make the reconstruction procedure challenging. Because only a few near-atomic resolution structures have been reconstructed in cryo-ET, there is still much room to design new approaches that improve reconstruction resolution in general. This review summarizes classical mathematical models and deep learning methods for the general reconstruction steps. Moreover, we also discuss current limitations and prospects. This review provides software and methods for each step of the entire procedure, from cryo-ET tilt series to 3D atomic structures. In addition, it can help experts in various fields understand recent research trends in cryo-ET. Furthermore, we hope that more researchers can collaborate in developing computational methods and mathematical models for high-resolution three-dimensional structures from cryo-ET datasets.
Affiliation(s)
- Cuicui Zhao
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Da Lu
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Qian Zhao
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Chongjiao Ren
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Huangtao Zhang
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaqi Zhai
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaxin Gou
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Shilin Zhu
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Yaqi Zhang
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Xinqi Gong
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
  - Beijing Academy of Intelligence, Beijing, China
4. Zeng X, Kahng A, Xue L, Mahamid J, Chang YW, Xu M. High-throughput cryo-ET structural pattern mining by unsupervised deep iterative subtomogram clustering. Proc Natl Acad Sci U S A 2023; 120:e2213149120. [PMID: 37027429 PMCID: PMC10104553 DOI: 10.1073/pnas.2213149120] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/01/2022] [Accepted: 02/24/2023] [Indexed: 04/08/2023]
Abstract
Cryoelectron tomography directly visualizes heterogeneous macromolecular structures in their native and complex cellular environments. However, existing computer-assisted structure sorting approaches are low throughput or inherently limited due to their dependency on available templates and manual labels. Here, we introduce a high-throughput template-and-label-free deep learning approach, Deep Iterative Subtomogram Clustering Approach (DISCA), that automatically detects subsets of homogeneous structures by learning and modeling 3D structural features and their distributions. Evaluation on five experimental cryo-ET datasets shows that an unsupervised deep learning based method can detect diverse structures with a wide range of molecular sizes. This unsupervised detection paves the way for systematic unbiased recognition of macromolecular complexes in situ.
Affiliation(s)
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Anson Kahng
  - Computer Science Department, University of Rochester, Rochester, NY 14620
- Liang Xue
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
  - Faculty of Biosciences, Collaboration for joint PhD degree between European Molecular Biology Laboratory and Heidelberg University, Heidelberg 69117, Germany
- Julia Mahamid
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Yi-Wei Chang
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
5. Rodríguez de Francisco B, Bezault A, Xu XP, Hanein D, Volkmann N. MEPSi: A tool for simulating tomograms of membrane-embedded proteins. J Struct Biol 2022; 214:107921. [PMID: 36372192 DOI: 10.1016/j.jsb.2022.107921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 10/27/2022] [Accepted: 11/05/2022] [Indexed: 11/11/2022]
Abstract
The throughput and fidelity of cryogenic cellular electron tomography (cryo-ET) are constantly increasing through advances in cryogenic electron microscope hardware, direct electron detection devices, and powerful image processing algorithms. However, the need for careful optimization of sample preparations and for access to expensive, high-end equipment makes cryo-ET a costly and time-consuming technique. Generally, only after the last step of the cryo-ET workflow, when reconstructed tomograms are available, does it become clear whether the chosen imaging parameters were suitable for a specific type of sample in order to answer a specific biological question. Tools for a priori assessment of the feasibility of samples to answer biological questions and how to optimize imaging parameters to do so would be a major advantage. Here we describe MEPSi (Membrane Embedded Protein Simulator), a simulation tool aimed at rapid and convenient evaluation and optimization of cryo-ET data acquisition parameters for studies of transmembrane proteins in their native environment. We demonstrate the utility of MEPSi by showing how to disentangle the influence of different data collection parameters and different orientations with respect to the tilt axis and electron beam for two examples: (1) simulated plasma membranes with embedded single-pass transmembrane αIIbβ3 integrin receptors and (2) simulated virus membranes with embedded SARS-CoV-2 spike proteins.
Affiliation(s)
- Borja Rodríguez de Francisco
  - Institut Pasteur, Université Paris Cité, CNRS UMR3528, Structural Studies of Macromolecular Machines in Cellulo Unit, Paris, France; Institut Pasteur, Université Paris Cité, CNRS UMR3528, Structural Image Analysis Unit, Paris, France
- Armel Bezault
  - Institut Pasteur, Université Paris Cité, CNRS UMR3528, Structural Studies of Macromolecular Machines in Cellulo Unit, Paris, France; Institut Pasteur, Université Paris Cité, CNRS UMR3528, Structural Image Analysis Unit, Paris, France
- Dorit Hanein
  - Institut Pasteur, Université Paris Cité, CNRS UMR3528, Structural Studies of Macromolecular Machines in Cellulo Unit, Paris, France; Scintillon Institute, San Diego, CA 92121, USA
- Niels Volkmann
  - Institut Pasteur, Université Paris Cité, CNRS UMR3528, Structural Image Analysis Unit, Paris, France
6. Gupta T, He X, Uddin MR, Zeng X, Zhou A, Zhang J, Freyberg Z, Xu M. Self-supervised learning for macromolecular structure classification based on cryo-electron tomograms. Front Physiol 2022; 13:957484. [PMID: 36111160 PMCID: PMC9468634 DOI: 10.3389/fphys.2022.957484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 08/02/2022] [Indexed: 11/21/2022]
Abstract
Macromolecular structure classification from cryo-electron tomography (cryo-ET) data is important for understanding macro-molecular dynamics. It has a wide range of applications and is essential in enhancing our knowledge of the sub-cellular environment. However, a major limitation has been insufficient labelled cryo-ET data. In this work, we use Contrastive Self-supervised Learning (CSSL) to improve the previous approaches for macromolecular structure classification from cryo-ET data with limited labels. We first pretrain an encoder with unlabelled data using CSSL and then fine-tune the pretrained weights on the downstream classification task. To this end, we design a cryo-ET domain-specific data-augmentation pipeline. The benefit of augmenting cryo-ET datasets is most prominent when the original dataset is limited in size. Overall, extensive experiments performed on real and simulated cryo-ET data in the semi-supervised learning setting demonstrate the effectiveness of our approach in macromolecular labeling and classification.
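The abstract above does not say which contrastive objective is used for pretraining. As a minimal sketch, assuming an NT-Xent (SimCLR-style) loss, two augmented views of the same subtomogram are pulled together in embedding space while views of other subtomograms are pushed apart. The function names, the temperature value, and the toy embeddings are all illustrative assumptions, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss; view_a[i] and view_b[i] embed two augmentations of sample i.

    Hypothetical helper for illustration only.
    """
    z = view_a + view_b          # stack both views: 2n embeddings
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)    # index of the positive partner of i
        # Similarities to every other embedding (the positive is included).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - positive
    return total / (2 * n)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    print(nt_xent(anchors, [[1.0, 0.0], [0.0, 1.0]]))  # matching views
    print(nt_xent(anchors, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched views
```

The loss is lower when each sample's two views agree, which is the signal an encoder would be trained on before fine-tuning on the labelled classification task.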
Affiliation(s)
- Tarun Gupta
  - Department of Computer Science and Engineering, Indian Institute of Technology, Indore, India
- Xuehai He
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, United States
- Mostofa Rafid Uddin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Andrew Zhou
  - Irvington High School, Irvington, NY, United States
- Jing Zhang
  - Department of Computer Science, University of California, Irvine, Irvine, CA, United States
- Zachary Freyberg
  - Departments of Psychiatry and Cell Biology, University of Pittsburgh, Pittsburgh, PA, United States
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
  - *Correspondence: Min Xu,
7. Hao Y, Wan X, Yan R, Liu Z, Li J, Zhang S, Cui X, Zhang F. VP-Detector: A 3D multi-scale dense convolutional neural network for macromolecule localization and classification in cryo-electron tomograms. Computer Methods and Programs in Biomedicine 2022; 221:106871. [PMID: 35584579 DOI: 10.1016/j.cmpb.2022.106871] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/15/2022] [Revised: 04/28/2022] [Accepted: 05/09/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE Cryo-electron tomography (cryo-ET) with subtomogram averaging (STA) is indispensable when studying macromolecule structures and functions in their native environments. Due to the low signal-to-noise ratio, the missing wedge artifacts in tomographic reconstructions, and multiple macromolecules of varied shapes and sizes, macromolecule localization and classification remain challenging. To tackle this bottleneck problem for structural determination by STA, we design an accurate macromolecule localization and classification method named voxelwise particle detector (VP-Detector).
METHODS VP-Detector is a two-stage particle detection method based on a 3D multiscale dense convolutional neural network (3D MSDNet). The proposed network uses 3D hybrid dilated convolution (3D HDC) to avoid the resolution loss caused by scaling operations. Meanwhile, it uses 3D dense connectivity to encourage the reuse of feature maps to reduce trainable parameters. In addition, a weighted focal loss is proposed to focus more attention on difficult samples and rare classes, which relieves the class imbalance caused by multiple particles of various sizes. The performance of VP-Detector is evaluated on both simulated and real-world tomograms.
RESULTS The experiments show that VP-Detector outperforms the state-of-the-art methods on particle localization with an F1-score of 0.951 and a precision of 0.978. In addition, VP-Detector can replace manual particle picking in experiments on real-world tomograms. Furthermore, it performs well in classifying large-, medium-, and small-weight proteins with accuracies of 1, 0.95, and 0.82, respectively. Finally, ablation studies demonstrate the effectiveness of 3D HDC, 3D dense connectivity, weighted focal loss, and training on small training sets.
CONCLUSIONS VP-Detector can achieve high accuracy in particle detection with few trainable parameters and support training on small datasets. It can also relieve the class imbalance caused by multiple particles with various shapes and sizes.
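The abstract names a weighted focal loss but does not give its formula. As a rough sketch, assuming the widely used form -alpha_c * (1 - p_t)^gamma * log(p_t), the per-class weight alpha_c raises the contribution of rare classes and the exponent gamma shrinks the contribution of samples the model already classifies confidently. The function name, the default gamma, and the example weights are illustrative assumptions, not the authors' settings.

```python
import math


def weighted_focal_loss(probs, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for a single sample (hypothetical helper).

    probs:         predicted class probabilities, summing to 1
    target:        index of the true class
    class_weights: per-class weights alpha_c (assumed values)
    gamma:         focusing parameter; gamma = 0 gives weighted cross-entropy
    """
    # Clamp to avoid log(0) for a class predicted with zero probability.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    weights = [1.0, 3.0]  # assumed: class 1 is rarer, so it is weighted up
    # A confident correct prediction contributes much less than an uncertain one.
    print(weighted_focal_loss([0.9, 0.1], 0, weights))
    print(weighted_focal_loss([0.4, 0.6], 1, weights))
```

With gamma set to 0 and all weights equal to 1, the expression reduces to ordinary cross-entropy, which makes the two added factors easy to inspect separately.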
Affiliation(s)
- Yu Hao
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Xiaohua Wan
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Rui Yan
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Zhiyong Liu
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Jintao Li
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Shihua Zhang
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Xuefeng Cui
  - School of Computer Science and Technology, Shandong University, Qingdao, China
- Fa Zhang
  - High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
8. Bandyopadhyay H, Deng Z, Ding L, Liu S, Uddin MR, Zeng X, Behpour S, Xu M. Cryo-shift: reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization. Bioinformatics 2022; 38:977-984. [PMID: 34897387 DOI: 10.1093/bioinformatics/btab794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 10/18/2021] [Accepted: 11/17/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution in generated data as compared with real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms. RESULTS In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with 'warp' modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms the existing alternative approaches in cross-domain subtomogram classification in extensive evaluation studies demonstrated herein using both simulated and experimental data. AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/aitom.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hmrishav Bandyopadhyay
  - Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700032, India
- Zihao Deng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Leiting Ding
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sinuo Liu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Mostofa Rafid Uddin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sima Behpour
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
9. Wu X, Li C, Zeng X, Wei H, Deng HW, Zhang J, Xu M. CryoETGAN: Cryo-Electron Tomography Image Synthesis via Unpaired Image Translation. Front Physiol 2022; 13:760404. [PMID: 35370760 PMCID: PMC8970048 DOI: 10.3389/fphys.2022.760404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 01/17/2022] [Indexed: 12/02/2022]
Abstract
Cryo-electron tomography (Cryo-ET) has been regarded as a revolution in structural biology and can reveal molecular sociology. Its unprecedented quality enables it to visualize cellular organelles and macromolecular complexes at nanometer resolution with native conformations. Motivated by developments in nanotechnology and machine learning, establishing machine learning approaches such as classification, detection and averaging for Cryo-ET image analysis has inspired broad interest. Yet, deep learning-based methods for biomedical imaging typically require large labeled datasets for good results, which can be a great challenge due to the expense of obtaining and labeling training data. To deal with this problem, we propose a generative model to simulate Cryo-ET images efficiently and reliably: CryoETGAN. This cycle-consistent and Wasserstein generative adversarial network (GAN) is able to generate images with an appearance similar to the original experimental data. Quantitative and visual grading results on generated images are provided to show that the results of our proposed method achieve better performance compared to the previous state-of-the-art simulation methods. Moreover, CryoETGAN is stable to train and capable of generating plausibly diverse image samples.
Affiliation(s)
- Xindi Wu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Chengkun Li
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Haocheng Wei
  - Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Hong-Wen Deng
  - Center for Biomedical Informatics & Genomics, Tulane University, New Orleans, LA, United States
- Jing Zhang
  - Department of Computer Science, University of California, Irvine, Irvine, CA, United States
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
10. Zeng Y, Howe G, Yi K, Zeng X, Zhang J, Chang YW, Xu M. Unsupervised Domain Alignment Based Open Set Structural Recognition of Macromolecules Captured by Cryo-Electron Tomography. Proceedings. International Conference on Image Processing 2021; 2021:106-110. [PMID: 35350462 PMCID: PMC8959888 DOI: 10.1109/icip42928.2021.9506205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Cellular cryo-Electron Tomography (cryo-ET) provides three-dimensional views of structural and spatial information of various macromolecules in cells in a near-native state. Subtomogram classification is a key step for recognizing and differentiating these macromolecular structures. In recent years, deep learning methods have been developed for high-throughput subtomogram classification tasks; however, conventional supervised deep learning methods cannot recognize macromolecular structural classes that do not exist in the training data. This imposes a major weakness since most native macromolecular structures in cells are unknown and consequently, cannot be included in the training data. Therefore, open set learning which can recognize unknown macromolecular structures is necessary for boosting the power of automatic subtomogram classification. In this paper, we propose a method called Margin-based Loss for Unsupervised Domain Alignment (MLUDA) for open set recognition problems where only a few categories of interest are shared between cross-domain data. Through extensive experiments, we demonstrate that MLUDA performs well at cross-domain open-set classification on both public datasets and medical imaging datasets. So our method is of practical importance.
Affiliation(s)
- Yuchen Zeng
  - Computational Biology Department, Carnegie Mellon University, United States
- Gregory Howe
  - Computational Biology Department, Carnegie Mellon University, United States
- Kai Yi
  - King Abdullah University of Science and Technology, Saudi Arabia
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, United States
- Jing Zhang
  - Department of Computer Science, University of California Irvine, United States
- Yi-Wei Chang
  - Perelman School of Medicine, University of Pennsylvania, United States
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, United States
11. Singla J, White KL, Stevens RC, Alber F. Assessment of scoring functions to rank the quality of 3D subtomogram clusters from cryo-electron tomography. J Struct Biol 2021; 213:107727. [PMID: 33753204 DOI: 10.1016/j.jsb.2021.107727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2020] [Revised: 03/12/2021] [Accepted: 03/17/2021] [Indexed: 11/17/2022]
Abstract
Cryo-electron tomography provides the opportunity for unsupervised discovery of endogenous complexes in situ. This process usually requires particle picking, clustering and alignment of subtomograms to produce an average structure of the complex. When applied to heterogeneous samples, template-free clustering and alignment of subtomograms can potentially lead to the discovery of structures for unknown endogenous complexes. However, such methods require scoring functions to measure and accurately rank the quality of aligned subtomogram clusters, which can be compromised by contamination from misclassified complexes and alignment errors. Here, we provide the first study to assess the effectiveness of more than 15 scoring functions for evaluating the quality of subtomogram clusters, which differ in the amount of structural misalignment and contamination due to misclassified complexes. We assessed both experimental and simulated subtomograms as ground truth data sets. Our analysis showed that the robustness of scoring functions varies widely. Most scores were sensitive to the signal-to-noise ratio of subtomograms and often required Gaussian filtering as preprocessing for improved performance. Two scoring functions, Spectral SNR-based Fourier Shell Correlation and Pearson Correlation in the Fourier domain with missing wedge correction, showed a robust ranking of subtomogram clusters without any preprocessing and irrespective of SNR levels of subtomograms. Of these two scoring functions, Spectral SNR-based Fourier Shell Correlation was fastest to compute and is a better choice for handling large numbers of subtomograms. Our results provide guidance for choosing an accurate scoring function for template-free approaches to detect complexes from heterogeneous samples.
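For orientation, the conventional Fourier Shell Correlation between two volumes can be written as below. This is the standard textbook definition, given here as background only; the Spectral SNR-based variant assessed in the paper is computed differently and is not reproduced here. The symbols are assumptions introduced for this sketch: F_1 and F_2 are the 3D Fourier transforms of the two volumes (for example, averages of two halves of a subtomogram cluster), and S_r is the set of Fourier voxels in the shell at spatial frequency r.

```latex
\mathrm{FSC}(r) =
  \frac{\displaystyle\sum_{\mathbf{k} \in S_r} F_1(\mathbf{k})\, \overline{F_2(\mathbf{k})}}
       {\sqrt{\displaystyle\sum_{\mathbf{k} \in S_r} \lvert F_1(\mathbf{k}) \rvert^{2}
              \;\sum_{\mathbf{k} \in S_r} \lvert F_2(\mathbf{k}) \rvert^{2}}}
```

Values near 1 indicate that the two volumes agree at that spatial frequency, while values near 0 indicate that the shell is dominated by noise or misalignment, which is why a shell-wise correlation can serve as a cluster quality score.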
Collapse
Affiliation(s)
- Jitin Singla
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, 520 Boyer Hall, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA; Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
| | - Kate L White
- Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
| | - Raymond C Stevens
- Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
| | - Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, 520 Boyer Hall, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA.
| |
|
12
|
Du X, Wang H, Zhu Z, Zeng X, Chang YW, Zhang J, Xing E, Xu M. Active Learning to Classify Macromolecular Structures in situ for Less Supervision in Cryo-Electron Tomography. Bioinformatics 2021; 37:2340-2346. [PMID: 33620460 DOI: 10.1093/bioinformatics/btab123] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/27/2020] [Revised: 01/14/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Cryo-Electron Tomography (cryo-ET) is a 3D bioimaging tool that visualizes the structural and spatial organization of macromolecules at a near-native state in single cells, which has broad applications in life science. However, the systematic structural recognition and recovery of macromolecules captured by cryo-ET are difficult due to high structural complexity and imaging limits. Deep learning based subtomogram classification has played a critical role in such tasks. As supervised approaches, however, their performance relies on sufficient and laborious annotation of a large training dataset. RESULTS To alleviate this major labeling burden, we proposed a Hybrid Active Learning (HAL) framework for querying subtomograms for labelling from a large unlabeled subtomogram pool. Firstly, HAL adopts uncertainty sampling to select the subtomograms that have the most uncertain predictions. This strategy forces the model to be aware of the inductive bias during classification and subtomogram selection, which satisfies the discriminativeness principle in AL literature. Moreover, to mitigate the sampling bias caused by such a strategy, a discriminator is introduced to judge whether a given subtomogram is labeled or unlabeled, and the model subsequently queries the subtomograms that have higher probabilities of being unlabeled. Such a query strategy encourages matching the data distribution between the labeled and unlabeled subtomogram samples, which essentially encodes the representativeness criterion into the subtomogram selection process. Additionally, HAL introduces a subset sampling strategy to improve the diversity of the query set, so that the information overlap between the queried batches is decreased and the algorithmic efficiency is improved. 
Our experiments on subtomogram classification tasks using both simulated and real data demonstrate that we can achieve comparable testing performance (on average only 3% accuracy drop) by using less than 30% of the labeled subtomograms, which shows a very promising result for subtomogram classification task with limited labeling resources. AVAILABILITY https://github.com/xulabs/aitom.
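The uncertainty sampling step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the HAL implementation: the abstract does not say which uncertainty measure is used, so predictive entropy is assumed here, and the function names `prediction_entropy` and `select_most_uncertain` are hypothetical. The discriminator and subset sampling components are not modeled.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy of one predicted class distribution (assumed uncertainty measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k):
    """Return indices of the k unlabeled samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: prediction_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical classifier outputs for three unlabeled subtomograms, two classes each.
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
query = select_most_uncertain(pool, k=2)  # → [1, 2]
```

The queried indices would then be sent for annotation and moved from the unlabeled pool to the labeled set before the classifier is retrained.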
Affiliation(s)
- Xuefeng Du
- Department of Computer Science, University of Wisconsin-Madison, Madison, 53706, USA
| | - Haohan Wang
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, 15213, USA
| | - Zhenxi Zhu
- Department of Computer Science, Beijing University of Posts and Telecommunications, 100876, China
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
| | - Yi-Wei Chang
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, 19104, USA
| | - Jing Zhang
- Department of Computer Science, University of California - Irvine, Irvine, 92697, USA
| | - Eric Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, 15213, USA
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
| |
|
13
|
Martins B, Sorrentino S, Chung WL, Tatli M, Medalia O, Eibauer M. Unveiling the polarity of actin filaments by cryo-electron tomography. Structure 2021; 29:488-498.e4. [PMID: 33476550 PMCID: PMC8111420 DOI: 10.1016/j.str.2020.12.014] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 08/18/2020] [Revised: 12/04/2020] [Accepted: 12/23/2020] [Indexed: 01/01/2023]
Abstract
The actin cytoskeleton plays a fundamental role in numerous cellular processes, such as cell motility, cytokinesis, and adhesion to the extracellular matrix. Revealing the polarity of individual actin filaments in intact cells would foster an unprecedented understanding of cytoskeletal processes and their associated mechanical forces. Cryo-electron tomography provides the means for high-resolution structural imaging of cells. However, the low signal-to-noise ratio of cryo-tomograms obscures the high frequencies, and therefore the polarity of actin filaments cannot be directly measured. Here, we developed a method that enables us to determine the polarity of actin filaments in cellular cryo-tomograms. We applied it to reveal the actin polarity distribution in focal adhesions, and show a linear relation between actin polarity and distance from the apical boundary of the adhesion site.
- Determining the polarity of individual actin filaments inside cells
- Reconstruction of actin networks from cryo-tomograms
- The polarity of actin changes from mixed to uniform along focal adhesions
Affiliation(s)
- Bruno Martins
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| | - Simona Sorrentino
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| | - Wen-Lu Chung
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| | - Meltem Tatli
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
| | - Ohad Medalia
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland.
| | - Matthias Eibauer
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland.
| |
|
14
|
Zhou B, Yu H, Zeng X, Yang X, Zhang J, Xu M. One-Shot Learning With Attention-Guided Segmentation in Cryo-Electron Tomography. Front Mol Biosci 2021; 7:613347. [PMID: 33511158 PMCID: PMC7835881 DOI: 10.3389/fmolb.2020.613347] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/02/2020] [Accepted: 12/09/2020] [Indexed: 11/13/2022] Open
Abstract
Cryo-electron Tomography (cryo-ET) generates 3D visualization of cellular organization that allows biologists to analyze cellular structures in a near-native state with nano resolution. Recently, deep learning methods have demonstrated promising performance in classification and segmentation of macromolecule structures captured by cryo-ET, but training individual deep learning models requires large amounts of manually labeled and segmented data from previously observed classes. To perform classification and segmentation in the wild (i.e., with limited training data and with unseen classes), novel deep learning models need to be developed to classify and segment unseen macromolecules captured by cryo-ET. In this paper, we develop a one-shot learning framework, called cryo-ET one-shot network (COS-Net), for simultaneous classification of macromolecular structure and generation of the voxel-level 3D segmentation, using only one training sample per class. Our experimental results on 22 macromolecule classes demonstrated that our COS-Net could efficiently classify macromolecular structures with small amounts of samples and produce accurate 3D segmentation at the same time.
Affiliation(s)
- Bo Zhou
- Department of Biomedical Engineering, Yale University, New Haven, CT, United States
| | - Haisu Yu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Xiaoyan Yang
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Jing Zhang
- Computer Science Department, University of California, Irvine, Irvine, CA, United States
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
| |
|
15
|
Liu S, Ban X, Zeng X, Zhao F, Gao Y, Wu W, Zhang H, Chen F, Hall T, Gao X, Xu M. A unified framework for packing deformable and non-deformable subcellular structures in crowded cryo-electron tomogram simulation. BMC Bioinformatics 2020; 21:399. [PMID: 32907544 PMCID: PMC7488303 DOI: 10.1186/s12859-020-03660-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/10/2019] [Accepted: 07/14/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cryo-electron tomography is an important and powerful technique to explore the structure, abundance, and location of ultrastructure in a near-native state. It contains detailed information of all macromolecular complexes in a sample cell. However, due to the compact and crowded status, the missing wedge effect, and low signal-to-noise ratio (SNR), it is extremely challenging to recover such information with existing image processing methods. Cryo-electron tomogram simulation is an effective solution to test and optimize the performance of the above image processing methods. The simulated images could be regarded as the labeled data which covers a wide range of macromolecular complexes and ultrastructure. To approximate the crowded cellular environment, it is very important to pack these heterogeneous structures as tightly as possible. Besides, simulating non-deformable and deformable components under a unified framework also needs to be achieved. RESULT In this paper, we proposed a unified framework for simulating crowded cryo-electron tomogram images including non-deformable macromolecular complexes and deformable ultrastructures. A macromolecule was approximated using multiple balls with fixed relative positions to reduce the vacuum volume. An ultrastructure, such as a membrane or filament, was approximated using multiple balls with flexible relative positions so that this structure could deform under a force field. In the experiment, 400 macromolecules of 20 representative types were packed into simulated cytoplasm by our framework, and numerical verification proved that our method has a smaller volume and higher compression ratio than the baseline single-ball model. We also packed filaments, membranes and macromolecules together, to obtain a simulated cryo-electron tomogram image with deformable structures. The simulated results are closer to the real Cryo-ET, making the analysis more difficult. 
The DOG particle picking method and the image segmentation method are tested on our simulation data, and the experimental results show that these methods still have much room for improvement. CONCLUSION The proposed multi-ball model can achieve more crowded packaging results and contains richer elements with different properties to obtain more realistic cryo-electron tomogram simulation. This enables users to simulate cryo-electron tomogram images with non-deformable macromolecular complexes and deformable ultrastructures under a unified framework. To illustrate the advantages of our framework in improving the compression ratio, we calculated the volume of simulated macromolecules under our multi-ball method and the traditional single-ball method. We also performed the packing experiment of filaments and membranes to demonstrate the simulation ability of deformable structures. Our method can be used for benchmarking by generating large labeled cryo-ET datasets and evaluating existing image processing methods. Since the content of the simulated cryo-ET is more complex and crowded compared with previous ones, it will pose a greater challenge to existing image processing methods.
Affiliation(s)
- Sinuo Liu
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA United States
| | - Xiaojuan Ban
- Beijing Advanced Innovation Center for Materials Genome Engineering, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA United States
| | - Fengnian Zhao
- WuYuzhang Honors College, Sichuan University, Sichuan, China
| | - Yuan Gao
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA United States
| | | | - Hongpan Zhang
- School of Information Science and Technology, Beijing Forestry University, Beijing, China
- College of Life Science, Sichuan University, Sichuan, China
| | - Feiyang Chen
- Thuwal, Saudi Arabia, King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, Saudi Arabia
| | - Thomas Hall
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA United States
| | - Xin Gao
- School of Mechanical, Electrical and Information Engineering, Shandong University, Shandong, China
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA United States
| |
|
16
|
Zeng X, Xu M. Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging. PROCEEDINGS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2020; 2020:4072-4082. [PMID: 33716478 PMCID: PMC7955792 DOI: 10.1109/cvpr42600.2020.00413] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Abstract
We propose a Geometric unsupervised matching Network (Gum-Net) for finding the geometric correspondence between two images with application to 3D subtomogram alignment and averaging. Subtomogram alignment is the most important task in cryo-electron tomography (cryo-ET), a revolutionary 3D imaging technique for visualizing the molecular organization of unperturbed cellular landscapes in single cells. However, subtomogram alignment and averaging are very challenging due to severe imaging limits such as noise and missing wedge effects. We introduce an end-to-end trainable architecture with three novel modules specifically designed for preserving feature spatial information and propagating feature matching information. The training is performed in a fully unsupervised fashion to optimize a matching metric. No ground truth transformation information nor category-level or instance-level matching supervision information is needed. After systematic assessments on six real and nine simulated datasets, we demonstrate that Gum-Net reduced the alignment error by 40 to 50% and improved the averaging resolution by 10%. Gum-Net also achieved 70 to 110 times speedup in practice with GPU acceleration compared to state-of-the-art subtomogram alignment methods. Our work is the first 3D unsupervised geometric matching method for images of strong transformation variation and high noise level. The training code, trained model, and datasets are available in our open-source software AITom.
Affiliation(s)
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
| |
|
17
|
Chen F, Jiang Y, Zeng X, Zhang J, Gao X, Xu M. PUB-SalNet: A Pre-trained Unsupervised Self-Aware Backpropagation Network for Biomedical Salient Segmentation. ALGORITHMS 2020; 13:126. [PMID: 34567437 PMCID: PMC8460134 DOI: 10.3390/a13050126] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Salient segmentation is a critical step in biomedical image analysis, aiming to cut out regions that are most interesting to humans. Recently, supervised methods have achieved promising results in biomedical areas, but they depend on annotated training data sets, which requires labor and proficiency in related background knowledge. In contrast, unsupervised learning makes data-driven decisions by obtaining insights directly from the data themselves. In this paper, we propose a completely unsupervised self-aware network based on pre-training and attentional backpropagation for biomedical salient segmentation, named as PUB-SalNet. Firstly, we aggregate a new biomedical data set from several simulated Cellular Electron Cryo-Tomography (CECT) data sets featuring rich salient objects, different SNR settings and various resolutions, which is called SalSeg-CECT. Based on the SalSeg-CECT data set, we then pre-train a model specially designed for biomedical tasks as a backbone module to initialize network parameters. Next, we present a U-SalNet network to learn to selectively attend to salient objects. It includes two types of attention modules to facilitate learning saliency through global contrast and local similarity. Lastly, we jointly refine the salient regions together with feature representations from U-SalNet, with the parameters updated by self-aware attentional backpropagation. We apply PUB-SalNet for analysis of 2D simulated and real images and achieve state-of-the-art performance on simulated biomedical data sets. Furthermore, our proposed PUB-SalNet can be easily extended to 3D images. The experimental results on the 2d and 3d data sets also demonstrate the generalization ability and robustness of our method.
Affiliation(s)
- Feiyang Chen
- Computational Biology Department, Carnegie Mellon University
| | - Ying Jiang
- Computational Biology Department, Carnegie Mellon University
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University
| | - Jing Zhang
- Department of Computer Science, University of California Irvine
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University
| |
|
18
|
Template-free detection and classification of membrane-bound complexes in cryo-electron tomograms. Nat Methods 2020; 17:209-216. [PMID: 31907446 DOI: 10.1038/s41592-019-0675-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 06/27/2018] [Accepted: 11/11/2019] [Indexed: 01/12/2023]
Abstract
With faithful sample preservation and direct imaging of fully hydrated biological material, cryo-electron tomography provides an accurate representation of molecular architecture of cells. However, detection and precise localization of macromolecular complexes within cellular environments is aggravated by the presence of many molecular species and molecular crowding. We developed a template-free image processing procedure for accurate tracing of complex networks of densities in cryo-electron tomograms, a comprehensive and automated detection of heterogeneous membrane-bound complexes and an unsupervised classification (PySeg). Applications to intact cells and isolated endoplasmic reticulum (ER) allowed us to detect and classify small protein complexes. This classification provided sufficiently homogeneous particle sets and initial references to allow subsequent de novo subtomogram averaging. Spatial distribution analysis showed that ER complexes have different localization patterns forming nanodomains. Therefore, this procedure allows a comprehensive detection and structural analysis of complexes in situ.
|
19
|
Zhao Y, Zeng X, Guo Q, Xu M. An integration of fast alignment and maximum-likelihood methods for electron subtomogram averaging and classification. Bioinformatics 2019; 34:i227-i236. [PMID: 29949977 PMCID: PMC6022576 DOI: 10.1093/bioinformatics/bty267] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
Motivation Cellular Electron CryoTomography (CECT) is an emerging 3D imaging technique that visualizes subcellular organization of single cells at sub-molecular resolution and in near-native state. CECT captures large numbers of macromolecular complexes of highly diverse structures and abundances. However, the structural complexity and imaging limits complicate the systematic de novo structural recovery and recognition of these macromolecular complexes. Efficient and accurate reference-free subtomogram averaging and classification represent the most critical tasks for such analysis. Existing subtomogram alignment based methods are prone to the missing wedge effects and low signal-to-noise ratio (SNR). Moreover, existing maximum-likelihood based methods rely on integration operations, which are in principle computationally infeasible for accurate calculation. Results Built on existing works, we propose an integrated method, Fast Alignment Maximum Likelihood method (FAML), which uses fast subtomogram alignment to sample sub-optimal rigid transformations. The transformations are then used to approximate integrals for maximum-likelihood update of subtomogram averages through expectation–maximization algorithm. Our tests on simulated and experimental subtomograms showed that, compared to our previously developed fast alignment method (FA), FAML is significantly more robust to noise and missing wedge effects with moderate increases of computation cost. Besides, FAML performs well with significantly fewer input subtomograms when the FA method fails. Therefore, FAML can serve as a key component for improved construction of initial structural models from macromolecules captured by CECT. Availability and implementation http://www.cs.cmu.edu/mxu1
Affiliation(s)
- Yixiu Zhao
- Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiangrui Zeng
- Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Qiang Guo
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Min Xu
- Computational Biology and Computer Science Departments, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| |
|
20
|
Lin R, Zeng X, Kitani K, Xu M. Adversarial domain adaptation for cross data source macromolecule in situ structural classification in cellular electron cryo-tomograms. Bioinformatics 2019; 35:i260-i268. [PMID: 31510673 PMCID: PMC6612867 DOI: 10.1093/bioinformatics/btz364] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Since 2017, an increasing amount of attention has been paid to the supervised deep learning-based macromolecule in situ structural classification (i.e. subtomogram classification) in cellular electron cryo-tomography (CECT) due to the substantially higher scalability of deep learning. However, the success of such supervised approach relies heavily on the availability of large amounts of labeled training data. For CECT, creating valid training data from the same data source as prediction data is usually laborious and computationally intensive. It would be beneficial to have training data from a separate data source where the annotation is readily available or can be performed in a high-throughput fashion. However, the cross data source prediction is often biased due to the different image intensity distributions (a.k.a. domain shift). RESULTS We adapt a deep learning-based adversarial domain adaptation (3D-ADA) method to timely address the domain shift problem in CECT data analysis. 3D-ADA first uses a source domain feature extractor to extract discriminative features from the training data as the input to a classifier. Then it adversarially trains a target domain feature extractor to reduce the distribution differences of the extracted features between training and prediction data. As a result, the same classifier can be directly applied to the prediction data. We tested 3D-ADA on both experimental and realistically simulated subtomogram datasets under different imaging conditions. 3D-ADA stably improved the cross data source prediction, as well as outperformed two popular domain adaptation methods. Furthermore, we demonstrate that 3D-ADA can improve cross data source recovery of novel macromolecular structures. AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/projects. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruogu Lin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Kris Kitani
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| |
|
21
|
Xu M, Singla J, Tocheva EI, Chang YW, Stevens RC, Jensen GJ, Alber F. De Novo Structural Pattern Mining in Cellular Electron Cryotomograms. Structure 2019; 27:679-691.e14. [PMID: 30744995 DOI: 10.1016/j.str.2019.01.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 02/07/2018] [Revised: 07/27/2018] [Accepted: 01/14/2019] [Indexed: 11/16/2022]
Abstract
Electron cryotomography enables 3D visualization of cells in a near-native state at molecular resolution. The produced cellular tomograms contain detailed information about a plethora of macromolecular complexes, their structures, abundances, and specific spatial locations in the cell. However, extracting this information in a systematic way is very challenging, and current methods usually rely on individual templates of known structures. Here, we propose a framework called "Multi-Pattern Pursuit" for de novo discovery of different complexes from highly heterogeneous sets of particles extracted from entire cellular tomograms without using information of known structures. These initially detected structures can then serve as input for more targeted refinement efforts. Our tests on simulated and experimental tomograms show that our automated method is a promising tool for supporting large-scale template-free visual proteomics analysis.
Affiliation(s)
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
| | - Jitin Singla
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Elitza I Tocheva
- Department of Microbiology and Immunology, Life Sciences Institute, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
| | - Yi-Wei Chang
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Raymond C Stevens
- Department of Biological Sciences and Department of Chemistry, Bridge Institute, University of Southern California, Los Angeles, CA 90089, USA
| | - Grant J Jensen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Howard Hughes Medical Institute, Pasadena, CA 91125, USA
| | - Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
| |
|
22
|
Zhou B, Guo Q, Zeng X, Xu M. Feature Decomposition Based Saliency Detection in Electron Cryo-Tomograms. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2019; 2018:2467-2473. [PMID: 31205800 DOI: 10.1109/bibm.2018.8621363] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Electron Cryo-Tomography (ECT) allows 3D visualization of subcellular structures at the submolecular resolution in close to the native state. However, due to the high degree of structural complexity and imaging limits, the automatic segmentation of cellular components from ECT images is very difficult. To complement and speed up existing segmentation methods, it is desirable to develop a generic cell component segmentation method that is 1) not specific to particular types of cellular components, 2) able to segment unknown cellular components, 3) fully unsupervised and does not rely on the availability of training data. As an important step towards this goal, in this paper, we propose a saliency detection method that computes the likelihood that a subregion in a tomogram stands out from the background. Our method consists of four steps: supervoxel over-segmentation, feature extraction, feature matrix decomposition, and computation of saliency. The method produces a distribution map that represents the regions' saliency in tomograms. Our experiments show that our method can successfully label most salient regions detected by a human observer, and able to filter out regions not containing cellular components. Therefore, our method can remove the majority of the background region, and significantly speed up the subsequent processing of segmentation and recognition of cellular components captured by ECT.
Affiliation(s)
- Bo Zhou
- Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
| | - Qiang Guo
- Max Planck Institute for Biochemistry, Martinsried, Germany
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
| |
|
23
|
Che C, Lin R, Zeng X, Elmaaroufi K, Galeotti J, Xu M. Improved deep learning-based macromolecules structure classification from electron cryo-tomograms. MACHINE VISION AND APPLICATIONS 2018; 29:1227-1236. [PMID: 31511756 PMCID: PMC6738941 DOI: 10.1007/s00138-018-0949-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/14/2017] [Revised: 01/16/2018] [Accepted: 05/18/2018] [Indexed: 05/30/2023]
Abstract
Cellular processes are governed by macromolecular complexes inside the cell. Study of the native structures of macromolecular complexes has been extremely difficult due to lack of data. With recent breakthroughs in Cellular Electron Cryo-Tomography (CECT) 3D imaging technology, it is now possible for researchers to gain access to fully study and understand the macromolecular structures of single cells. However, systematic recovery of macromolecular structures from CECT is very difficult due to the high degree of structural complexity and practical imaging limitations. Specifically, we proposed a deep learning-based image classification approach for large-scale systematic macromolecular structure separation from CECT data. However, our previous work was only a very initial step toward exploration of the full potential of deep learning-based macromolecule separation. In this paper, we focus on improving classification performance by proposing three newly designed individual CNN models: an extended version of Deep Small Receptive Field (DSRF3D), denoted as DSRF3D-v2, a 3D residual block-based neural network, named RB3D, and a convolutional 3D (C3D)-based model, CB3D. We compare them with our previously developed model (DSRF3D) on 12 datasets with different SNRs and tilt angle ranges. The experiments show that our new models achieved significantly higher classification accuracies. The accuracies are not only higher than 0.9 on normal datasets, but the models also demonstrate potential to operate on datasets with high levels of noise and missing wedge effects.
Affiliation(s)
- Chengqian Che
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
| | - Ruogu Lin
- Department of Automation, Tsinghua University, Beijing, China
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
| | - Karim Elmaaroufi
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
| | - John Galeotti
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, USA
| |
|
24
|
Liu C, Zeng X, Lin R, Liang X, Freyberg Z, Xing E, Xu M. DEEP LEARNING BASED SUPERVISED SEMANTIC SEGMENTATION OF ELECTRON CRYO-SUBTOMOGRAMS. PROCEEDINGS. INTERNATIONAL CONFERENCE ON IMAGE PROCESSING 2018; 2018:1578-1582. [PMID: 37799820 PMCID: PMC10552869 DOI: 10.1109/icip.2018.8451386] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/07/2023]
Abstract
Cellular Electron Cryo-Tomography (CECT) is a powerful imaging technique for the 3D visualization of cellular structure and organization at submolecular resolution. It enables analysis of the native structures of macromolecular complexes and their spatial organization inside single cells. However, due to the high degree of structural complexity and practical imaging limitations, systematic macromolecular structural recovery inside CECT images remains challenging. In particular, the recovery of a macromolecule is likely to be biased by its neighboring structures due to high molecular crowding. To reduce this bias, we introduce a novel 3D convolutional neural network, inspired by the Fully Convolutional Network and the Encoder-Decoder Architecture, for the supervised segmentation of macromolecules of interest in subtomograms. Tests of our models on realistically simulated CECT data demonstrate that our new approach significantly improves segmentation performance compared to our baseline approach. We also demonstrate that the proposed model generalizes to segment new structures that do not exist in the training data.
Affiliation(s)
- Chang Liu
- Electrical and Computer Engineering Department, Carnegie Mellon University, USA
| | - Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, USA
| | - Ruogu Lin
- Department of Automation, Tsinghua University, China
| | - Xiaodan Liang
- Machine Learning Department, Carnegie Mellon University, USA
| | - Zachary Freyberg
- Departments of Psychiatry and Cell Biology, University of Pittsburgh, USA
| | - Eric Xing
- Machine Learning Department, Carnegie Mellon University, USA
| | - Min Xu
- Computational Biology Department, Carnegie Mellon University, USA
| |
|
25
|
Liu C, Zeng X, Wang KW, Guo Q, Xu M. Multi-task Learning for Macromolecule Classification, Segmentation and Coarse Structural Recovery in Cryo-Tomography. BMVC : PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE. BRITISH MACHINE VISION CONFERENCE 2018; 2018:1007. [PMID: 36951799 PMCID: PMC10028434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Cellular Electron Cryo-Tomography (CECT) is a powerful 3D imaging tool for studying the native structure and organization of macromolecules inside single cells. For systematic recognition and recovery of macromolecular structures captured by CECT, methods for several important tasks such as subtomogram classification and semantic segmentation have been developed. However, the recognition and recovery of macromolecular structures are still very difficult due to high molecular structural diversity, the crowded molecular environment, and the imaging limitations of CECT. In this paper, we propose a novel multi-task 3D convolutional neural network model for simultaneous classification, segmentation, and coarse structural recovery of macromolecules of interest in subtomograms. In our model, the image features learned for one task are shared and thereby reinforce the learning of the other tasks. Evaluated on realistically simulated and experimental CECT data, our multi-task learning model outperformed all single-task learning methods for classification and segmentation. In addition, we demonstrate that our model can generalize to discover, segment, and recover novel structures that do not exist in the training data.
Affiliation(s)
- Chang Liu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiangrui Zeng
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Kai Wen Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Qiang Guo
- Max Planck Institute for Biochemistry, Martinsried, Germany
| | - Min Xu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| |
|
26
|
Zeng X, Leung MR, Zeev-Ben-Mordehai T, Xu M. A convolutional autoencoder approach for mining features in cellular electron cryo-tomograms and weakly supervised coarse segmentation. J Struct Biol 2018; 202:150-160. [PMID: 29289599 PMCID: PMC6661905 DOI: 10.1016/j.jsb.2017.12.015] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 07/10/2017] [Revised: 12/24/2017] [Accepted: 12/27/2017] [Indexed: 01/08/2023]
Abstract
Cellular electron cryo-tomography enables the 3D visualization of cellular organization in the near-native state and at submolecular resolution. However, the contents of cellular tomograms are often complex, making it difficult to automatically isolate different in situ cellular components. In this paper, we propose a convolutional autoencoder-based unsupervised approach to provide a coarse grouping of 3D small subvolumes extracted from tomograms. We demonstrate that the autoencoder can be used for efficient and coarse characterization of features of macromolecular complexes and surfaces, such as membranes. In addition, the autoencoder can be used to detect non-cellular features related to sample preparation and data collection, such as carbon edges from the grid and tomogram boundaries. The autoencoder is also able to detect patterns that may indicate spatial interactions between cellular components. Furthermore, we demonstrate that our autoencoder can be used for weakly supervised semantic segmentation of cellular components, requiring a very small amount of manual annotation.
Affiliation(s)
- Xiangrui Zeng
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh 15213, USA
| | - Miguel Ricardo Leung
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Cryo-electron Microscopy, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, Netherlands
| | - Tzviya Zeev-Ben-Mordehai
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK; Cryo-electron Microscopy, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, Netherlands
| | - Min Xu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh 15213, USA.
| |
|
27
|
Xu M, Chai X, Muthakana H, Liang X, Yang G, Zeev-Ben-Mordehai T, Xing EP. Deep learning-based subdivision approach for large scale macromolecules structure recovery from electron cryo tomograms. Bioinformatics 2018; 33:i13-i22. [PMID: 28881965 PMCID: PMC5946875 DOI: 10.1093/bioinformatics/btx230] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 11/24/2022]
Abstract
Motivation: Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization in a near-native state and at sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organization inside single cells. However, the high degree of structural complexity together with practical imaging limitations makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although advances in data acquisition automation mean it is no longer difficult to acquire CECT data containing that many subtomograms, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing data at this scale. Results: To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computation-intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing unsupervised rotation-invariant feature and pose-normalization based approaches, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in training data. Availability and Implementation: Source code freely available at http://www.cs.cmu.edu/~mxu1/software. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiaoqi Chai
- Biomedical Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Hariank Muthakana
- Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiaodan Liang
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ge Yang
- Biomedical Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Tzviya Zeev-Ben-Mordehai
- Division of Structural Biology, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Eric P Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA
| |
|