1
Zhou P, Liu Z, Dai J, Yang M, Sui H, Huang Z, Li Y, Song L. KNN algorithm for accurate identification of IFP lesions in the knee joint: a multimodal MRI study. Sci Rep 2025; 15:18163. [PMID: 40414927 DOI: 10.1038/s41598-025-02786-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Accepted: 05/15/2025] [Indexed: 05/27/2025] Open
Abstract
Knee-related disorders represent a major global health concern and are a leading cause of pain and mobility impairment, particularly in older adults. In clinical medicine, the precise identification and classification of knee joint diseases are essential for early diagnosis and effective treatment. This study presents a novel approach for identifying infrapatellar fat pad (IFP) lesions using the K-Nearest Neighbor (KNN) algorithm in combination with multimodal Magnetic Resonance Imaging (MRI) techniques, specifically mDixon-Quant (mDQ) and T2 mapping (T2m). These imaging methods provide quantitative parameters such as fat fraction (FF), T2*, and T2 values. A set of derived features was constructed through feature engineering to better capture variations within the IFP. These features were used to train the KNN model for classifying knee joint conditions. The proposed method achieved classification accuracies of 94.736% and 92.857% on the training and testing datasets, respectively, outperforming the CNN-Class8 benchmark. This technique holds substantial clinical potential for the early detection of knee joint pathologies, monitoring disease progression, and evaluating post-surgical outcomes.
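As a rough illustration of the classification step this abstract describes, the sketch below shows a plain k-nearest-neighbour majority vote over three quantitative features (FF, T2*, T2). It is not the authors' implementation: the `knn_predict` helper, the feature values, the class labels, and `k=3` are hypothetical placeholders, and the study's engineered features and settings are not given in the abstract.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest samples and return the most common one
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, pre-scaled feature vectors (FF, T2*, T2); NOT data from the study.
train_X = [(0.90, 0.30, 0.40), (0.85, 0.35, 0.45), (0.88, 0.28, 0.42),
           (0.60, 0.55, 0.70), (0.55, 0.60, 0.75), (0.62, 0.58, 0.68)]
train_y = ["normal", "normal", "normal", "lesion", "lesion", "lesion"]

prediction = knn_predict(train_X, train_y, (0.58, 0.57, 0.72), k=3)
```

Because KNN relies on distances, features measured on different scales (for example a percentage and a relaxation time in milliseconds) would normally be rescaled first; the toy values above are assumed to be scaled already.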
Affiliation(s)
- Peng Zhou
  - Department of Imaging, The Affiliated Hospital of Guizhou Medical University, No.28, Beijing Road, Yunyan District, Guiyang, 550000, Guizhou Province, China
  - Department of Imaging, Dejiang County People's Hospital, Tongren, 565299, China
- Zhenyan Liu
  - Department of Imaging, Dejiang County People's Hospital, Tongren, 565299, China
- Jiang Dai
  - Department of Imaging, Dejiang County People's Hospital, Tongren, 565299, China
- Ming Yang
  - Department of Imaging, College of Electronic Engineering, Guizhou University, Guiyang, 550025, China
- He Sui
  - Department of Imaging, The Affiliated Hospital of Guizhou Medical University, No.28, Beijing Road, Yunyan District, Guiyang, 550000, Guizhou Province, China
- Zhaoshu Huang
  - Department of Imaging, The Affiliated Hospital of Guizhou Medical University, No.28, Beijing Road, Yunyan District, Guiyang, 550000, Guizhou Province, China
- Yu Li
  - Department of Imaging, The Affiliated Hospital of Guizhou Medical University, No.28, Beijing Road, Yunyan District, Guiyang, 550000, Guizhou Province, China
- Lingling Song
  - Department of Imaging, The Affiliated Hospital of Guizhou Medical University, No.28, Beijing Road, Yunyan District, Guiyang, 550000, Guizhou Province, China.
2
Zhan X, Zeng X, Uddin MR, Xu M. AITom: AI-guided cryo-electron tomography image analyses toolkit. J Struct Biol 2025; 217:108207. [PMID: 40378936 DOI: 10.1016/j.jsb.2025.108207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Revised: 04/20/2025] [Accepted: 04/28/2025] [Indexed: 05/19/2025]
Abstract
Cryo-electron tomography (cryo-ET) is an essential tool in structural biology, uniquely capable of visualizing three-dimensional macromolecular complexes within their native cellular environments, thereby providing profound molecular-level insights. Despite its significant promise, cryo-ET faces persistent challenges in the systematic localization, identification, segmentation, and structural recovery of three-dimensional subcellular components, necessitating the development of efficient and accurate large-scale image analysis methods. In response to these complexities, this paper introduces AITom, an open-source artificial intelligence platform tailored for cryo-ET researchers. AITom integrates a comprehensive suite of public and proprietary algorithms, supporting both traditional template-based and template-free approaches, alongside state-of-the-art deep learning methodologies for cryo-ET data analysis. By incorporating diverse computational strategies, AITom enables researchers to more effectively tackle the complexities inherent in cryo-ET, facilitating precise analysis and interpretation of complex biological structures. Furthermore, AITom provides extensive tutorials for each analysis module, offering valuable guidance to users in utilizing its comprehensive functionalities.
Affiliation(s)
- Xueying Zhan
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Mostofa Rafid Uddin
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States.
3
Meng W, Yu X, Zhang T, Han R. A noise-robust classification method for cryo-ET subtomograms with out-of-distribution detection. Bioinformatics 2025; 41:btaf274. [PMID: 40358513 DOI: 10.1093/bioinformatics/btaf274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2025] [Revised: 03/27/2025] [Accepted: 05/12/2025] [Indexed: 05/15/2025]
Abstract
MOTIVATION Cryogenic electron tomography (cryo-ET) enables high-resolution 3D reconstruction of biological samples, with accurate subtomogram classification critical for structural analysis. However, current subtomogram classification methods often struggle with the out-of-distribution (OOD) data issue, causing misclassification and mismatched structures. RESULTS To solve this problem, we propose a unified subtomogram classification framework that incorporates OOD detection to distinguish unknown (OOD) from known (in-distribution, ID) classes and predict labels for ID data, thereby enhancing existing subtomogram classification methods. Within this framework, we develop a noise-robust classification method that integrates a 3D discrete wavelet transform-based encoder to reduce high-frequency noise and extract robust features. Additionally, we incorporate a Mahalanobis distance-based OOD detector with a reliable metric for 3D subtomograms and introduce an adaptive classifier that adjusts to accommodate datasets of varying scales. The experimental and visualization results demonstrate that our noise-robust method improves subtomogram classification accuracy and effectively models features while enhancing OOD detection. AVAILABILITY AND IMPLEMENTATION Our code is available at https://github.com/yxs1137/Subtomo-Classification-with-OOD.git. The real data used in this study can be accessed through CryoET Data Portal.
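To make the idea of Mahalanobis distance-based OOD detection concrete, here is a minimal sketch under stated assumptions. It works on hypothetical 2-D feature vectors (standing in for encoder outputs), uses a single covariance matrix shared by all classes, and rejects a sample as OOD when its squared distance to the nearest class mean exceeds a threshold. The function names, toy data, and threshold value are illustrative assumptions; the metric and detector used in the paper may differ.

```python
def fit_class_gaussians(features_by_class):
    """Return per-class means and the inverse of a shared 2x2 covariance."""
    means = {}
    residuals = []
    for label, rows in features_by_class.items():
        mx = sum(r[0] for r in rows) / len(rows)
        my = sum(r[1] for r in rows) / len(rows)
        means[label] = (mx, my)
        # Centre each sample on its own class mean before pooling
        residuals.extend((r[0] - mx, r[1] - my) for r in rows)
    n = len(residuals)
    sxx = sum(dx * dx for dx, _ in residuals) / n
    sxy = sum(dx * dy for dx, dy in residuals) / n
    syy = sum(dy * dy for _, dy in residuals) / n
    det = sxx * syy - sxy * sxy  # assumed non-zero for this toy data
    inv_cov = (syy / det, -sxy / det, sxx / det)  # (a, b, d) of [[a, b], [b, d]]
    return means, inv_cov

def mahalanobis_sq(x, mean, inv_cov):
    """Squared Mahalanobis distance of a 2-D point from a class mean."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    a, b, d = inv_cov
    return a * dx * dx + 2 * b * dx * dy + d * dy * dy

def classify_or_reject(x, means, inv_cov, threshold):
    """Return (label, distance); the label is "OOD" if no class is close enough."""
    label, dist = min(((c, mahalanobis_sq(x, m, inv_cov)) for c, m in means.items()),
                      key=lambda pair: pair[1])
    return (label if dist <= threshold else "OOD"), dist

# Hypothetical in-distribution features for two known classes (not real data).
train = {"A": [(0, 0), (1, 0), (0, 1), (1, 1)],
         "B": [(5, 5), (6, 5), (5, 6), (6, 6)]}
means, inv_cov = fit_class_gaussians(train)
THRESHOLD = 9.0  # hypothetical cut-off on the squared distance

near_a = classify_or_reject((0.6, 0.4), means, inv_cov, THRESHOLD)
unknown = classify_or_reject((3.0, 3.0), means, inv_cov, THRESHOLD)
```

In practice the threshold would be chosen on held-out in-distribution data, for example as a high percentile of the distances observed for known classes.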
Affiliation(s)
- Wenjia Meng
  - School of Software, Shandong University, Jinan 250101, China
- Xueshi Yu
  - School of Software, Shandong University, Jinan 250101, China
- Tingting Zhang
  - School of Software, Shandong University, Jinan 250101, China
- Renmin Han
  - College of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004, China
  - Research Center for Mathematics and Interdisciplinary Sciences (Ministry of Education Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
4
Jin W, Zhou Y, Bartesaghi A. Accurate size-based protein localization from cryo-ET tomograms. J Struct Biol X 2024; 10:100104. [PMID: 39044770 PMCID: PMC11263962 DOI: 10.1016/j.yjsbx.2024.100104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2024] Open
Abstract
Cryo-electron tomography (cryo-ET) combined with sub-tomogram averaging (STA) allows the determination of protein structures imaged within the native context of the cell at near-atomic resolution. Particle picking is an essential step in the cryo-ET/STA image analysis pipeline that consists in locating the position of proteins within crowded cellular tomograms so that they can be aligned and averaged in 3D to improve resolution. While extensive work in 2D particle picking has been done in the context of single-particle cryo-EM, comparatively fewer strategies have been proposed to pick particles from 3D tomograms, in part due to the challenges associated with working with noisy 3D volumes affected by the missing wedge. While strategies based on 3D template-matching and deep learning are commonly used, these methods are computationally expensive and require either an external template or manual labelling which can bias the results and limit their applicability. Here, we propose a size-based method to pick particles from tomograms that is fast, accurate, and does not require external templates or user provided labels. We compare the performance of our approach against a commonly used algorithm based on deep learning, crYOLO, and show that our method: i) has higher detection accuracy, ii) does not require user input for labeling or time-consuming training, and iii) runs efficiently on non-specialized CPU hardware. We demonstrate the effectiveness of our approach by automatically detecting particles from tomograms representing different types of samples and using these particles to determine the high-resolution structures of ribosomes imaged in vitro and in situ.
Affiliation(s)
- Weisheng Jin
  - Department of Computer Science, Duke University, Durham, USA
- Ye Zhou
  - Department of Computer Science, Duke University, Durham, USA
- Alberto Bartesaghi
  - Department of Computer Science, Duke University, Durham, USA
  - Department of Biochemistry, Duke University School of Medicine, Durham, USA
  - Department of Electrical and Computer Engineering, Pratt School of Engineering, Duke University, Durham, USA
5
Jain S, Li X, Xu M. Knowledge Transfer from Macro-world to Micro-world: Enhancing 3D Cryo-ET Classification through Fine-Tuning Video-based Deep Models. Bioinformatics 2024; 40:btae368. [PMID: 38889274 PMCID: PMC11269433 DOI: 10.1093/bioinformatics/btae368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 04/30/2024] [Accepted: 06/11/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Deep learning models have achieved remarkable success in a wide range of natural-world tasks, such as vision, language, and speech recognition. These accomplishments are largely attributed to the availability of open-source large-scale datasets. More importantly, pre-trained foundational model learnings exhibit a surprising degree of transferability to downstream tasks, enabling efficient learning even with limited training examples. However, the application of such natural-domain models to the domain of tiny Cryo-Electron Tomography (Cryo-ET) images has been a relatively unexplored frontier. This research is motivated by the intuition that 3D Cryo-ET voxel data can be conceptually viewed as a sequence of progressively evolving video frames. RESULTS Leveraging the above insight, we propose a novel approach that involves the utilization of 3D models pre-trained on large-scale video datasets to enhance Cryo-ET subtomogram classification. Our experiments, conducted on both simulated and real Cryo-ET datasets, reveal compelling results. The use of video initialization not only demonstrates improvements in classification accuracy but also substantially reduces training costs. Further analyses provide additional evidence of the value of video initialization in enhancing subtomogram feature extraction. Additionally, we observe that video initialization yields similar positive effects when applied to medical 3D classification tasks, underscoring the potential of cross-domain knowledge transfer from video-based models to advance the state-of-the-art in a wide range of biological and medical data types. AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/aitom.
Affiliation(s)
- Sabhay Jain
  - Electrical Engineering Department, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, 208016, India
- Xingjian Li
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
- Min Xu
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, 15213, United States
6
Zeng X, Ding Y, Zhang Y, Uddin MR, Dabouei A, Xu M. DUAL: deep unsupervised simultaneous simulation and denoising for cryo-electron tomography. bioRxiv 2024:2024.03.02.583135. [PMID: 38496657 PMCID: PMC10942334 DOI: 10.1101/2024.03.02.583135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Recent biotechnological developments in cryo-electron tomography allow direct visualization of native sub-cellular structures with unprecedented details and provide essential information on protein functions/dysfunctions. Denoising can enhance the visualization of protein structures and distributions. Automatic annotation via data simulation can ameliorate the time-consuming manual labeling of large-scale datasets. Here, we combine the two major cryo-ET tasks together in DUAL, by a specific cyclic generative adversarial network with novel noise disentanglement. This enables end-to-end unsupervised learning that requires no labeled data for training. The denoising branch outperforms existing works and substantially improves downstream particle picking accuracy on benchmark datasets. The simulation branch provides learning-based cryo-ET simulation for the first time and generates synthetic tomograms indistinguishable from experimental ones. Through comprehensive evaluations, we showcase the effectiveness of DUAL in detecting macromolecular complexes across a wide range of molecular weights in experimental datasets. The versatility of DUAL is expected to empower cryo-ET researchers by improving visual interpretability, enhancing structural detection accuracy, expediting annotation processes, facilitating cross-domain model adaptability, and compensating for missing wedge artifacts. Our work represents a significant advancement in the unsupervised mining of protein structures in cryo-ET, offering a multifaceted tool that facilitates cryo-ET research.
Affiliation(s)
- Xiangrui Zeng
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Yizhe Ding
  - Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA
- Yueqian Zhang
  - School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Mostofa Rafid Uddin
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Ali Dabouei
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Min Xu
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
7
Braet F, Poger D. Let's have a chat about chatbot(s) in (biological) microscopy. J Microsc 2023; 292:59-63. [PMID: 37742291 DOI: 10.1111/jmi.13230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 08/30/2023] [Accepted: 09/20/2023] [Indexed: 09/26/2023]
Affiliation(s)
- Filip Braet
  - School of Medical Sciences (Molecular and Cellular Biomedicine), The University of Sydney, New South Wales, Australia
  - Australian Centre for Microscopy and Microanalysis, The University of Sydney, Sydney, New South Wales, Australia
- David Poger
  - Microscopy Australia, The University of Sydney, Sydney, New South Wales, Australia
8
Zhao C, Lu D, Zhao Q, Ren C, Zhang H, Zhai J, Gou J, Zhu S, Zhang Y, Gong X. Computational methods for in situ structural studies with cryogenic electron tomography. Front Cell Infect Microbiol 2023; 13:1135013. [PMID: 37868346 PMCID: PMC10586593 DOI: 10.3389/fcimb.2023.1135013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 08/29/2023] [Indexed: 10/24/2023] Open
Abstract
Cryo-electron tomography (cryo-ET) plays a critical role in imaging microorganisms in situ in terms of further analyzing the working mechanisms of viruses and drug exploitation, among others. A data processing workflow for cryo-ET has been developed to reconstruct three-dimensional density maps and further build atomic models from a tilt series of two-dimensional projections. Low signal-to-noise ratio (SNR) and the missing wedge are two major factors that make the reconstruction procedure challenging. Because only a few near-atomic resolution structures have been reconstructed in cryo-ET, there is still much room to design new approaches to improve universal reconstruction resolutions. This review summarizes classical mathematical models and deep learning methods among general reconstruction steps. Moreover, we also discuss current limitations and prospects. This review can provide software and methods for each step of the entire procedure from tilt series by cryo-ET to 3D atomic structures. In addition, it can also help more experts in various fields comprehend a recent research trend in cryo-ET. Furthermore, we hope that more researchers can collaborate in developing computational methods and mathematical models for high-resolution three-dimensional structures from cryo-ET datasets.
Affiliation(s)
- Cuicui Zhao
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Da Lu
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Qian Zhao
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Chongjiao Ren
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Huangtao Zhang
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaqi Zhai
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Jiaxin Gou
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Shilin Zhu
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Yaqi Zhang
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
- Xinqi Gong
  - Mathematical Intelligence Application LAB, Institute for Mathematical Sciences, Renmin University of China, Beijing, China
  - Beijing Academy of Intelligence, Beijing, China
9
Poger D, Yen L, Braet F. Big data in contemporary electron microscopy: challenges and opportunities in data transfer, compute and management. Histochem Cell Biol 2023; 160:169-192. [PMID: 37052655 PMCID: PMC10492738 DOI: 10.1007/s00418-023-02191-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/21/2023] [Indexed: 04/14/2023]
Abstract
The second decade of the twenty-first century witnessed a new challenge in the handling of microscopy data. Big data, data deluge, large data, data compliance, data analytics, data integrity, data interoperability, data retention and data lifecycle are terms that have introduced themselves to the electron microscopy sciences. This is largely attributed to the booming development of new microscopy hardware tools. As a result, large digital image files with an average size of one terabyte within one single acquisition session are not uncommon nowadays, especially in the field of cryogenic electron microscopy. This brings along numerous challenges in data transfer, compute and management. In this review, we will discuss in detail the current state of international knowledge on big data in contemporary electron microscopy and how big data can be transferred, computed and managed efficiently and sustainably. Workflows, solutions, approaches and suggestions will be provided, with the example of the latest experiences in Australia. Finally, important principles such as data integrity, data lifetime and the FAIR and CARE principles will be considered.
Affiliation(s)
- David Poger
  - Microscopy Australia, The University of Sydney, Sydney, NSW, 2006, Australia.
- Lisa Yen
  - Microscopy Australia, The University of Sydney, Sydney, NSW, 2006, Australia
- Filip Braet
  - Australian Centre for Microscopy and Microanalysis, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Medical Sciences (Molecular and Cellular Biomedicine), The University of Sydney, Sydney, NSW, 2006, Australia
10
Kim HHS, Uddin MR, Xu M, Chang YW. Computational Methods Toward Unbiased Pattern Mining and Structure Determination in Cryo-Electron Tomography Data. J Mol Biol 2023; 435:168068. [PMID: 37003470 PMCID: PMC10164694 DOI: 10.1016/j.jmb.2023.168068] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/19/2022] [Revised: 02/19/2023] [Accepted: 03/26/2023] [Indexed: 04/03/2023]
Abstract
Cryo-electron tomography can uniquely probe the native cellular environment for macromolecular structures. Tomograms feature complex data with densities of diverse, densely crowded macromolecular complexes, low signal-to-noise, and artifacts such as the missing wedge effect. Post-processing of this data generally involves isolating regions or particles of interest from tomograms, organizing them into related groups, and rendering final structures through subtomogram averaging. Template-matching and reference-based structure determination are popular analysis methods but are vulnerable to biases and can often require significant user input. Most importantly, these approaches cannot identify novel complexes that reside within the imaged cellular environment. To reliably extract and resolve structures of interest, efficient and unbiased approaches are therefore of great value. This review highlights notable computational software and discusses how they contribute to making automated structural pattern discovery a possibility. Perspectives emphasizing the importance of features for user-friendliness and accessibility are also presented.
Affiliation(s)
- Hannah Hyun-Sook Kim
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. https://twitter.com/hannahinthelab
- Mostofa Rafid Uddin
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. https://twitter.com/duran_rafid
- Min Xu
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
- Yi-Wei Chang
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
11
Zeng X, Kahng A, Xue L, Mahamid J, Chang YW, Xu M. High-throughput cryo-ET structural pattern mining by unsupervised deep iterative subtomogram clustering. Proc Natl Acad Sci U S A 2023; 120:e2213149120. [PMID: 37027429 PMCID: PMC10104553 DOI: 10.1073/pnas.2213149120] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/01/2022] [Accepted: 02/24/2023] [Indexed: 04/08/2023] Open
Abstract
Cryoelectron tomography directly visualizes heterogeneous macromolecular structures in their native and complex cellular environments. However, existing computer-assisted structure sorting approaches are low throughput or inherently limited due to their dependency on available templates and manual labels. Here, we introduce a high-throughput template-and-label-free deep learning approach, Deep Iterative Subtomogram Clustering Approach (DISCA), that automatically detects subsets of homogeneous structures by learning and modeling 3D structural features and their distributions. Evaluation on five experimental cryo-ET datasets shows that an unsupervised deep learning based method can detect diverse structures with a wide range of molecular sizes. This unsupervised detection paves the way for systematic unbiased recognition of macromolecular complexes in situ.
Affiliation(s)
- Xiangrui Zeng
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Anson Kahng
  - Computer Science Department, University of Rochester, Rochester, NY 14620
- Liang Xue
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
  - Faculty of Biosciences, Collaboration for joint PhD degree between European Molecular Biology Laboratory and Heidelberg University, Heidelberg 69117, Germany
- Julia Mahamid
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Yi-Wei Chang
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Min Xu
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
12
Burley SK, Berman HM, Chiu W, Dai W, Flatt JW, Hudson BP, Kaelber JT, Khare SD, Kulczyk AW, Lawson CL, Pintilie GD, Sali A, Vallat B, Westbrook JD, Young JY, Zardecki C. Electron microscopy holdings of the Protein Data Bank: the impact of the resolution revolution, new validation tools, and implications for the future. Biophys Rev 2022; 14:1281-1301. [PMID: 36474933 PMCID: PMC9715422 DOI: 10.1007/s12551-022-01013-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/31/2022] [Accepted: 11/06/2022] [Indexed: 12/04/2022] Open
Abstract
As a discipline, structural biology has been transformed by the three-dimensional electron microscopy (3DEM) "Resolution Revolution" made possible by convergence of robust cryo-preservation of vitrified biological materials, sample handling systems, and measurement stages operating at liquid nitrogen temperature, improvements in electron optics that preserve phase information at the atomic level, direct electron detectors (DEDs), high-speed computing with graphics processing units, and rapid advances in data acquisition and processing software. 3DEM structure information (atomic coordinates and related metadata) is archived in the open-access Protein Data Bank (PDB), which currently holds more than 11,000 3DEM structures of proteins and nucleic acids, and their complexes with one another and small-molecule ligands (~6% of the archive). Underlying experimental data (3DEM density maps and related metadata) are stored in the Electron Microscopy Data Bank (EMDB), which currently holds more than 21,000 3DEM density maps. After describing the history of the PDB and the Worldwide Protein Data Bank (wwPDB) partnership, which jointly manages both the PDB and EMDB archives, this review examines the origins of the resolution revolution and analyzes its impact on structural biology viewed through the lens of PDB holdings. Six areas of focus exemplifying the impact of 3DEM across the biosciences are discussed in detail (icosahedral viruses, ribosomes, integral membrane proteins, SARS-CoV-2 spike proteins, cryogenic electron tomography, and integrative structure determination combining 3DEM with complementary biophysical measurement techniques), followed by a review of 3DEM structure validation by the wwPDB that underscores the importance of community engagement.
Collapse
Affiliation(s)
- Stephen K. Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901 USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093 USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854 USA
- Helen M. Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854 USA
- Wah Chiu
- Department of Bioengineering, Stanford University, Stanford, CA USA
- Division of CryoEM and Bioimaging, SSRL, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA USA
- Wei Dai
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Department of Cell Biology and Neuroscience, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Justin W. Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Brian P. Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Jason T. Kaelber
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901 USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854 USA
- Arkadiusz W. Kulczyk
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ 08901 USA
- Catherine L. Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158 USA
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901 USA
- John D. Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901 USA
- Jasmine Y. Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
13
Isotropic reconstruction for electron tomography with deep learning. Nat Commun 2022; 13:6482. [PMID: 36309499 PMCID: PMC9617606 DOI: 10.1038/s41467-022-33957-8] [Citation(s) in RCA: 144] [Impact Index Per Article: 48.0] [Received: 09/14/2021] [Accepted: 10/05/2022] [Indexed: 12/25/2022] Open
Abstract
Cryogenic electron tomography (cryoET) allows visualization of cellular structures in situ. However, anisotropic resolution arising from the intrinsic "missing-wedge" problem has presented major challenges in visualization and interpretation of tomograms. Here, we have developed IsoNet, a deep learning-based software package that iteratively reconstructs the missing-wedge information and increases the signal-to-noise ratio, using the knowledge learned from raw tomograms. Without the need for sub-tomogram averaging, IsoNet generates tomograms with significantly reduced resolution anisotropy. Applications of IsoNet to three representative types of cryoET data demonstrate greatly improved structural interpretability: resolving lattice defects in immature HIV particles, establishing architecture of the paraflagellar rod in eukaryotic flagella, and identifying heptagon-containing clathrin cages inside a neuronal synapse of cultured cells. Therefore, by overcoming two fundamental limitations of cryoET, IsoNet enables functional interpretation of cellular tomograms without sub-tomogram averaging. Its application to high-resolution cellular tomograms should also help identify differently oriented complexes of the same kind for sub-tomogram averaging.
14
Hajarolasvadi N, Sunkara V, Khavnekar S, Beck F, Brandt R, Baum D. Volumetric macromolecule identification in cryo-electron tomograms using capsule networks. BMC Bioinformatics 2022; 23:360. [PMID: 36042418 PMCID: PMC9429335 DOI: 10.1186/s12859-022-04901-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 08/23/2022] [Indexed: 11/29/2022] Open
Abstract
Background Despite recent advances in cellular cryo-electron tomography (CET), developing automated tools for macromolecule identification at submolecular resolution remains challenging due to the lack of annotated data and high structural complexities. To date, the extent of the deep learning methods constructed for this problem is limited to conventional Convolutional Neural Networks (CNNs). Identifying macromolecules of different types and sizes is a tedious and time-consuming task. In this paper, we employ a capsule-based architecture, which we refer to as 3D-UCaps, to automate the task of macromolecule identification. In particular, the architecture is composed of three components: feature extractor, capsule encoder, and CNN decoder. The feature extractor converts voxel intensities of input sub-tomograms to activities of local features. The encoder is a 3D Capsule Network (CapsNet) that takes local features to generate a low-dimensional representation of the input. Then, a 3D CNN decoder reconstructs the sub-tomograms from the given representation by upsampling. Results We performed binary and multi-class localization and identification tasks on synthetic and experimental data. We observed that the 3D-UNet and the 3D-UCaps had an F1-score mostly above 60% and 70%, respectively, on the test data. In both network architectures, we observed degradation of at least 40% in the F1-score when identifying very small particles (PDB entry 3GL1) compared to a large particle (PDB entry 4D8Q). In the multi-class identification task of experimental data, 3D-UCaps had an F1-score of 91% on the test data in contrast to 64% of the 3D-UNet. The better F1-score of 3D-UCaps compared to 3D-UNet is obtained by a higher precision score. We speculate this to be due to the capsule network employed in the encoder. To study the effect of the CapsNet-based encoder architecture further, we performed an ablation study and observed that the F1-score is boosted as network depth is increased, which is in contrast to the previously reported results for the 3D-UNet. To present a reproducible work, source code, trained models, data as well as visualization results are made publicly available. Conclusion Quantitative and qualitative results show that 3D-UCaps successfully performs various downstream tasks including identification and localization of macromolecules and can at least compete with CNN architectures for this task. Given that the capsule layers extract both the existence probability and the orientation of the molecules, this architecture has the potential to lead to representations of the data that are better interpretable than those of 3D-UNet. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04901-w.
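The abstract above attributes the higher F1-score of 3D-UCaps to higher precision. As a reminder of how these quantities relate, the following is a minimal sketch of the standard F1 definition (the harmonic mean of precision and recall). The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts, using 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two hypothetical detectors with identical recall but different precision.
low_precision = precision_recall_f1(tp=80, fp=80, fn=20)
high_precision = precision_recall_f1(tp=80, fp=10, fn=20)
print(low_precision)
print(high_precision)
```

With recall held fixed, fewer false positives raise precision and therefore F1, which is the kind of effect the abstract describes.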
Affiliation(s)
- Noushin Hajarolasvadi
- Department of Visual and Data-Centric Computing, Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
- Vikram Sunkara
- Department of Visual and Data-Centric Computing, Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
- Sagar Khavnekar
- Department of CryoEM Technology, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
- Florian Beck
- Department of CryoEM Technology, Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
- Robert Brandt
- Materials and Structural Analysis, Thermo Fisher Scientific, Takustraße 7, 14195, Berlin, Germany
- Daniel Baum
- Department of Visual and Data-Centric Computing, Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
15
Gupta T, He X, Uddin MR, Zeng X, Zhou A, Zhang J, Freyberg Z, Xu M. Self-supervised learning for macromolecular structure classification based on cryo-electron tomograms. Front Physiol 2022; 13:957484. [PMID: 36111160 PMCID: PMC9468634 DOI: 10.3389/fphys.2022.957484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 08/02/2022] [Indexed: 11/21/2022] Open
Abstract
Macromolecular structure classification from cryo-electron tomography (cryo-ET) data is important for understanding macro-molecular dynamics. It has a wide range of applications and is essential in enhancing our knowledge of the sub-cellular environment. However, a major limitation has been insufficient labelled cryo-ET data. In this work, we use Contrastive Self-supervised Learning (CSSL) to improve the previous approaches for macromolecular structure classification from cryo-ET data with limited labels. We first pretrain an encoder with unlabelled data using CSSL and then fine-tune the pretrained weights on the downstream classification task. To this end, we design a cryo-ET domain-specific data-augmentation pipeline. The benefit of augmenting cryo-ET datasets is most prominent when the original dataset is limited in size. Overall, extensive experiments performed on real and simulated cryo-ET data in the semi-supervised learning setting demonstrate the effectiveness of our approach in macromolecular labeling and classification.
Affiliation(s)
- Tarun Gupta
- Department of Computer Science and Engineering, Indian Institute of Technology, Indore, India
- Xuehai He
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, United States
- Mostofa Rafid Uddin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Andrew Zhou
- Irvington High School, Irvington, NY, United States
- Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, United States
- Zachary Freyberg
- Departments of Psychiatry and Cell Biology, University of Pittsburgh, Pittsburgh, PA, United States
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- *Correspondence: Min Xu,
16
Hao Y, Wan X, Yan R, Liu Z, Li J, Zhang S, Cui X, Zhang F. VP-Detector: A 3D multi-scale dense convolutional neural network for macromolecule localization and classification in cryo-electron tomograms. Computer Methods and Programs in Biomedicine 2022; 221:106871. [PMID: 35584579 DOI: 10.1016/j.cmpb.2022.106871] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2022] [Revised: 04/28/2022] [Accepted: 05/09/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE Cryo-electron tomography (cryo-ET) with subtomogram averaging (STA) is indispensable when studying macromolecule structures and functions in their native environments. Due to the low signal-to-noise ratio, the missing wedge artifacts in tomographic reconstructions, and multiple macromolecules of varied shapes and sizes, macromolecule localization and classification remain challenging. To tackle this bottleneck problem for structural determination by STA, we design an accurate macromolecule localization and classification method named voxelwise particle detector (VP-Detector). METHODS VP-Detector is a two-stage particle detection method based on a 3D multiscale dense convolutional neural network (3D MSDNet). The proposed network uses 3D hybrid dilated convolution (3D HDC) to avoid the resolution loss caused by scaling operations. Meanwhile, it uses 3D dense connectivity to encourage the reuse of feature maps to reduce trainable parameters. In addition, the weighted focal loss is proposed to focus more attention on difficult samples and rare classes, which relieves the class imbalance caused by multiple particles of various sizes. The performance of VP-Detector is evaluated on both simulated and real-world tomograms, and it shows that VP-Detector outperforms state-of-the-art methods. RESULTS The experiments show that VP-Detector outperforms the state-of-the-art methods on particle localization with an F1-score of 0.951 and a precision of 0.978. In addition, VP-Detector can replace manual particle picking in experiments on real-world tomograms. Furthermore, it performs well in classifying large-, medium-, and small-weight proteins with accuracies of 1, 0.95, and 0.82, respectively. Finally, ablation studies demonstrate the effectiveness of 3D HDC, 3D dense connectivity, weighted focal loss, and training on small training sets.
CONCLUSIONS VP-Detector can achieve high accuracy in particle detection with few trainable parameters and support training on small datasets. It can also relieve the class imbalance caused by multiple particles with various shapes and sizes.
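The abstract above mentions a weighted focal loss that emphasizes difficult samples and rare classes, but it does not give the formula. The sketch below therefore uses the commonly cited focal loss form, -w_c (1 - p_t)^gamma log(p_t), with a per-class weight w_c; it should be read as an assumption about the general technique, not as the VP-Detector implementation. The function name, the default gamma, and the toy inputs are all hypothetical.

```python
import numpy as np


def weighted_focal_loss(probs, labels, class_weights, gamma=2.0, eps=1e-12):
    """Mean class-weighted focal loss (generic form, not the paper's code).

    probs:         (N, C) predicted class probabilities, rows sum to 1
    labels:        (N,)   integer class indices
    class_weights: (C,)   per-class weights, larger for rarer classes
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    class_weights = np.asarray(class_weights, dtype=float)
    # Probability assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    # The modulating factor shrinks the loss of easy samples (p_t near 1).
    modulator = (1.0 - p_t) ** gamma
    per_sample = -class_weights[labels] * modulator * np.log(p_t + eps)
    return per_sample.mean()


probs = np.array([[0.9, 0.1],   # easy sample of class 0
                  [0.4, 0.6]])  # harder sample of class 1
labels = np.array([0, 1])
uniform = np.ones(2)

focal = weighted_focal_loss(probs, labels, uniform, gamma=2.0)
cross_entropy = weighted_focal_loss(probs, labels, uniform, gamma=0.0)
rare_boosted = weighted_focal_loss(probs, labels, np.array([1.0, 3.0]), gamma=2.0)
print(focal, cross_entropy, rare_boosted)
```

Setting gamma to 0 with uniform weights recovers ordinary cross-entropy, which makes the two knobs easy to reason about separately: gamma controls the focus on hard samples, and the class weights control the focus on rare classes.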
Affiliation(s)
- Yu Hao
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Xiaohua Wan
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Rui Yan
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Zhiyong Liu
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Jintao Li
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Shihua Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Xuefeng Cui
- School of Computer Science and Technology, Shandong University, Qingdao, China
- Fa Zhang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
17
Bandyopadhyay H, Deng Z, Ding L, Liu S, Uddin MR, Zeng X, Behpour S, Xu M. Cryo-shift: reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization. Bioinformatics 2022; 38:977-984. [PMID: 34897387 DOI: 10.1093/bioinformatics/btab794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Revised: 10/18/2021] [Accepted: 11/17/2021] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution in generated data as compared with real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms. RESULTS In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with 'warp' modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms the existing alternative approaches in cross-domain subtomogram classification in extensive evaluation studies demonstrated herein using both simulated and experimental data. AVAILABILITY AND IMPLEMENTATION https://github.com/xulabs/aitom.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hmrishav Bandyopadhyay
- Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700032, India
- Zihao Deng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Leiting Ding
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sinuo Liu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Mostofa Rafid Uddin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Sima Behpour
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
18
Wu X, Li C, Zeng X, Wei H, Deng HW, Zhang J, Xu M. CryoETGAN: Cryo-Electron Tomography Image Synthesis via Unpaired Image Translation. Front Physiol 2022; 13:760404. [PMID: 35370760 PMCID: PMC8970048 DOI: 10.3389/fphys.2022.760404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 01/17/2022] [Indexed: 12/02/2022] Open
Abstract
Cryo-electron tomography (Cryo-ET) has been regarded as a revolution in structural biology and can reveal molecular sociology. Its unprecedented quality enables it to visualize cellular organelles and macromolecular complexes at nanometer resolution with native conformations. Motivated by developments in nanotechnology and machine learning, establishing machine learning approaches such as classification, detection and averaging for Cryo-ET image analysis has inspired broad interest. Yet, deep learning-based methods for biomedical imaging typically require large labeled datasets for good results, which can be a great challenge due to the expense of obtaining and labeling training data. To deal with this problem, we propose a generative model to simulate Cryo-ET images efficiently and reliably: CryoETGAN. This cycle-consistent and Wasserstein generative adversarial network (GAN) is able to generate images with an appearance similar to the original experimental data. Quantitative and visual grading results on generated images are provided to show that the results of our proposed method achieve better performance compared to the previous state-of-the-art simulation methods. Moreover, CryoETGAN is stable to train and capable of generating plausibly diverse image samples.
Affiliation(s)
- Xindi Wu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Chengkun Li
- École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Haocheng Wei
- Department of Electrical & Computer Engineering, University of Toronto, Toronto, ON, Canada
- Hong-Wen Deng
- Center for Biomedical Informatics & Genomics, Tulane University, New Orleans, LA, United States
- Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, United States
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
19
Gao S, Han R, Zeng X, Liu Z, Xu M, Zhang F. Macromolecules Structural Classification With a 3D Dilated Dense Network in Cryo-Electron Tomography. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:209-219. [PMID: 33729943 PMCID: PMC8446108 DOI: 10.1109/tcbb.2021.3065986] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 05/23/2023]
Abstract
Cryo-electron tomography, combined with subtomogram averaging (STA), can reveal three-dimensional (3D) macromolecule structures in the near-native state from cells and other biological samples. In STA, to get a high-resolution 3D view of macromolecule structures, diverse macromolecules captured by the cellular tomograms need to be accurately classified. However, due to the poor signal-to-noise ratio (SNR) and severe ray artifacts in the tomogram, it remains a major challenge to classify macromolecules with high accuracy. In this paper, we propose a new convolutional neural network, named 3D-Dilated-DenseNet, to improve the performance of macromolecule classification. In 3D-Dilated-DenseNet, there are two key strategies to guarantee macromolecule classification accuracy: 1) Using dense connections to enhance feature map utilization (corresponding to the baseline 3D-C-DenseNet); 2) Adopting dilated convolution to enrich multi-level information in feature maps. We tested 3D-Dilated-DenseNet and 3D-C-DenseNet both on synthetic data and experimental data. The results show that, on synthetic data, both 3D-C-DenseNet and 3D-Dilated-DenseNet outperform the state-of-the-art method in the SHREC contest (SHREC-CNN). In particular, 3D-Dilated-DenseNet improves the F1 metric by 0.393 on tiny-size macromolecules and by 0.213 on small-size macromolecules. On experimental data, compared with 3D-C-DenseNet, 3D-Dilated-DenseNet can increase classification performance by 2.1 percent.
20
Moebel E, Martinez-Sanchez A, Lamm L, Righetto RD, Wietrzynski W, Albert S, Larivière D, Fourmentin E, Pfeffer S, Ortiz J, Baumeister W, Peng T, Engel BD, Kervrann C. Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms. Nat Methods 2021; 18:1386-1394. [PMID: 34675434 DOI: 10.1038/s41592-021-01275-4] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 04/15/2020] [Accepted: 08/18/2021] [Indexed: 11/10/2022]
Abstract
Cryogenic electron tomography (cryo-ET) visualizes the 3D spatial distribution of macromolecules at nanometer resolution inside native cells. However, automated identification of macromolecules inside cellular tomograms is challenged by noise and reconstruction artifacts, as well as the presence of many molecular species in the crowded volumes. Here, we present DeepFinder, a computational procedure that uses artificial neural networks to simultaneously localize multiple classes of macromolecules. Once trained, the inference stage of DeepFinder is faster than template matching and performs better than other competitive deep learning methods at identifying macromolecules of various sizes in both synthetic and experimental datasets. On cellular cryo-ET data, DeepFinder localized membrane-bound and cytosolic ribosomes (roughly 3.2 MDa), ribulose 1,5-bisphosphate carboxylase-oxygenase (roughly 560 kDa soluble complex) and photosystem II (roughly 550 kDa membrane complex) with an accuracy comparable to expert-supervised ground truth annotations. DeepFinder is therefore a promising algorithm for the semiautomated analysis of a wide range of molecular targets in cellular tomograms.
Affiliation(s)
- Emmanuel Moebel
- Serpico Project-Team, Centre Inria Rennes-Bretagne Atlantique and CNRS-UMR 144, Inria, CNRS, Institut Curie, PSL Research University, Campus Universitaire de Beaulieu, Rennes Cedex, France
- Antonio Martinez-Sanchez
- Department of Computer Science, Faculty of Sciences, University of Oviedo, Oviedo, Spain; Health Research Institute of Asturias (ISPA), Avenida Hospital Universitario s/n, Oviedo, Spain; Institute of Neuropathology, Cluster of Excellence 'Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells', University of Göttingen, Göttingen, Germany
- Lorenz Lamm
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany; Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
- Ricardo D Righetto
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
- Damien Larivière
- Fourmentin-Guilbert Scientific Foundation, Noisy-le-Grand, France
- Eric Fourmentin
- Fourmentin-Guilbert Scientific Foundation, Noisy-le-Grand, France
- Stefan Pfeffer
- Max Planck Institute of Biochemistry, Martinsried, Germany; Zentrum für Molekulare Biologie der Universität Heidelberg, Heidelberg, Germany
- Julio Ortiz
- Max Planck Institute of Biochemistry, Martinsried, Germany; Ernst Ruska-Centre, Wilhelm-Johnen-Straße, Jülich, Germany
- Tingying Peng
- Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
- Benjamin D Engel
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany; Department of Chemistry, Technical University of Munich, Garching, Germany
- Charles Kervrann
- Serpico Project-Team, Centre Inria Rennes-Bretagne Atlantique and CNRS-UMR 144, Inria, CNRS, Institut Curie, PSL Research University, Campus Universitaire de Beaulieu, Rennes Cedex, France
21
Zeng Y, Howe G, Yi K, Zeng X, Zhang J, Chang YW, Xu M. Unsupervised domain alignment based open set structural recognition of macromolecules captured by cryo-electron tomography. Proceedings. International Conference on Image Processing 2021; 2021:106-110. [PMID: 35350462 PMCID: PMC8959888 DOI: 10.1109/icip42928.2021.9506205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Cellular cryo-Electron Tomography (cryo-ET) provides three-dimensional views of structural and spatial information of various macromolecules in cells in a near-native state. Subtomogram classification is a key step for recognizing and differentiating these macromolecular structures. In recent years, deep learning methods have been developed for high-throughput subtomogram classification tasks; however, conventional supervised deep learning methods cannot recognize macromolecular structural classes that do not exist in the training data. This imposes a major weakness since most native macromolecular structures in cells are unknown and consequently, cannot be included in the training data. Therefore, open set learning which can recognize unknown macromolecular structures is necessary for boosting the power of automatic subtomogram classification. In this paper, we propose a method called Margin-based Loss for Unsupervised Domain Alignment (MLUDA) for open set recognition problems where only a few categories of interest are shared between cross-domain data. Through extensive experiments, we demonstrate that MLUDA performs well at cross-domain open-set classification on both public datasets and medical imaging datasets. So our method is of practical importance.
Affiliation(s)
- Yuchen Zeng
- Computational Biology Department, Carnegie Mellon University, United States
- Gregory Howe
- Computational Biology Department, Carnegie Mellon University, United States
- Kai Yi
- King Abdullah University of Science and Technology, Saudi Arabia
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, United States
- Jing Zhang
- Department of Computer Science, University of California Irvine, United States
- Yi-Wei Chang
- Perelman School of Medicine, University of Pennsylvania, United States
- Min Xu
- Computational Biology Department, Carnegie Mellon University, United States
22
Yu L, Li R, Zeng X, Wang H, Jin J, Ge Y, Jiang R, Xu M. Few shot domain adaptation for in situ macromolecule structural classification in cryoelectron tomograms. Bioinformatics 2021; 37:185-191. [PMID: 32722755 DOI: 10.1093/bioinformatics/btaa671] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/05/2020] [Revised: 07/06/2020] [Accepted: 07/20/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Cryoelectron tomography (cryo-ET) visualizes structure and spatial organization of macromolecules and their interactions with other subcellular components inside single cells in the close-to-native state at submolecular resolution. Such information is critical for the accurate understanding of cellular processes. However, subtomogram classification remains one of the major challenges for the systematic recognition and recovery of the macromolecule structures in cryo-ET because of imaging limits and data quantity. Recently, deep learning has significantly improved the throughput and accuracy of large-scale subtomogram classification. However, often it is difficult to get enough high-quality annotated subtomogram data for supervised training due to the enormous expense of labeling. To tackle this problem, it is beneficial to utilize another already annotated dataset to assist the training process. However, due to the discrepancy of image intensity distribution between source domain and target domain, the model trained on subtomograms in source domain may perform poorly in predicting subtomogram classes in the target domain. RESULTS In this article, we adapt a few shot domain adaptation method for deep learning-based cross-domain subtomogram classification. The essential idea of our method consists of two parts: (i) take full advantage of the distribution of plentiful unlabeled target domain data, and (ii) exploit the correlation between the whole source domain dataset and few labeled target domain data. Experiments conducted on simulated and real datasets show that our method achieves significant improvement on cross domain subtomogram classification compared with baseline methods. AVAILABILITY AND IMPLEMENTATION Software is available online https://github.com/xulabs/aitom. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Liangyong Yu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ran Li
- Department of Automation, Tsinghua University, Beijing 100084, China
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Hongyi Wang
- Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Jie Jin
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Beijing 100190, China
- Yang Ge
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Science, Beijing 100190, China
- Rui Jiang
- Department of Automation, Tsinghua University, Beijing 100084, China
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
23
|
Singla J, White KL, Stevens RC, Alber F. Assessment of scoring functions to rank the quality of 3D subtomogram clusters from cryo-electron tomography. J Struct Biol 2021; 213:107727. [PMID: 33753204 DOI: 10.1016/j.jsb.2021.107727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2020] [Revised: 03/12/2021] [Accepted: 03/17/2021] [Indexed: 11/17/2022]
Abstract
Cryo-electron tomography provides the opportunity for unsupervised discovery of endogenous complexes in situ. This process usually requires particle picking, clustering and alignment of subtomograms to produce an average structure of the complex. When applied to heterogeneous samples, template-free clustering and alignment of subtomograms can potentially lead to the discovery of structures for unknown endogenous complexes. However, such methods require scoring functions to measure and accurately rank the quality of aligned subtomogram clusters, which can be compromised by contamination from misclassified complexes and by alignment errors. Here, we provide the first study to assess the effectiveness of more than 15 scoring functions for evaluating the quality of subtomogram clusters, which differ in the amount of structural misalignment and contamination due to misclassified complexes. We assessed both experimental and simulated subtomograms as ground truth data sets. Our analysis showed that the robustness of scoring functions varies widely. Most scores were sensitive to the signal-to-noise ratio (SNR) of subtomograms and often required Gaussian filtering as preprocessing for improved performance. Two scoring functions, Spectral SNR-based Fourier Shell Correlation and Pearson Correlation in the Fourier domain with missing wedge correction, showed a robust ranking of subtomogram clusters without any preprocessing and irrespective of the SNR levels of subtomograms. Of these two scoring functions, Spectral SNR-based Fourier Shell Correlation was fastest to compute and is a better choice for handling large numbers of subtomograms. Our results provide guidance for choosing an accurate scoring function for template-free approaches to detect complexes from heterogeneous samples.
Affiliation(s)
- Jitin Singla
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, 520 Boyer Hall, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA; Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Kate L White
- Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Raymond C Stevens
- Department of Biological Sciences, Bridge Institute, Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology, and Molecular Genetics, University of California Los Angeles, 520 Boyer Hall, Los Angeles, CA 90095, USA; Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, USA.
|
24
|
Nikishin I, Dulimov R, Skryabin G, Galetsky S, Tchevkina E, Bagrov D. ScanEV - A neural network-based tool for the automated detection of extracellular vesicles in TEM images. Micron 2021; 145:103044. [PMID: 33676158 DOI: 10.1016/j.micron.2021.103044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/08/2020] [Revised: 02/24/2021] [Accepted: 02/24/2021] [Indexed: 12/18/2022]
Abstract
Transmission electron microscopy (TEM) is the most widely accepted method for visualizing extracellular vesicles (EVs), particularly exosomes. TEM images provide information about the size and morphology of the EVs. We have developed an online tool, ScanEV (Scanner for the Extracellular Vesicles, available at https://bioeng.ru/scanev), for the rapid and automated processing of such images. ScanEV is based on a convolutional neural network; it detects the "cup-shaped" particles in the images and calculates their morphometric parameters. This tool will be useful for researchers who study EVs and use TEM for their characterization.
Affiliation(s)
- Igor Nikishin
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Gleb Skryabin
- N.N. Blokhin National Medical Research Center of Oncology, Moscow, Russia
- Sergey Galetsky
- N.N. Blokhin National Medical Research Center of Oncology, Moscow, Russia
- Elena Tchevkina
- N.N. Blokhin National Medical Research Center of Oncology, Moscow, Russia
- Dmitry Bagrov
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia.
|
25
|
Du X, Wang H, Zhu Z, Zeng X, Chang YW, Zhang J, Xing E, Xu M. Active Learning to Classify Macromolecular Structures in situ for Less Supervision in Cryo-Electron Tomography. Bioinformatics 2021; 37:2340-2346. [PMID: 33620460 DOI: 10.1093/bioinformatics/btab123] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/27/2020] [Revised: 01/14/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Cryo-Electron Tomography (cryo-ET) is a 3D bioimaging tool that visualizes the structural and spatial organization of macromolecules at a near-native state in single cells, which has broad applications in life science. However, the systematic structural recognition and recovery of macromolecules captured by cryo-ET are difficult due to high structural complexity and imaging limits. Deep learning-based subtomogram classification has played a critical role in such tasks. As supervised approaches, however, such methods rely on sufficient and laborious annotation of a large training dataset. RESULTS: To alleviate this major labeling burden, we proposed a Hybrid Active Learning (HAL) framework for querying subtomograms for labeling from a large unlabeled subtomogram pool. First, HAL adopts uncertainty sampling to select the subtomograms that have the most uncertain predictions. This strategy forces the model to be aware of the inductive bias during classification and subtomogram selection, which satisfies the discriminativeness principle in the AL literature. Moreover, to mitigate the sampling bias caused by this strategy, a discriminator is introduced to judge whether a given subtomogram is labeled or unlabeled, and the model subsequently queries the subtomograms that have higher probabilities of being unlabeled. This query strategy encourages the data distributions of the labeled and unlabeled subtomogram samples to match, which essentially encodes the representativeness criterion into the subtomogram selection process. Additionally, HAL introduces a subset sampling strategy to improve the diversity of the query set, so that the information overlap between the queried batches is decreased and the algorithmic efficiency is improved. Our experiments on subtomogram classification tasks using both simulated and real data demonstrate that we can achieve comparable testing performance (on average only a 3% accuracy drop) by using less than 30% of the labeled subtomograms, which is a very promising result for subtomogram classification with limited labeling resources. AVAILABILITY: https://github.com/xulabs/aitom.
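The uncertainty-sampling step described in this abstract can be illustrated with a minimal sketch. It assumes predictive entropy as the uncertainty measure, which the abstract does not specify, and it omits the discriminator and subset-sampling components of HAL entirely. The function names and the example probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(predictions, budget):
    """Return indices of the `budget` samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical softmax outputs for four unlabeled subtomograms over three classes.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform prediction, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two samples that would be sent for labeling.
queried = select_most_uncertain(pool, budget=2)
```

In an active-learning loop, the queried samples would be labeled, added to the training set, and the classifier retrained before the next query round.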
Affiliation(s)
- Xuefeng Du
- Department of Computer Science, University of Wisconsin-Madison, Madison, 53706, USA
- Haohan Wang
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, 15213, USA
- Zhenxi Zhu
- Department of Computer Science, Beijing University of Posts and Telecommunications, 100876, China
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
- Yi-Wei Chang
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, 19104, USA
- Jing Zhang
- Department of Computer Science, University of California - Irvine, Irvine, 92697, USA
- Eric Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, 15213, USA
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, 15213, USA
|
26
|
Zhou B, Yu H, Zeng X, Yang X, Zhang J, Xu M. One-Shot Learning With Attention-Guided Segmentation in Cryo-Electron Tomography. Front Mol Biosci 2021; 7:613347. [PMID: 33511158 PMCID: PMC7835881 DOI: 10.3389/fmolb.2020.613347] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/02/2020] [Accepted: 12/09/2020] [Indexed: 11/13/2022] Open
Abstract
Cryo-electron Tomography (cryo-ET) generates 3D visualizations of cellular organization that allow biologists to analyze cellular structures in a near-native state at nanoscale resolution. Recently, deep learning methods have demonstrated promising performance in classification and segmentation of macromolecule structures captured by cryo-ET, but training individual deep learning models requires large amounts of manually labeled and segmented data from previously observed classes. To perform classification and segmentation in the wild (i.e., with limited training data and with unseen classes), novel deep learning models need to be developed to classify and segment unseen macromolecules captured by cryo-ET. In this paper, we develop a one-shot learning framework, called cryo-ET one-shot network (COS-Net), for simultaneous classification of macromolecular structure and generation of the voxel-level 3D segmentation, using only one training sample per class. Our experimental results on 22 macromolecule classes demonstrated that COS-Net could efficiently classify macromolecular structures with small numbers of samples and produce accurate 3D segmentation at the same time.
Affiliation(s)
- Bo Zhou
- Department of Biomedical Engineering, Yale University, New Haven, CT, United States
- Haisu Yu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Xiaoyan Yang
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
- Jing Zhang
- Computer Science Department, University of California, Irvine, Irvine, CA, United States
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, United States
|
27
|
Few-shot learning for classification of novel macromolecular structures in cryo-electron tomograms. PLoS Comput Biol 2020; 16:e1008227. [PMID: 33175839 PMCID: PMC7682871 DOI: 10.1371/journal.pcbi.1008227] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/13/2020] [Revised: 11/23/2020] [Accepted: 08/08/2020] [Indexed: 01/25/2023] Open
Abstract
Cryo-electron tomography (cryo-ET) provides 3D visualization of subcellular components in the near-native state and at sub-molecular resolutions in single cells, demonstrating an increasingly important role in structural biology in situ. However, systematic recognition and recovery of macromolecular structures in cryo-ET data remain challenging as a result of low signal-to-noise ratio (SNR), small sizes of macromolecules, and high complexity of the cellular environment. Subtomogram structural classification is an essential step for such a task. Although acquisition of large amounts of subtomograms is no longer an obstacle due to advances in automation of data collection, obtaining the same number of structural labels is both computationally and labor intensive. On the other hand, existing deep learning-based supervised classification approaches are highly demanding on labeled data and have limited ability to learn about new structures rapidly from data containing very few labels of such new structures. In this work, we propose a novel approach for subtomogram classification based on few-shot learning. With our approach, structures that were unseen in the training data can be classified through instance embedding, given a few labeled samples in the test data. Experiments were performed on both simulated and real datasets. Our experimental results show that we can make inferences on new structures given only five labeled samples for each class with a competitive accuracy (> 0.86 on the simulated dataset with SNR = 0.1), or even one sample with an accuracy of 0.7644. The results on real datasets are also promising, with accuracy > 0.9 under both conditions and even up to 1 on one of the real datasets. Our approach achieves significant improvement compared with the baseline method and has strong capabilities of generalizing to other cellular components.
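One common way to classify through instance embeddings from a few labeled samples is a nearest-prototype rule: average the embeddings of each class's labeled samples and assign a query to the closest average. The sketch below illustrates that rule only. The abstract does not state which decision rule or embedding network the authors use, so the rule, the 2-D embeddings, and the class names here are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def classify_by_prototype(query, support):
    """Assign `query` to the class whose mean support embedding is nearest (Euclidean)."""
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-D embeddings; a trained network would produce these from subtomograms.
# Two labeled samples per class (a "2-shot" setting); class names are placeholders.
support = {
    "ribosome": [[0.9, 0.1], [1.1, -0.1]],
    "proteasome": [[-1.0, 0.2], [-0.8, 0.0]],
}

label = classify_by_prototype([0.7, 0.05], support)
```

Because the prototypes are computed at test time from the support samples, new classes can be added without retraining the embedding network.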
|
28
|
Lin R, Zeng X, Kitani K, Xu M. Adversarial domain adaptation for cross data source macromolecule in situ structural classification in cellular electron cryo-tomograms. Bioinformatics 2019; 35:i260-i268. [PMID: 31510673 PMCID: PMC6612867 DOI: 10.1093/bioinformatics/btz364] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION: Since 2017, an increasing amount of attention has been paid to supervised deep learning-based macromolecule in situ structural classification (i.e. subtomogram classification) in cellular electron cryo-tomography (CECT) due to the substantially higher scalability of deep learning. However, the success of such a supervised approach relies heavily on the availability of large amounts of labeled training data. For CECT, creating valid training data from the same data source as the prediction data is usually laborious and computationally intensive. It would be beneficial to have training data from a separate data source where the annotation is readily available or can be performed in a high-throughput fashion. However, cross data source prediction is often biased due to the different image intensity distributions (a.k.a. domain shift). RESULTS: We adapt a deep learning-based adversarial domain adaptation (3D-ADA) method to address the domain shift problem in CECT data analysis. 3D-ADA first uses a source domain feature extractor to extract discriminative features from the training data as the input to a classifier. Then it adversarially trains a target domain feature extractor to reduce the distribution differences of the extracted features between training and prediction data. As a result, the same classifier can be directly applied to the prediction data. We tested 3D-ADA on both experimental and realistically simulated subtomogram datasets under different imaging conditions. 3D-ADA stably improved the cross data source prediction and outperformed two popular domain adaptation methods. Furthermore, we demonstrate that 3D-ADA can improve cross data source recovery of novel macromolecular structures. AVAILABILITY AND IMPLEMENTATION: https://github.com/xulabs/projects. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruogu Lin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiangrui Zeng
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Kris Kitani
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Min Xu
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
|
29
|
Liu C, Zeng X, Wang KW, Guo Q, Xu M. Multi-task Learning for Macromolecule Classification, Segmentation and Coarse Structural Recovery in Cryo-Tomography. BMVC: Proceedings of the British Machine Vision Conference 2018; 2018:1007. [PMID: 36951799 PMCID: PMC10028434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Cellular Electron Cryo-Tomography (CECT) is a powerful 3D imaging tool for studying the native structure and organization of macromolecules inside single cells. For systematic recognition and recovery of macromolecular structures captured by CECT, methods for several important tasks such as subtomogram classification and semantic segmentation have been developed. However, the recognition and recovery of macromolecular structures are still very difficult due to high molecular structural diversity, the crowded molecular environment, and the imaging limitations of CECT. In this paper, we propose a novel multi-task 3D convolutional neural network model for simultaneous classification, segmentation, and coarse structural recovery of macromolecules of interest in subtomograms. In our model, the learned image features of one task are shared and thereby mutually reinforce the learning of other tasks. Evaluated on realistically simulated and experimental CECT data, our multi-task learning model outperformed all single-task learning methods for classification and segmentation. In addition, we demonstrate that our model can generalize to discover, segment and recover novel structures that do not exist in the training data.
Affiliation(s)
- Chang Liu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiangrui Zeng
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Kai Wen Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Qiang Guo
- Max Planck Institute for Biochemistry, Martinsried, Germany
- Min Xu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
|
30
|
Guo J, Zhou B, Zeng X, Freyberg Z, Xu M. Model Compression for Faster Structural Separation of Macromolecules Captured by Cellular Electron Cryo-Tomography. Image Analysis and Recognition: International Conference, ICIAR ... : Proceedings. ICIAR 2018; 10882:144-152. [PMID: 31231722 PMCID: PMC6588193 DOI: 10.1007/978-3-319-93000-8_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 10/13/2023]
Abstract
Electron Cryo-Tomography (ECT) enables 3D visualization of macromolecule structure inside single cells. Macromolecule classification approaches based on convolutional neural networks (CNNs) were developed to systematically separate millions of macromolecules captured by ECT. However, given the fast accumulation of ECT data, it will soon become necessary to use CNN models to efficiently and accurately separate substantially more macromolecules at the prediction stage, which requires additional computational costs. To speed up the prediction, we compress classification models into compact neural networks with little loss in accuracy for deployment. Specifically, we propose to perform model compression through knowledge distillation. First, a complex teacher network is trained to generate soft labels with better classification feasibility; customized student networks with simple architectures are then trained using the soft labels to compress model complexity. Our tests demonstrate that our compressed models significantly reduce the number of parameters and time cost while maintaining similar classification accuracy.
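The soft-label idea behind knowledge distillation can be sketched as follows. This uses the widely used temperature-scaled softmax formulation and a cross-entropy loss between teacher and student outputs. The abstract does not give the authors' exact loss or temperature, so those choices, the function names, and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature yields softer labels."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft labels and the student's predictions."""
    soft_labels = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_labels, student_probs))


# Hypothetical logits for one subtomogram over three classes.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]  # close to the teacher
poor_student = [0.5, 4.0, 1.0]  # disagrees with the teacher

loss_good = distillation_loss(teacher, good_student)
loss_poor = distillation_loss(teacher, poor_student)
```

A student that mimics the teacher's output distribution receives a lower loss, which is the signal used to train the smaller network.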
Affiliation(s)
- Bo Zhou
- School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
- Xiangrui Zeng
- School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
- Zachary Freyberg
- Departments of Psychiatry and Cell Biology, University of Pittsburgh, Pittsburgh, USA
- Min Xu
- School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
|