1
|
Qi H, Zhao H, Li E, Lu X, Yu N, Liu J, Han J. DeepQA: A Unified Transcriptome-Based Aging Clock Using Deep Neural Networks. Aging Cell 2025; 24:e14471. [PMID: 39757434 PMCID: PMC12074024 DOI: 10.1111/acel.14471] [Received: 08/11/2024] [Revised: 11/21/2024] [Accepted: 12/17/2024] [Indexed: 01/07/2025] Open
Abstract
Understanding the complex biological process of aging is of great value, especially as it can help develop therapeutics to prolong healthy life. Predicting biological age from gene expression data has been shown to be an effective means to quantify the aging of a subject and to identify molecular and cellular biomarkers of aging. A typical approach for estimating biological age, adopted by almost all existing aging clocks, is to train machine learning models only on healthy subjects but to run inference on both healthy and unhealthy subjects. However, the inherent bias in this approach results in inaccurate biological age estimates, as shown in this study. Moreover, almost all existing transcriptome-based aging clocks were built around an inefficient procedure of gene selection followed by conventional machine learning models such as elastic nets and linear discriminant analysis. To address these limitations, we proposed DeepQA, a unified aging clock based on a mixture of experts. Unlike existing methods, DeepQA is equipped with a specially designed Hinge-Mean-Absolute-Error (Hinge-MAE) loss so that it can train on both healthy and unhealthy subjects from multiple cohorts, reducing the bias in inferring the biological age of unhealthy subjects. Our experiments showed that DeepQA significantly outperformed existing methods for biological age estimation on both healthy and unhealthy subjects. In addition, our method avoids the inefficient exhaustive search of genes and provides a novel means to identify genes activated in aging prediction, as an alternative to approaches such as differential gene expression analysis.
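The abstract names a Hinge-MAE loss but does not define it. The sketch below is one plausible reading, not the paper's formulation: healthy subjects are scored with the ordinary absolute error against chronological age, while unhealthy subjects are penalized only when the predicted age falls below chronological age (the assumption being that disease should not make someone biologically younger). The function name `hinge_mae` and its arguments are hypothetical.

```python
import numpy as np

def hinge_mae(pred_age, chron_age, is_healthy):
    """Illustrative Hinge-MAE-style loss (ASSUMED form, not taken from the paper).

    Healthy subjects: |pred - chron| (plain MAE term).
    Unhealthy subjects: max(0, chron - pred), i.e. only under-prediction costs.
    """
    pred_age = np.asarray(pred_age, dtype=float)
    chron_age = np.asarray(chron_age, dtype=float)
    is_healthy = np.asarray(is_healthy, dtype=bool)

    abs_err = np.abs(pred_age - chron_age)             # two-sided error
    hinge_err = np.maximum(0.0, chron_age - pred_age)  # one-sided error
    per_subject = np.where(is_healthy, abs_err, hinge_err)
    return float(per_subject.mean())

# One healthy subject (error 2), one unhealthy subject predicted older
# (no penalty), one unhealthy subject predicted younger (penalty 5).
loss = hinge_mae([50, 70, 40], [52, 60, 45], [True, False, False])
```

Under this assumed form, unhealthy samples can contribute to training without forcing their predicted age to equal chronological age, which is the kind of bias the abstract says the loss is meant to reduce.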
Affiliation(s)
- Hongqian Qi
- State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
- College of Pharmacy, Nankai University, Tianjin, China
| | - Hongchen Zhao
- College of Artificial Intelligence, Nankai University, Tianjin, China
| | - Enyi Li
- College of Artificial Intelligence, Nankai University, Tianjin, China
| | - Xinyi Lu
- State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, China
| | - Ningbo Yu
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China
| | - Jinchao Liu
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China
| | - Jianda Han
- College of Artificial Intelligence, Nankai University, Tianjin, China
- Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, Nankai University, China
| |
|
2
|
Zou Z, Liu Y, Bai Y, Luo J, Zhang Z. scTrans: Sparse attention powers fast and accurate cell type annotation in single-cell RNA-seq data. PLoS Comput Biol 2025; 21:e1012904. [PMID: 40184563 PMCID: PMC11970913 DOI: 10.1371/journal.pcbi.1012904] [Received: 10/02/2024] [Accepted: 02/24/2025] [Indexed: 04/06/2025] Open
Abstract
Cell type annotation is crucial in single-cell RNA sequencing data analysis because it enables significant biological discoveries and deepens our understanding of tissue biology. Given the high-dimensional and highly sparse nature of single-cell RNA sequencing data, most existing annotation tools focus on highly variable genes to reduce dimensionality and computational load. However, this approach inevitably results in information loss, potentially weakening the model's generalization performance and adaptability to novel datasets. To mitigate this issue, we developed scTrans, a single-cell Transformer-based model that employs sparse attention to utilize all non-zero genes, thereby effectively reducing the input data dimensionality while minimizing information loss. We validated the speed and accuracy of scTrans by performing cell type annotation on 31 different tissues within the Mouse Cell Atlas. Remarkably, even with datasets nearing a million cells, scTrans efficiently performs cell type annotation with limited computational resources. Furthermore, scTrans demonstrates strong generalization capabilities, accurately annotating cells in novel datasets and generating high-quality latent representations, which are essential for precise clustering and trajectory analysis.
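To make the input-reduction idea concrete, the sketch below shows how a sparse expression matrix could be turned into per-cell token lists that contain only the non-zero genes, padded to a common length with a mask. It illustrates only this preprocessing step under our own assumptions; it is not the scTrans implementation and does not include the attention layers. The function name `nonzero_gene_tokens` and the `pad_id` convention are hypothetical.

```python
import numpy as np

def nonzero_gene_tokens(expr, pad_id=-1):
    """Keep only non-zero genes per cell (ASSUMED preprocessing, for illustration).

    expr: (n_cells, n_genes) array of expression values.
    Returns gene indices, their values, and a boolean mask, each of shape
    (n_cells, max_nonzero), where padded positions have mask == False.
    """
    expr = np.asarray(expr, dtype=float)
    n_cells = expr.shape[0]
    nz_per_cell = [np.flatnonzero(row) for row in expr]
    max_len = max((len(ix) for ix in nz_per_cell), default=0)

    gene_ids = np.full((n_cells, max_len), pad_id, dtype=int)
    values = np.zeros((n_cells, max_len), dtype=float)
    mask = np.zeros((n_cells, max_len), dtype=bool)
    for i, ix in enumerate(nz_per_cell):
        gene_ids[i, :len(ix)] = ix       # which genes are expressed
        values[i, :len(ix)] = expr[i, ix]  # their expression levels
        mask[i, :len(ix)] = True         # real tokens vs. padding
    return gene_ids, values, mask

# Two cells, four genes: sequence length drops from 4 to 2.
gene_ids, values, mask = nonzero_gene_tokens([[0, 3, 0, 1],
                                              [2, 0, 0, 0]])
```

Because most entries of a single-cell matrix are zero, the padded length is typically far smaller than the number of genes, which is why attending only over expressed genes can cut cost without discarding any measured signal.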
Affiliation(s)
- Zhiyi Zou
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Ying Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Yuting Bai
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, China
| | - Zhaolei Zhang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
|
3
|
Andrade AX, Nguyen S, Montillo A. scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder. Research Square 2025:rs.3.rs-6081478. [PMID: 40166015 PMCID: PMC11957221 DOI: 10.21203/rs.3.rs-6081478/v1] [Indexed: 04/02/2025]
Abstract
scRNA-seq data has the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is unraveling confounding from technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects, rather than quantifying and modeling them. Here, we present scMEDAL, a framework for single-cell Mixed Effects Deep Autoencoder Learning, which separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while a Bayesian autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and cardiovascular), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting a cell's expression as if it had been acquired in a different batch via genomap projections at the cellular level, revealing the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. By combining scMEDAL's batch-agnostic and batch-specific latent spaces, it enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
Affiliation(s)
- Aixa X. Andrade
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Son Nguyen
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Albert Montillo
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| |
|
4
|
Andrade AX, Nguyen S, Montillo A. scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder. arXiv 2025:arXiv:2411.06635v3. [PMID: 39606715 PMCID: PMC11601787] [Indexed: 11/29/2024]
Abstract
scRNA-seq data has the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is unraveling confounding from technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects, rather than quantifying and modeling them. Here, we present scMEDAL, a framework for single-cell Mixed Effects Deep Autoencoder Learning, which separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while a Bayesian autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and cardiovascular), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting a cell's expression as if it had been acquired in a different batch via genomap projections at the cellular level, revealing the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. By combining scMEDAL's batch-agnostic and batch-specific latent spaces, it enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
|
5
|
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Science China. Life Sciences 2025; 68:5-102. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
| | - Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
| | - Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
| | - Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
| | - Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
| | - Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
| | - Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
| | - Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
| | - Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
| | - Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
| | - Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
| | - Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
| | - Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
| |
|
6
|
Wang Y, Li K, Zhang R, Fan Y, Huang L, Zhou F. GraCEImpute: A novel graph clustering autoencoder approach for imputation of single-cell RNA-seq data. Comput Biol Med 2025; 184:109400. [PMID: 39561511 DOI: 10.1016/j.compbiomed.2024.109400] [Received: 04/17/2024] [Revised: 10/14/2024] [Accepted: 11/07/2024] [Indexed: 11/21/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) technology establishes a unique view for elucidating cellular heterogeneity in various biological systems. Yet the scRNA-seq data is compromised by a high dropout rate due to the technological limitation, and the substantial data loss poses computational challenges on subsequent analyses. This study introduces a novel graph clustering autoencoder (GCAE)-based imputation approach (GraCEImpute) to address the challenge of missing data in scRNA-seq data. Our comprehensive evaluation demonstrates that the GraCEImpute model outperforms existing approaches in accurately imputing dropout zeros within scRNA-seq data. The proposed GraCEImpute model also demonstrates the significantly enhanced quality of downstream scRNA-seq data analyses, including clustering, differential gene expression (DEG) analysis, and cell trajectory inference. These improvements underscore the GraCEImpute model's potential to facilitate a deeper understanding of cellular processes and heterogeneity through the scRNA-seq data analyses. The source code is released at https://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Yueying Wang
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
| | - Kewei Li
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
| | - Ruochi Zhang
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
| | - Yusi Fan
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.
| | - Lan Huang
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
| | - Fengfeng Zhou
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; School of Biology and Engineering, Guizhou Medical University, Guiyang, 550025, Guizhou, China.
| |
|
7
|
Cao S, Wei Y, Yue Y, Wang D, Xiong A, Yang J, Zeng H. Research Trends and Dynamics in Single-cell RNA Sequencing for Musculoskeletal Diseases: A Scientometric and Visualization Study. Int J Med Sci 2025; 22:528-550. [PMID: 39898252 PMCID: PMC11783068 DOI: 10.7150/ijms.104697] [Received: 10/05/2024] [Accepted: 12/11/2024] [Indexed: 02/04/2025] Open
Abstract
Background: Worldwide, approximately 1.7 billion people are afflicted with musculoskeletal (MSK) diseases, posing significant health challenges. The introduction of single-cell RNA sequencing (scRNA-seq) technology provides novel insights and approaches to comprehend the onset, progression, and treatment of MSK diseases. Nevertheless, there is a remarkable lack of analytical and descriptive studies regarding the trajectory, essential research directions, current research situation, pivotal research focuses, and upcoming perspectives. Therefore, the aim of this research is to present a comprehensive overview of the advancements made in scRNA-seq for MSK disorders over the past 15 years. Methods: It utilizes a robust dataset derived from the Web of Science Core Collection, encompassing January 1, 2009, through September 6, 2024. To achieve this, advanced analytical methodologies were applied to conduct thorough scientometric and visual analyses. Results: The findings underscore the preeminent role of China, which contributes 63.49% of the total publications, thereby exerting a substantial impact within this research domain. Notable contributions came from institutions such as Shanghai Jiao Tong University, Sun Yat-sen University, and Harvard Medical School, with Liu Yun being the leading contributor. Frontiers in Immunology published the greatest number of research papers in this field. This study identified joint diseases, bone neoplasms, bone fractures, and intervertebral disc degeneration as the main research focuses. Conclusion: This extensive scientometric analysis provides substantial benefits to both experienced and novice researchers by facilitating immediate access to critical data, thereby fostering innovation within this field.
Affiliation(s)
- Siyang Cao
- National & Local Joint Engineering Research Centre of Orthopaedic Biomaterials, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Shenzhen Key Laboratory of Orthopaedic Diseases and Biomaterials Research, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Department of Bone & Joint Surgery, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
| | - Yihao Wei
- National & Local Joint Engineering Research Centre of Orthopaedic Biomaterials, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Shenzhen Key Laboratory of Orthopaedic Diseases and Biomaterials Research, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Department of Rehabilitation Science, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region, People's Republic of China
- Faculty of Pharmaceutical Sciences, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen, Guangdong, People's Republic of China
- Faculty of Pharmaceutical Sciences, Shenzhen University of Advanced Technology, Shenzhen, Guangdong, People's Republic of China
| | - Yaohang Yue
- National & Local Joint Engineering Research Centre of Orthopaedic Biomaterials, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Shenzhen Key Laboratory of Orthopaedic Diseases and Biomaterials Research, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Department of Bone & Joint Surgery, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
| | - Deli Wang
- National & Local Joint Engineering Research Centre of Orthopaedic Biomaterials, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Shenzhen Key Laboratory of Orthopaedic Diseases and Biomaterials Research, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Department of Bone & Joint Surgery, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
| | - Ao Xiong
- National & Local Joint Engineering Research Centre of Orthopaedic Biomaterials, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Shenzhen Key Laboratory of Orthopaedic Diseases and Biomaterials Research, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Department of Bone & Joint Surgery, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
| | - Jun Yang
- Department of Radiology, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
| | - Hui Zeng
- National & Local Joint Engineering Research Centre of Orthopaedic Biomaterials, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Shenzhen Key Laboratory of Orthopaedic Diseases and Biomaterials Research, Peking University Shenzhen Hospital, Shenzhen, Guangdong, People's Republic of China
- Department of Orthopedics, Shenzhen Second People's Hospital, The First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong, People's Republic of China
| |
|
8
|
Zhu Q, Li A, Zhang Z, Zheng C, Zhao J, Liu JX, Zhang D, Shao W. Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2543-2555. [PMID: 39471116 DOI: 10.1109/tcbb.2024.3487574] [Indexed: 11/01/2024]
Abstract
Machine learning techniques have become increasingly important in analyzing single-cell RNA data and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, the presence of batch effects poses major challenges in scRNA-seq analysis due to data distribution variation across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we proposed the discriminative domain adaption network (D2AN) for joint batch effect correction and cell type annotation with single-cell RNA-seq. Specifically, we first captured the global low-dimensional embeddings of samples from the source and target domains with an adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the semantic alignment of class centroids in the source and target domains is achieved for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which improves the robustness of the model. Experimental results on multiple real datasets demonstrated that the proposed method outperforms several state-of-the-art methods.
|
9
|
Yu Y, Mai Y, Zheng Y, Shi L. Assessing and mitigating batch effects in large-scale omics studies. Genome Biol 2024; 25:254. [PMID: 39363244 PMCID: PMC11447944 DOI: 10.1186/s13059-024-03401-9] [Received: 07/11/2023] [Accepted: 09/23/2024] [Indexed: 10/05/2024] Open
Abstract
Batch effects in omics data are notoriously common technical variations unrelated to study objectives, and may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Affiliation(s)
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Yuanbang Mai
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China.
- Cancer Institute, Shanghai Cancer Center, Fudan University, Shanghai, China.
- International Human Phenome Institutes (Shanghai), Shanghai, China.
| |
|
10
|
Yu Z, Liu F, Li Y. scTCA: a hybrid Transformer-CNN architecture for imputation and denoising of scDNA-seq data. Brief Bioinform 2024; 25:bbae577. [PMID: 39523623 PMCID: PMC11551055 DOI: 10.1093/bib/bbae577] [Received: 06/19/2024] [Revised: 10/05/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Single-cell DNA sequencing (scDNA-seq) has been widely used to unmask tumor copy number alterations (CNAs) at single-cell resolution. Although arm-level CNAs can be accurately detected from single-cell read counts, it is difficult to precisely identify focal CNAs because the read counts are characterized by high dimensionality, high sparsity and low signal-to-noise ratio. This creates a pressing need for reconstructing high-quality scDNA-seq data. We develop a new method called scTCA for imputation and denoising of single-cell read counts, thus aiding in downstream analysis of both arm-level and focal CNAs. scTCA employs hybrid Transformer-CNN architectures to identify local and non-local correlations between genes for precise recovery of the read counts. Unlike conventional Transformers, the Transformer block in scTCA is a two-stage attention module containing a stepwise self-attention layer and a window Transformer, and can efficiently deal with the high-dimensional read count data. We showcase the superior performance of scTCA through comparison with state-of-the-art methods on both synthetic and real datasets. The results indicate it is highly effective in imputation and denoising of scDNA-seq data.
Affiliation(s)
- Zhenhua Yu
- School of Information Engineering, Ningxia University, 750021 Ningxia, China
- Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Ningxia University, 750021 Ningxia, China
- Furui Liu
- School of Information Engineering, Ningxia University, 750021 Ningxia, China
- Yang Li
- School of Information Engineering, Ningxia University, 750021 Ningxia, China
11
Xu L, Li Z, Ren J, Liu S, Xu Y. Single-cell RNA sequencing data analysis utilizing multi-type graph neural networks. Comput Biol Med 2024; 179:108921. [PMID: 39059210 DOI: 10.1016/j.compbiomed.2024.108921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 07/08/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) sequences individual cells, and the resulting expression profile reflects the overall characteristics of each cell, which facilitates research at the cellular level. However, challenges such as dimensionality reduction of massive data, technical noise, and visualization of single-cell type clustering make scRNA-seq data difficult to analyze and process. In this paper, we propose a new single-cell data analysis model using a denoising autoencoder and multi-type graph neural networks (scDMG), which learns cell-cell topology information and a latent representation of scRNA-seq data. scDMG introduces the zero-inflated negative binomial (ZINB) model into a denoising autoencoder (DAE) to perform dimensionality reduction and denoising on the raw data. scDMG integrates multiple types of graph neural networks as the encoder to further train on the preprocessed data, which better handles various types of scRNA-seq datasets, resolves dropout events in scRNA-seq data, and enables preliminary classification of scRNA-seq data. By applying t-SNE and PCA to the trained data and invoking the Louvain algorithm, scDMG achieves better dimensionality reduction and clustering. Compared with other mainstream scRNA-seq clustering algorithms, scDMG outperforms state-of-the-art methods on various clustering performance metrics and shows better scalability and shorter runtime.
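The abstract above pairs a zero-inflated negative binomial (ZINB) model with a denoising autoencoder. As a rough illustration of what a ZINB likelihood looks like, the sketch below evaluates the log-probability of a single count under the standard mean/dispersion parameterization. The function names and the parameters `mu`, `theta`, and `pi` are illustrative assumptions; the abstract does not say how scDMG parameterizes or implements its loss.

```python
import math


def nb_log_prob(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (illustrative parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_prob(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    pi in [0, 1) is the probability of an extra (dropout) zero; with the
    remaining probability 1 - pi the count follows the negative binomial.
    """
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        nb_zero = math.exp(nb_log_prob(0, mu, theta))
        return math.log(pi + (1.0 - pi) * nb_zero)
    # A positive count can only come from the NB component.
    return math.log(1.0 - pi) + nb_log_prob(x, mu, theta)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for demonstration.
    for count in (0, 1, 5):
        print(count, zinb_log_prob(count, mu=2.0, theta=1.5, pi=0.3))
```

In a ZINB autoencoder, the decoder would typically output `mu`, `theta`, and `pi` for each gene and cell, and training would minimize the negative of this log-probability summed over the count matrix.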
Affiliation(s)
- Li Xu
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Zhenpeng Li
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China.
- Jiaxu Ren
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Shuaipeng Liu
- College of Computer Science and Technology, Harbin Engineering University, Harbin, 150001, Heilongjiang, China
- Yiming Xu
- College of Engineering, Tokyo Institute of Technology, Tokyo, 226-0026, Tokyo, Japan
12
Qian Y, Zou Q, Zhao M, Liu Y, Guo F, Ding Y. scRNMF: An imputation method for single-cell RNA-seq data by robust and non-negative matrix factorization. PLoS Comput Biol 2024; 20:e1012339. [PMID: 39116191 PMCID: PMC11338450 DOI: 10.1371/journal.pcbi.1012339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 08/21/2024] [Accepted: 07/19/2024] [Indexed: 08/10/2024] [Open access]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool in genomics research, enabling the analysis of gene expression at the individual cell level. However, scRNA-seq data often suffer from a high rate of dropouts, where certain genes fail to be detected in specific cells due to technical limitations. These missing data can introduce biases and hinder downstream analysis. To overcome this challenge, the development of effective imputation methods has become crucial in the field of scRNA-seq data analysis. Here, we propose an imputation method based on robust and non-negative matrix factorization (scRNMF). Unlike other matrix factorization algorithms, scRNMF integrates two loss functions: L2 loss and C-loss. The L2 loss function is highly sensitive to outliers, which can introduce substantial errors. We therefore use the C-loss function when dealing with zero values in the raw data. The primary advantage of the C-loss function is that it imposes a smaller penalty on larger errors, which results in more robust factorization when handling outliers. Various datasets of different sizes and zero rates are used to evaluate the performance of scRNMF against other state-of-the-art methods. Our method demonstrates its power and stability as a tool for imputation of scRNA-seq data.
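To make the contrast between the two losses concrete, the sketch below compares the squared (L2) loss with a bounded robust loss. The C-loss form used here, the correntropy-induced loss `1 - exp(-e^2 / (2 * sigma^2))`, is an assumption based on how that name is commonly used; the abstract does not give the formula, and the function names and the bandwidth `sigma` are illustrative.

```python
import math


def l2_loss(err):
    """Squared error: grows without bound, so large residuals dominate."""
    return err ** 2


def c_loss(err, sigma=1.0):
    """Correntropy-induced loss (assumed form): saturates near 1 for
    large residuals, so outliers contribute a bounded penalty."""
    return 1.0 - math.exp(-(err ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Compare how each loss treats increasingly large residuals.
    for e in (0.0, 0.5, 1.0, 5.0, 50.0):
        print(f"error={e:5.1f}  L2={l2_loss(e):8.2f}  C-loss={c_loss(e):.4f}")
```

Under this assumed form, a residual of 50 costs 2500 under L2 but at most 1 under the C-loss, which is one way to read the abstract's claim that larger errors receive a smaller penalty.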
Affiliation(s)
- Yuqing Qian
- Institute Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
- Institute Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Mengyuan Zhao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yi Liu
- Institute Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
13
Liu W, Pan Y, Teng Z, Xu J. scDMAE: A Generative Denoising Model Adopted Mask Strategy for scRNA-Seq Data Recovery. IEEE J Biomed Health Inform 2024; 28:3772-3780. [PMID: 38568766 DOI: 10.1109/jbhi.2024.3383921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized gene expression studies at the single-cell level. However, the presence of technical noise and data sparsity in scRNA-seq often undermines the accuracy of subsequent analyses. Existing methods for denoising and imputing scRNA-seq data often rely on stringent assumptions about data distribution, limiting the effectiveness of data recovery. In this study, we propose the scDMAE model for denoising and recovery of scRNA-seq data. First, the model fuses gene expression features and topological features to discern the primary expression patterns of genes in cells. Then, an autoencoder with a masking strategy is used to model dropout events and separate potential noise in the data. Finally, the model incorporates the original raw data to recover the true biological expression value. By conducting experiments on various types of scRNA-Seq datasets, scDMAE demonstrates superior performance compared to other comparative methods based on six distinct evaluation metrics in downstream analysis. The scDMAE method can accurately cluster similar cell populations, identify differential genes and infer cell trajectories.
14
Zhang T, Ren J, Li L, Wu Z, Zhang Z, Dong G, Wang G. scZAG: Integrating ZINB-Based Autoencoder with Adaptive Data Augmentation Graph Contrastive Learning for scRNA-seq Clustering. Int J Mol Sci 2024; 25:5976. [PMID: 38892162 PMCID: PMC11172799 DOI: 10.3390/ijms25115976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 04/08/2024] [Accepted: 05/28/2024] [Indexed: 06/21/2024] [Open access]
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to interpret cellular states, detect cell subpopulations, and study disease mechanisms. In scRNA-seq data analysis, cell clustering is a key step that can identify cell types. However, scRNA-seq data are characterized by high dimensionality and significant sparsity, presenting considerable challenges for clustering. In the high-dimensional gene expression space, cells may form complex topological structures. Many conventional scRNA-seq data analysis methods focus on identifying cell subgroups rather than exploring these potential high-dimensional structures in detail. Although some methods have begun to consider the topological structures within the data, many still overlook the continuity and complex topology present in single-cell data. We propose a deep learning framework that begins by employing a zero-inflated negative binomial (ZINB) model to denoise the highly sparse and over-dispersed scRNA-seq data. Next, scZAG uses an adaptive graph contrastive representation learning approach that combines approximate personalized propagation of neural predictions graph convolution (APPNPGCN) with graph contrastive learning methods. By using APPNPGCN as the encoder for graph contrastive learning, we ensure that each cell's representation reflects not only its own features but also its position in the graph and its relationships with other cells. Graph contrastive learning exploits the relationships between nodes to capture the similarity among cells, better representing the data's underlying continuity and complex topology. Finally, the learned low-dimensional latent representations are clustered using Kullback-Leibler divergence. We validated the superior clustering performance of scZAG on 10 common scRNA-seq datasets in comparison to existing state-of-the-art clustering methods.
Affiliation(s)
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China; (T.Z.); (J.R.); (L.L.); (Z.W.); (Z.Z.); (G.D.)
15
Li W, Yang F, Wang F, Rong Y, Liu L, Wu B, Zhang H, Yao J. scPROTEIN: a versatile deep graph contrastive learning framework for single-cell proteomics embedding. Nat Methods 2024; 21:623-634. [PMID: 38504113 DOI: 10.1038/s41592-024-02214-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 02/16/2024] [Indexed: 03/21/2024]
Abstract
Single-cell proteomics sequencing technology sheds light on protein-protein interactions, posttranslational modifications and proteoform dynamics in the cell. However, the uncertainty estimation for peptide quantification, data missingness, batch effects and high noise hinder the analysis of single-cell proteomic data. It is important to solve this set of tangled problems together, but the existing methods tailored for single-cell transcriptomes cannot fully address this task. Here we propose a versatile framework designed for single-cell proteomics data analysis called scPROTEIN, which consists of peptide uncertainty estimation based on a multitask heteroscedastic regression model and cell embedding generation based on graph contrastive learning. scPROTEIN can estimate the uncertainty of peptide quantification, denoise protein data, remove batch effects and encode single-cell proteomic-specific embeddings in a unified framework. We demonstrate that scPROTEIN is efficient for cell clustering, batch correction, cell type annotation, clinical analysis and spatially resolved proteomic data exploration.
Affiliation(s)
- Wei Li
- College of Artificial Intelligence, Nankai University, Tianjin, China
- AI Lab, Tencent, Shenzhen, China
- Fan Yang
- AI Lab, Tencent, Shenzhen, China
- Yu Rong
- AI Lab, Tencent, Shenzhen, China
- Han Zhang
- College of Artificial Intelligence, Nankai University, Tianjin, China.
16
Danino R, Nachman I, Sharan R. Batch correction of single-cell sequencing data via an autoencoder architecture. Bioinformatics Advances 2023; 4:vbad186. [PMID: 38213820 PMCID: PMC10781938 DOI: 10.1093/bioadv/vbad186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 12/09/2023] [Accepted: 12/17/2023] [Indexed: 01/13/2024]
Abstract
Motivation: Technical differences between gene expression sequencing experiments can cause variations in the data in the form of batch effect biases. These do not represent true biological variations between samples and can lead to false conclusions or hinder the ability to integrate multiple datasets. Since there is a growing need for the joint analysis of single-cell sequencing datasets from different sources, there is also a need to correct the resulting batch effects while maintaining the true biological variations in the data. Results: We developed a semi-supervised deep learning architecture called Autoencoder-based Batch Correction (ABC) for integrating single-cell sequencing datasets. Our method removes batch effects through a guided process of data compression using supervised cell type classifier branches for biological signal retention. It aligns the different batches using an adversarial training approach. We comprehensively evaluate the performance of our method using four single-cell sequencing datasets and multiple measures for batch effect removal and biological variation conservation. ABC outperforms 10 state-of-the-art methods for this task including Seurat, scGen, ComBat, scanorama, scVI, scANVI, AutoClass, Harmony, scDREAMER, and CLEAR, correcting various types of batch effects while preserving intricate biological variations.
Affiliation(s)
- Reut Danino
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel
- Iftach Nachman
- School of Neurobiology, Biochemistry and Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 6997801, Israel
17
Bettencourt C, Skene N, Bandres-Ciga S, Anderson E, Winchester LM, Foote IF, Schwartzentruber J, Botia JA, Nalls M, Singleton A, Schilder BM, Humphrey J, Marzi SJ, Toomey CE, Kleifat AA, Harshfield EL, Garfield V, Sandor C, Keat S, Tamburin S, Frigerio CS, Lourida I, Ranson JM, Llewellyn DJ. Artificial intelligence for dementia genetics and omics. Alzheimers Dement 2023; 19:5905-5921. [PMID: 37606627 PMCID: PMC10841325 DOI: 10.1002/alz.13427] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/05/2023] [Revised: 07/14/2023] [Accepted: 07/18/2023] [Indexed: 08/23/2023]
Abstract
Genetics and omics studies of Alzheimer's disease and other dementia subtypes enhance our understanding of underlying mechanisms and pathways that can be targeted. We identified key remaining challenges: First, can we enhance genetic studies to address missing heritability? Can we identify reproducible omics signatures that differentiate between dementia subtypes? Can high-dimensional omics data identify improved biomarkers? How can genetics inform our understanding of causal status of dementia risk factors? And which biological processes are altered by dementia-related genetic variation? Artificial intelligence (AI) and machine learning approaches give us powerful new tools in helping us to tackle these challenges, and we review possible solutions and examples of best practice. However, their limitations also need to be considered, as well as the need for coordinated multidisciplinary research and diverse deeply phenotyped cohorts. Ultimately AI approaches improve our ability to interrogate genetics and omics data for precision dementia medicine. HIGHLIGHTS: We have identified five key challenges in dementia genetics and omics studies. AI can enable detection of undiscovered patterns in dementia genetics and omics data. Enhanced and more diverse genetics and omics datasets are still needed. Multidisciplinary collaborative efforts using AI can boost dementia research.
Affiliation(s)
- Conceicao Bettencourt
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Nathan Skene
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Sara Bandres-Ciga
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Emma Anderson
- Department of Mental Health of Older People, Division of Psychiatry, University College London, London, UK
- Isabelle F Foote
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, USA
- Jeremy Schwartzentruber
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
- Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, California, USA
- Juan A Botia
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Mike Nalls
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Data Tecnica International LLC, Washington, DC, USA
- Andrew Singleton
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Jack Humphrey
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Christina E Toomey
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK
- The Francis Crick Institute, London, UK
- Ahmad Al Kleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Eric L Harshfield
- Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Victoria Garfield
- MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, UK
- Cynthia Sandor
- UK Dementia Research Institute. School of Medicine, Cardiff University, Cardiff, UK
- Samuel Keat
- UK Dementia Research Institute. School of Medicine, Cardiff University, Cardiff, UK
- Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Section, University of Verona, Verona, Italy
- Carlo Sala Frigerio
- UK Dementia Research Institute, Queen Square Institute of Neurology, University College London, London, UK
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK
18
Jiang C, Hu L, Xu C, Ge Q, Zhao X. [Imputation method for dropout in single-cell transcriptome data]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2023; 40:778-783. [PMID: 37666769 PMCID: PMC10477391 DOI: 10.7507/1001-5515.202301009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 07/27/2023] [Indexed: 09/06/2023]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) can resolve the expression characteristics of cells in tissues with single-cell precision, enabling researchers to quantify cellular heterogeneity within populations with higher resolution, revealing potentially heterogeneous cell populations and the dynamics of complex tissues. However, the presence of a large number of technical zeros in scRNA-seq data will have an impact on downstream analysis of cell clustering, differential genes, cell annotation, and pseudotime, hindering the discovery of meaningful biological signals. The main idea to solve this problem is to make use of the potential correlation between cells and genes, and to impute the technical zeros through the observed data. Based on this, this paper reviewed the basic methods of imputing technical zeros in the scRNA-seq data and discussed the advantages and disadvantages of the existing methods. Finally, recommendations and perspectives on the use and development of the method were provided.
Affiliation(s)
- Chao Jiang (姜超)
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Singleron Biotech Co., Ltd, Nanjing 210018, P. R. China
- Longfei Hu (胡龙飞)
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Chunxiang Xu (徐春祥)
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Qinyu Ge (葛芹玉)
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
- Xiangwei Zhao (赵祥伟)
- State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, P. R. China
19
Samad T, Wu SM. The sum of the parts is greater than the whole: current research models for congenital heart disease. Nature Cardiovascular Research 2023; 2:708-710. [PMID: 39195960 DOI: 10.1038/s44161-023-00308-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/29/2024]
Affiliation(s)
- Tahmina Samad
- Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
- Division of Cardiology, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Sean M Wu
- Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Pediatrics, Stanford University, Stanford, CA, USA.
- Division of Cardiovascular Medicine, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA.
20
Massimino M, Martorana F, Stella S, Vitale SR, Tomarchio C, Manzella L, Vigneri P. Single-Cell Analysis in the Omics Era: Technologies and Applications in Cancer. Genes (Basel) 2023; 14:1330. [PMID: 37510235 PMCID: PMC10380065 DOI: 10.3390/genes14071330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023] [Open access]
Abstract
Cancer molecular profiling obtained with conventional bulk sequencing describes average alterations obtained from the entire cellular population analyzed. In the era of precision medicine, this approach is unable to track tumor heterogeneity and cannot be exploited to unravel the biological processes behind clonal evolution. In the last few years, functional single-cell omics has improved our understanding of cancer heterogeneity. This approach requires isolation and identification of single cells starting from an entire population. A cell suspension obtained by tumor tissue dissociation or hematological material can be manipulated using different techniques to separate individual cells, employed for single-cell downstream analysis. Single-cell data can then be used to analyze cell-cell diversity, thus mapping evolving cancer biological processes. Despite its unquestionable advantages, single-cell analysis produces massive amounts of data with several potential biases, stemming from cell manipulation and pre-amplification steps. To overcome these limitations, several bioinformatic approaches have been developed and explored. In this work, we provide an overview of this entire process while discussing the most recent advances in the field of functional omics at single-cell resolution.
Affiliation(s)
- Michele Massimino
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Federica Martorana
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Stefania Stella
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Silvia Rita Vitale
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Cristina Tomarchio
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Livia Manzella
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Paolo Vigneri
- Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
- Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Humanitas Istituto Clinico Catanese, University Oncology Department, 95045 Catania, Italy
21
Yin F, Zhao H, Lu S, Shen J, Li M, Mao X, Li F, Shi J, Li J, Dong B, Xue W, Zuo X, Yang X, Fan C. DNA-framework-based multidimensional molecular classifiers for cancer diagnosis. Nature Nanotechnology 2023; 18:677-686. [PMID: 36973399 DOI: 10.1038/s41565-023-01348-9] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 05/13/2022] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
A molecular classification of diseases that accurately reflects clinical behaviour lays the foundation of precision medicine. The development of in silico classifiers coupled with molecular implementation based on DNA reactions marks a key advance in more powerful molecular classification, but it nevertheless remains a challenge to process multiple molecular datatypes. Here we introduce a DNA-encoded molecular classifier that can physically implement the computational classification of multidimensional molecular clinical data. To produce unified electrochemical sensing signals across heterogeneous molecular binding events, we exploit DNA-framework-based programmable atom-like nanoparticles with n valence to develop valence-encoded signal reporters that enable linearity in translating virtually any biomolecular binding events to signal gains. Multidimensional molecular information in computational classification is thus precisely assigned weights for bioanalysis. We demonstrate the implementation of a molecular classifier based on programmable atom-like nanoparticles to perform biomarker panel screening and analyse a panel of six biomarkers across three-dimensional datatypes for a near-deterministic molecular taxonomy of prostate cancer patients.
Affiliation(s)
- Fangfei Yin
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Haipei Zhao
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shasha Lu
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- School of Materials Science and Engineering, Suzhou University of Science and Technology, Suzhou, China
- Juwen Shen
- Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Min Li
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xiuhai Mao
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Fan Li
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Jiye Shi
- Division of Physical Biology, CAS Key Laboratory of Interfacial Physics and Technology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
- Jiang Li
- Division of Physical Biology, CAS Key Laboratory of Interfacial Physics and Technology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
- The Interdisciplinary Research Center, Shanghai Synchrotron Radiation Facility, Zhangjiang Laboratory, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China
- Baijun Dong
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Wei Xue
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaolei Zuo
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China.
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China.
| | - Xiurong Yang
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, China
| | - Chunhai Fan
- Institute of Molecular Medicine, Department of Urology, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering, Zhangjiang Institute for Advanced Study, and National Center for Translational Medicine, Shanghai Jiao Tong University, Shanghai, China
| |
|
22
|
Xiong Z, Luo J, Shi W, Liu Y, Xu Z, Wang B. scGCL: an imputation method for scRNA-seq data based on graph contrastive learning. Bioinformatics 2023; 39:7056638. [PMID: 36825817 PMCID: PMC9991516 DOI: 10.1093/bioinformatics/btad098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 01/14/2023] [Accepted: 02/24/2023] [Indexed: 02/25/2023] Open
Abstract
MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity, complex disease mechanisms and cell differentiation processes. Due to high sparsity and complex gene expression patterns, scRNA-seq data present a large number of dropout events, affecting downstream tasks such as cell clustering and pseudo-time analysis. Restoring the expression levels of genes is essential for reducing technical noise and facilitating downstream analysis. However, existing scRNA-seq data imputation methods ignore the topological structure information of scRNA-seq data and cannot comprehensively utilize the relationships between cells. RESULTS: Here, we propose a single-cell Graph Contrastive Learning method for scRNA-seq data imputation, named scGCL, which integrates graph contrastive learning and Zero-inflated Negative Binomial (ZINB) distribution to estimate dropout values. scGCL summarizes global and local semantic information through contrastive learning and selects positive samples to enhance the representation of target nodes. To capture the global probability distribution, scGCL introduces an autoencoder based on the ZINB distribution, which reconstructs the scRNA-seq data based on the prior distribution. Through extensive experiments, we verify that scGCL outperforms existing state-of-the-art imputation methods in clustering performance and gene imputation on 14 scRNA-seq datasets. Further, we find that scGCL can enhance the expression patterns of specific genes in Alzheimer's disease datasets. AVAILABILITY AND IMPLEMENTATION: The code and data of scGCL are available on GitHub: https://github.com/zehaoxiong123/scGCL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zehao Xiong
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Wanwan Shi
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Ying Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Zhongyuan Xu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| | - Bo Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
| |
|
23
|
Qi R, Zou Q. Trends and Potential of Machine Learning and Deep Learning in Drug Study at Single-Cell Level. Research (Washington, D.C.) 2023; 6:0050. [PMID: 36930772 PMCID: PMC10013796 DOI: 10.34133/research.0050] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/22/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023]
Abstract
Cancer treatments always face challenging problems, particularly drug resistance due to tumor cell heterogeneity. The existing datasets include the relationship between gene expression and drug sensitivities; however, the majority are based on tissue-level studies. Studying drugs at the single-cell level offers a prospect for overcoming minimal residual disease caused by subclonal resistant cancer cells retained after initial curative therapy. Fortunately, machine learning techniques can help us understand how different types of cells respond to different cancer drugs from the perspective of single-cell gene expression. Good modeling using single-cell data and drug response information will not only improve machine learning for cell-drug outcome prediction but also facilitate the discovery of drugs for specific cancer subgroups and specific cancer treatments. In this paper, we review machine learning and deep learning approaches in drug research. By analyzing the application of these methods on cancer cell lines and single-cell data and comparing the technical gap between single-cell sequencing data analysis and single-cell drug sensitivity analysis, we hope to explore the trends and potential of drug research at the single-cell data level and provide more inspiration for drug research at the single-cell level. We anticipate that this review will stimulate the innovative use of machine learning methods to address new challenges in precision medicine more broadly.
Affiliation(s)
- Ren Qi
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
24
|
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. Genomics, Proteomics & Bioinformatics 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep learning algorithms for scRNA-seq data analysis.
Affiliation(s)
- Matthew Brendel
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA; Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA.
| | - Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Hao Zhang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Olivier Elemento
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
| | - Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA.
| |
|
25
|
New Developments and Possibilities in Reanalysis and Reinterpretation of Whole Exome Sequencing Datasets for Unsolved Rare Diseases Using Machine Learning Approaches. Int J Mol Sci 2022; 23:ijms23126792. [PMID: 35743235 PMCID: PMC9224427 DOI: 10.3390/ijms23126792] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/30/2022] [Revised: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 11/21/2022] Open
Abstract
Rare diseases impact the lives of 300 million people in the world. Rapid advances in bioinformatics and genomic technologies have enabled the discovery of causes of 20–30% of rare diseases. However, most rare diseases have remained as unsolved enigmas to date. Newer tools and availability of high throughput sequencing data have enabled the reanalysis of previously undiagnosed patients. In this review, we have systematically compiled the latest developments in the discovery of the genetic causes of rare diseases using machine learning methods. Importantly, we have detailed methods available to reanalyze existing whole exome sequencing data of unsolved rare diseases. We have identified different reanalysis methodologies to solve problems associated with sequence alterations/mutations, variation re-annotation, protein stability, splice isoform malfunctions and oligogenic analysis. In addition, we give an overview of new developments in the field of rare disease research using whole genome sequencing data and other omics.
|