1
Arya A, Tripathi P, Dubey N, Aier I, Kumar Varadwaj P. Navigating single-cell RNA-sequencing: protocols, tools, databases, and applications. Genomics Inform 2025; 23:13. [PMID: 40382658 DOI: 10.1186/s44342-025-00044-5] [Received: 01/22/2025] [Accepted: 04/07/2025] [Indexed: 05/20/2025]
Abstract
Single-cell RNA-sequencing (scRNA-seq) technology has revolutionized transcriptomics, paving the way for comprehensive analysis of cellular heterogeneity in complex biological systems. It enables researchers to observe how individual cells behave, providing new insights into biological processes. However, despite these advances, scRNA-seq still faces challenges related to the complexity of data analysis, interpretation, and multi-omics data integration. This review discusses these complications in detail, with a view to optimizing scRNA-seq approaches and understanding single cells and their dynamics. Different protocols and currently active single-cell databases are also covered. The review highlights tools for scRNA-seq analysis and their methodologies, emphasizing innovative techniques that enhance resolution and accuracy at the single-cell level. Applications are explored across domains including drug discovery, the tumor microenvironment (TME), biomarker discovery, and microbial profiling, and case studies are discussed to illustrate the importance of scRNA-seq in uncovering and identifying novel and rare cell types. This review underlines the crucial role of scRNA-seq in advancing personalized medicine and highlights its potential for understanding the complexity of biological systems.
Affiliation(s)
- Ankish Arya
- Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Jhalwa, Prayagraj, 211015, Uttar Pradesh, India
- Prabhat Tripathi
- Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Jhalwa, Prayagraj, 211015, Uttar Pradesh, India
- Nidhi Dubey
- Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Jhalwa, Prayagraj, 211015, Uttar Pradesh, India
- Imlimaong Aier
- Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Jhalwa, Prayagraj, 211015, Uttar Pradesh, India
- Pritish Kumar Varadwaj
- Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Jhalwa, Prayagraj, 211015, Uttar Pradesh, India

2
Liu WS, Si T, Kriauciunas A, Snell M, Gong H. Bidirectional f-Divergence-Based Deep Generative Method for Imputing Missing Values in Time-Series Data. Stats 2025; 8:7. [PMID: 39911165 PMCID: PMC11793919 DOI: 10.3390/stats8010007] [Indexed: 02/07/2025]
Abstract
Imputing missing values in high-dimensional time-series data remains a significant challenge in statistics and machine learning. Although various methods have been proposed in recent years, many struggle with limitations and reduced accuracy, particularly when the missing rate is high. In this work, we present a novel f-divergence-based bidirectional generative adversarial imputation network, tf-BiGAIN, designed to address these challenges in time-series data imputation. Unlike traditional imputation methods, tf-BiGAIN employs a generative model to synthesize missing values without relying on distributional assumptions. The imputation process is achieved by training two neural networks, implemented using bidirectional modified gated recurrent units, with f-divergence serving as the objective function to guide optimization. Compared to existing deep learning-based methods, tf-BiGAIN introduces two key innovations. First, the use of f-divergence provides a flexible and adaptable framework for optimizing the model across diverse imputation tasks, enhancing its versatility. Second, the use of bidirectional gated recurrent units allows the model to leverage both forward and backward temporal information. This bidirectional approach enables the model to effectively capture dependencies from both past and future observations, enhancing its imputation accuracy and robustness. We applied tf-BiGAIN to analyze two real-world time-series datasets, demonstrating its superior performance in imputing missing values and outperforming existing methods in terms of accuracy and robustness.
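The abstract credits part of tf-BiGAIN's flexibility to its use of f-divergence as the objective. The following is a minimal sketch, assuming discrete distributions with strictly positive probabilities. It computes D_f(P‖Q) = Σ_x q(x) f(p(x)/q(x)) for several standard generator functions, which shows how swapping f changes the objective while the rest of the computation stays the same. It is not the authors' implementation: in adversarial training the divergence between real and imputed data is typically estimated through a discriminator rather than computed in closed form. The names `F_GENERATORS` and `f_divergence` are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

# Convex generators f with f(1) = 0; each choice yields a different f-divergence.
F_GENERATORS: Dict[str, Callable[[float], float]] = {
    "kl": lambda t: t * math.log(t),                       # KL(P || Q)
    "reverse_kl": lambda t: -math.log(t),                  # KL(Q || P)
    "squared_hellinger": lambda t: (math.sqrt(t) - 1.0) ** 2,
    "pearson_chi2": lambda t: (t - 1.0) ** 2,
}


def f_divergence(p: Sequence[float], q: Sequence[float],
                 f: Callable[[float], float]) -> float:
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions
    with strictly positive probabilities on a shared support."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


p, q = [0.5, 0.5], [0.25, 0.75]
for name, f in F_GENERATORS.items():
    print(f"{name:>17}: {f_divergence(p, q, f):.4f}")
```

Every generator gives zero when P equals Q, but each one weights disagreements between P and Q differently. Choosing f is therefore a modelling decision about which discrepancies the imputer should be penalized for most.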
Affiliation(s)
- Wen-Shan Liu
- Department of Health and Clinical Outcomes Research, Saint Louis University, St. Louis, MO 63103, USA
- Tong Si
- Department of Mathematics and Computer Science, Culver-Stockton College, Canton, MO 63435, USA
- Aldas Kriauciunas
- Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Marcus Snell
- Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Haijun Gong
- Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA

3
Schumann Y, Gocke A, Neumann JE. Computational Methods for Data Integration and Imputation of Missing Values in Omics Datasets. Proteomics 2025; 25:e202400100. [PMID: 39740174 DOI: 10.1002/pmic.202400100] [Received: 06/28/2024] [Revised: 11/08/2024] [Accepted: 11/26/2024] [Indexed: 01/02/2025]
Abstract
Molecular profiling of different omic modalities (e.g., DNA methylomics, transcriptomics, proteomics) in biological systems forms the basis for research and clinical decision-making. Measurement-specific biases, so-called batch effects, often hinder the integration of independently acquired datasets, and missing values further hamper the applicability of typical data processing algorithms. Beyond careful experimental design and well-defined standards for data acquisition and data exchange, alleviating these phenomena requires a dedicated data integration and preprocessing pipeline. This review aims to give a comprehensive overview of computational methods for data integration and missing value imputation in omic data analyses. We provide formal definitions of missing value mechanisms and propose a novel statistical taxonomy for batch effects, especially in the presence of missing data. Based on an automated document search and systematic literature review, we describe 32 distinct data integration methods from five main methodological categories, as well as 37 algorithms for missing value imputation from five separate categories. Additionally, this review highlights multiple quantitative evaluation methods to help researchers select a suitable set of methods for their work. Finally, we provide an integrated discussion of the relevance of batch effects and missing values in omics, with corresponding method recommendations, propose a comprehensive three-step workflow from study conception to final data analysis, and deduce perspectives for future research. We also present a comprehensive flow chart and exemplary decision trees to aid practitioners in selecting specific approaches for imputation and data integration in their studies.
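The review formalizes missing-value mechanisms. The standard taxonomy distinguishes three cases. Values can be missing completely at random (MCAR). They can be missing at random given the observed variables (MAR). Or they can be missing not at random (MNAR), where missingness depends on the unobserved value itself. The sketch below is a hypothetical illustration rather than the review's own definitions. It contrasts MCAR masking with a detection-limit form of MNAR that is common for low-abundance intensities in proteomics. MAR is not simulated, and the helper names are assumptions.

```python
import random
from typing import List, Optional


def mask_mcar(values: List[float], rate: float, seed: int = 0) -> List[Optional[float]]:
    """MCAR: every entry is dropped with the same probability, regardless of its value."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in values]


def mask_mnar_threshold(values: List[float], limit: float) -> List[Optional[float]]:
    """One MNAR pattern (left-censoring): values below a detection limit go missing,
    so the probability of being missing depends on the unobserved value itself."""
    return [None if v < limit else v for v in values]


def observed_mean(values: List[Optional[float]]) -> float:
    """Mean of the entries that were actually observed."""
    kept = [v for v in values if v is not None]
    if not kept:
        raise ValueError("no observed values")
    return sum(kept) / len(kept)


intensities = [float(i) for i in range(1, 11)]  # toy log-intensities, true mean 5.5
print("MCAR observed mean:", observed_mean(mask_mcar(intensities, rate=0.3)))
print("MNAR observed mean:", observed_mean(mask_mnar_threshold(intensities, limit=4.0)))
```

MCAR masking leaves the expected mean unchanged. The censored MNAR values, however, are always the smallest ones, so the observed mean is biased upward. This is one reason the appropriate imputation method depends on the assumed mechanism.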
Affiliation(s)
- Yannis Schumann
- IT-Department, Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany
- Antonia Gocke
- Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Core Facility Mass Spectrometric Proteomics, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Julia E Neumann
- Center for Molecular Neurobiology (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany

4
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2025; 68:5-102. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique developmental trajectories and molecular features. Exploring how genomes orchestrate the formation and maintenance of each cell and control the cellular phenotypes of various organisms is both captivating and intricate. Since the first single-cell RNA sequencing technology was introduced, single-cell sequencing technologies have advanced rapidly. Horizontally, they have expanded to include the single-cell genome, epigenome, proteome, and metabolome; vertically, they have progressed to integrate multiple omics data and to incorporate additional information, such as spatial scRNA-seq and CRISPR screening. Single-cell omics represents a groundbreaking advance in the biomedical field, offering profound insights into complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on methodology. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China

5
Shi M, Li X. Addressing scalability and managing sparsity and dropout events in single-cell representation identification with ZIGACL. Brief Bioinform 2024; 26:bbae703. [PMID: 39775477 PMCID: PMC11705091 DOI: 10.1093/bib/bbae703] [Received: 07/30/2024] [Revised: 11/06/2024] [Accepted: 12/23/2024] [Indexed: 01/11/2025]
Abstract
Despite significant advancements in single-cell representation learning, scalability and managing sparsity and dropout events continue to challenge the field as scRNA-seq datasets expand. While current computational tools struggle to maintain both efficiency and accuracy, the accurate connection of these dropout events to specific biological functions usually requires additional, complex experiments, often hampered by potential inaccuracies in cell-type annotation. To tackle these challenges, the Zero-Inflated Graph Attention Collaborative Learning (ZIGACL) method has been developed. This innovative approach combines a Zero-Inflated Negative Binomial model with a Graph Attention Network, leveraging mutual information from neighboring cells to enhance dimensionality reduction and apply dynamic adjustments to the learning process through a co-supervised deep graph clustering model. ZIGACL's integration of denoising and topological embedding significantly improves clustering accuracy and ensures similar cells are grouped closely in the latent space. Comparative analyses across nine real scRNA-seq datasets have shown that ZIGACL significantly enhances single-cell data analysis by offering superior clustering performance and improved stability in cell representations, effectively addressing scalability and managing sparsity and dropout events, thereby advancing our understanding of cellular heterogeneity.
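ZIGACL builds on a zero-inflated negative binomial (ZINB) model. A minimal sketch of the ZINB log-likelihood follows. It assumes the common parameterization by mean mu and inverse dispersion theta, plus a zero-inflation probability pi. In ZINB-based autoencoders, a decoder usually predicts these three parameters per cell and gene, and the negative log-likelihood serves as the reconstruction loss. How ZIGACL couples this with its graph attention network is not described here, and the function names are hypothetical.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-pmf of a negative binomial with mean mu > 0 and inverse dispersion
    theta > 0, so that Var[X] = mu + mu**2 / theta."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_log_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi (0 <= pi < 1) the count is an extra
    zero (e.g. a dropout); otherwise it is drawn from NB(mu, theta)."""
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)


def zinb_nll(counts, mus, thetas, pis) -> float:
    """Negative log-likelihood summed over entries: the kind of reconstruction loss
    a ZINB-based autoencoder minimizes when its decoder outputs (mu, theta, pi)."""
    return -sum(zinb_log_pmf(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis))


print(zinb_nll([0, 3, 7], [5.0, 5.0, 5.0], [2.0, 2.0, 2.0], [0.3, 0.3, 0.3]))
```

An observed zero can be explained either by the zero-inflation component or by a low NB draw. This is what lets such models separate likely dropouts from genuinely low expression.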
Affiliation(s)
- Mingguang Shi
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui, China
- Xuefeng Li
- School of Electrical Engineering and Automation, Hefei University of Technology, Hefei, Anhui, China

6
Zhao L, Jiang L, Xie Y, Huang J, Xie H, Tian J, Zhang D. scDTL: enhancing single-cell RNA-seq imputation through deep transfer learning with bulk cell information. Brief Bioinform 2024; 25:bbae555. [PMID: 39504481 PMCID: PMC11540133 DOI: 10.1093/bib/bbae555] [Received: 05/27/2024] [Revised: 08/30/2024] [Accepted: 10/16/2024] [Indexed: 11/08/2024]
Abstract
The growing volume of single-cell RNA sequencing (scRNA-seq) data enables researchers to explore cellular heterogeneity and gene expression profiles, offering a high-resolution view of the transcriptome at the single-cell level. However, dropout events, which are common in scRNA-seq data, remain a challenge for downstream analysis. Although a number of methods have been developed to recover single-cell expression profiles, their performance may be limited because they do not fully exploit the inherent relations between genes. To address this issue, we propose scDTL, a deep transfer learning-based approach for scRNA-seq data imputation that harnesses bulk RNA-sequencing information. We first employ a denoising autoencoder trained on bulk RNA-seq data as the initial imputation model and then leverage a domain adaptation framework that transfers the knowledge learned by the bulk imputation model to the scRNA-seq learning task. In addition, scDTL runs a 1D U-Net denoising model in parallel to provide gene representations of varying granularity, capturing both coarse and fine features of the scRNA-seq data. Finally, we use a cross-channel attention mechanism to fuse the features learned by the transferred bulk imputation model and the U-Net model. Extensive experiments demonstrate that scDTL can outperform other state-of-the-art methods in quantitative comparisons and downstream analyses.
Affiliation(s)
- Liuyang Zhao
- College of Computer Science and Software Engineering, Shenzhen University, Guangdong 518057, China
- Landu Jiang
- College of Future Technology, HKUST(GZ), Guangdong 510641, China
- Yufeng Xie
- Shenzhen Hospital of Guangzhou University of Chinese Medicine (Futian), Guangdong 518034, China
- JianHao Huang
- Shenzhen Hospital of Guangzhou University of Chinese Medicine (Futian), Guangdong 518034, China
- Haoran Xie
- Department of Computing and Decision Sciences, Lingnan University, Hong Kong Special Administrative Region 999077, China
- Jun Tian
- Department of Biochemistry, School of Medicine, Southern University of Science and Technology, Guangdong 518055, China
- Key University Laboratory of Metabolism and Health of Guangdong, Southern University of Science and Technology, Shenzhen 518055, China
- Dian Zhang
- College of Computer Science and Software Engineering, Shenzhen University, Guangdong 518057, China

7
Qian Y, Zou Q, Zhao M, Liu Y, Guo F, Ding Y. scRNMF: An imputation method for single-cell RNA-seq data by robust and non-negative matrix factorization. PLoS Comput Biol 2024; 20:e1012339. [PMID: 39116191 PMCID: PMC11338450 DOI: 10.1371/journal.pcbi.1012339] [Received: 05/05/2024] [Revised: 08/21/2024] [Accepted: 07/19/2024] [Indexed: 08/10/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool in genomics research, enabling the analysis of gene expression at the individual cell level. However, scRNA-seq data often suffer from a high rate of dropouts, where certain genes fail to be detected in specific cells due to technical limitations. These missing data can introduce biases and hinder downstream analysis, so effective imputation methods have become crucial in scRNA-seq data analysis. Here, we propose an imputation method based on robust and non-negative matrix factorization (scRNMF). Unlike other matrix factorization algorithms, scRNMF integrates two loss functions: the L2 loss and the C-loss. The L2 loss function is highly sensitive to outliers, which can introduce substantial errors, so we use the C-loss function when dealing with zero values in the raw data. The primary advantage of the C-loss function is that it imposes a relatively smaller penalty on large errors than the L2 loss does, which makes the factorization more robust to outliers. Datasets of various sizes and zero rates are used to evaluate scRNMF against other state-of-the-art methods, and the results demonstrate its power and stability as a tool for imputing scRNA-seq data.
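The abstract contrasts the outlier-sensitive L2 loss with the C-loss. This is a minimal sketch that assumes the standard correntropy-induced form 1 − exp(−e²/(2σ²)) with kernel width sigma. scRNMF's exact scaling, kernel width, and the way it combines the two losses may differ. The sketch compares the two penalties and shows the per-entry weight that half-quadratic solvers typically derive from the C-loss. The function names are hypothetical.

```python
import math


def l2_loss(e: float) -> float:
    """Squared error: unbounded, so a few large residuals can dominate the fit."""
    return e * e


def c_loss(e: float, sigma: float = 1.0) -> float:
    """Correntropy-induced loss 1 - exp(-e^2 / (2 sigma^2)): roughly e^2 / (2 sigma^2)
    for small residuals, but it never exceeds 1, however large the residual."""
    return 1.0 - math.exp(-(e * e) / (2.0 * sigma * sigma))


def c_loss_weight(e: float, sigma: float = 1.0) -> float:
    """Half-quadratic weight exp(-e^2 / (2 sigma^2)). Iteratively reweighted solvers
    for correntropy objectives typically multiply each residual's squared term by
    this weight, so outlying entries contribute almost nothing to the next update."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))


for e in (0.1, 1.0, 3.0, 10.0):
    print(f"e={e:>4}: L2={l2_loss(e):8.3f}  C-loss={c_loss(e):.3f}  weight={c_loss_weight(e):.3g}")
```

Between e = 1 and e = 10 the L2 penalty grows a hundredfold. Over the same range the C-loss saturates near 1 and the weight drops toward 0, and this bounded growth is what makes the factorization robust to outliers.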
Affiliation(s)
- Yuqing Qian
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Mengyuan Zhao
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yi Liu
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China

8
Liu W, Pan Y, Teng Z, Xu J. scDMAE: A Generative Denoising Model Adopted Mask Strategy for scRNA-Seq Data Recovery. IEEE J Biomed Health Inform 2024; 28:3772-3780. [PMID: 38568766 DOI: 10.1109/jbhi.2024.3383921] [Indexed: 04/05/2024]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized gene expression studies at the single-cell level. However, technical noise and data sparsity in scRNA-seq often undermine the accuracy of subsequent analyses. Existing methods for denoising and imputing scRNA-seq data often rely on stringent assumptions about the data distribution, limiting the effectiveness of data recovery. In this study, we propose the scDMAE model for denoising and recovering scRNA-seq data. First, the model fuses gene expression features and topological features to discern the primary expression patterns of genes in cells. Then, an autoencoder with a masking strategy is used to model dropout events and separate potential noise in the data. Finally, the model incorporates the original raw data to recover the true biological expression values. In experiments on various types of scRNA-seq datasets, scDMAE outperforms the compared methods on six distinct evaluation metrics in downstream analysis. scDMAE can accurately cluster similar cell populations, identify differentially expressed genes, and infer cell trajectories.
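scDMAE uses an autoencoder with a masking strategy to model dropout events. The sketch below shows only the generic masking step that such approaches build on. It hides a random fraction of the non-zero entries of an expression matrix, then scores a reconstruction on the hidden entries alone. It assumes a plain nested-list matrix. The abstract does not give scDMAE's actual masking ratio, which entries it masks, or its network, and `mask_nonzeros` and `masked_rmse` are hypothetical names.

```python
import random
from typing import List, Set, Tuple

Matrix = List[List[float]]


def mask_nonzeros(x: Matrix, rate: float, seed: int = 0) -> Tuple[Matrix, Set[Tuple[int, int]]]:
    """Zero out a random fraction of the non-zero entries to mimic extra dropout.
    Returns the corrupted copy and the (row, col) positions that were masked."""
    rng = random.Random(seed)
    nonzero = [(i, j) for i, row in enumerate(x) for j, v in enumerate(row) if v != 0]
    masked = set(rng.sample(nonzero, round(rate * len(nonzero))))
    corrupted = [[0.0 if (i, j) in masked else v for j, v in enumerate(row)]
                 for i, row in enumerate(x)]
    return corrupted, masked


def masked_rmse(truth: Matrix, imputed: Matrix, positions: Set[Tuple[int, int]]) -> float:
    """Root-mean-square error computed only on the entries hidden from the model."""
    if not positions:
        raise ValueError("no masked positions to score")
    se = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in positions)
    return (se / len(positions)) ** 0.5


x = [[1.0, 0.0, 3.0], [4.0, 5.0, 0.0], [0.0, 8.0, 9.0]]
corrupted, hidden = mask_nonzeros(x, rate=0.5, seed=1)
print(masked_rmse(x, corrupted, hidden))  # error of leaving the masked entries at zero
```

Scoring only the masked positions matters. A method that simply returned its input would otherwise look perfect on the unmasked entries.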

9
Kang Y, Zhang H, Guan J. scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data. Brief Bioinform 2024; 25:bbae148. [PMID: 38600665 PMCID: PMC11006796 DOI: 10.1093/bib/bbae148] [Received: 01/09/2024] [Revised: 02/26/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) facilitates the study of cell type heterogeneity and the construction of cell atlases. However, owing to technical limitations, many genes may be recorded with zero expression, i.e., dropout events, which bias downstream analyses and hinder the identification and characterization of cell types and cell functions. Although many imputation methods have been developed, their performance is generally lower than expected across different kinds and dimensions of data and application scenarios. Developing an accurate and robust single-cell gene expression imputation method therefore remains essential. To maintain the original cell-cell and gene-gene correlations and to leverage information from bulk RNA sequencing (bulk RNA-seq) data, we propose scINRB, a single-cell gene expression imputation method with network regularization and bulk RNA-seq data. scINRB adopts network-regularized non-negative matrix factorization to ensure that the imputed data maintain cell-cell and gene-gene similarities and also approach the average gene expression calculated from bulk RNA-seq data. To evaluate its performance, we test scINRB on simulated and experimental datasets and compare it with other commonly used imputation methods. The results show that scINRB recovers gene expression accurately even at high dropout rates and dimensions, preserves cell-cell and gene-gene similarities, and improves various downstream analyses, including visualization, clustering, and trajectory inference.
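The abstract says scINRB's network regularization preserves cell-cell and gene-gene similarities and pulls the imputed data toward bulk gene averages. In non-negative matrix factorization, a common way to encode "similar nodes should have similar factors" is a graph Laplacian penalty tr(WᵀLW) with L = D − A, as in graph-regularized NMF. The sketch below assumes nested-list matrices and a symmetric non-negative similarity matrix A. It computes that penalty, checks it against its pairwise form, and adds a hypothetical squared-error bulk term. The abstract does not give scINRB's exact objective, weights, or normalization, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - A of a symmetric similarity matrix A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def graph_penalty(w: Matrix, adj: Matrix) -> float:
    """tr(W^T L W) for a factor matrix W whose rows are nodes (cells or genes).
    Small when strongly connected nodes have similar low-rank representations."""
    lap = laplacian(adj)
    n, k = len(w), len(w[0])
    return sum(w[i][c] * lap[i][j] * w[j][c]
               for c in range(k) for i in range(n) for j in range(n))


def pairwise_penalty(w: Matrix, adj: Matrix) -> float:
    """The same quantity written as 0.5 * sum_ij A_ij * ||w_i - w_j||^2."""
    n = len(w)
    return 0.5 * sum(adj[i][j] * sum((a - b) ** 2 for a, b in zip(w[i], w[j]))
                     for i in range(n) for j in range(n))


def bulk_penalty(x_hat: Matrix, bulk_mean: List[float]) -> float:
    """Hypothetical bulk term: squared gap between each gene's mean over cells
    (rows = genes, columns = cells) and that gene's bulk RNA-seq average."""
    return sum((sum(row) / len(row) - b) ** 2 for row, b in zip(x_hat, bulk_mean))


adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]  # toy 3-node similarity graph
w = [[1.0, 0.0], [2.0, 1.0], [0.5, 3.0]]                    # toy rank-2 factors
print(graph_penalty(w, adj), pairwise_penalty(w, adj))
```

The pairwise form explains why the penalty preserves similarities. Every edge weight A_ij multiplies the distance between the factors of nodes i and j, so minimizing it pulls connected cells (or genes) toward similar representations.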
Affiliation(s)
- Yue Kang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- Hongyu Zhang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China

10
Wang J, Dong L, Zheng Z, Zhu Z, Xie B, Xie Y, Li X, Chen B, Li P. Effects of different KRAS mutants and Ki67 expression on diagnosis and prognosis in lung adenocarcinoma. Sci Rep 2024; 14:4085. [PMID: 38374309 PMCID: PMC10876986 DOI: 10.1038/s41598-023-48307-x] [Received: 06/14/2023] [Accepted: 11/24/2023] [Indexed: 02/21/2024]
Abstract
Lung adenocarcinoma (LUAD) is a prevalent form of non-small cell lung cancer whose incidence has risen in recent years. Understanding the mutation characteristics of LUAD is crucial for effective treatment and prediction of this disease. Among the various mutations observed in LUAD, KRAS mutations are particularly common. Different subtypes of KRAS mutations can activate the Ras signaling pathway to varying degrees, potentially influencing the pathogenesis and prognosis of LUAD. This study aims to investigate the relationship between different KRAS mutation subtypes and the pathogenesis and prognosis of LUAD. A total of 63 clinical LUAD samples were collected and analyzed using targeted gene sequencing panels. To complement this dataset, additional clinical and sequencing data were obtained from TCGA and MSK. The analysis revealed significantly higher Ki67 immunohistochemical (IHC) scores in patients with missense mutations than in controls. Moreover, KRAS expression was significantly correlated with Ki67 expression. Enrichment analysis indicated that KRAS missense mutations activated the SWEET_LUNG_CANCER_KRAS_DN and CREIGHTON_ENDOCRINE_THERAPY_RESISTANCE_2 pathways. Additionally, patients with KRAS missense mutations and high Ki67 IHC scores exhibited significantly higher tumor mutational burden than other groups, which suggests they are more likely to respond to immune checkpoint inhibitors (ICIs). In the MSK and TCGA data, patients with KRAS missense mutations had shorter survival than controls, and Ki67 expression level could predict patient prognosis more accurately. In conclusion, when KRAS mutations are used as biomarkers for the treatment and prediction of LUAD, the specific KRAS mutant subtype and Ki67 expression level should be considered. These findings contribute to a better understanding of LUAD and have implications for personalized therapeutic approaches in the management of this disease.
Affiliation(s)
- Jun Wang
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Liwen Dong
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Zhaowei Zheng
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Zhen Zhu
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Baisheng Xie
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Yue Xie
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Xiongwei Li
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Bing Chen
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
- Pan Li
- Department of Thoracic Surgery, Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University, Hangzhou, 310007, China
11
Dong S, Liu Y, Gong Y, Dong X, Zeng X. scCAN: Clustering With Adaptive Neighbor-Based Imputation Method for Single-Cell RNA-Seq Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:95-105. [PMID: 38285569 DOI: 10.1109/tcbb.2023.3337231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to study cellular heterogeneity across samples. However, because of technical limitations, dropout events often produce zero values in the gene expression matrix. In this paper, we propose scCAN, a new imputation method based on clustering with adaptive neighbors, to estimate the true values behind dropout zeros. The method continuously updates cell-cell similarity by simultaneously learning similarity relationships and clustering structure while imposing a new rank constraint on the Laplacian of the similarity matrix, which improves the imputation of dropout zeros. To evaluate its performance, we used four simulated and eight real scRNA-seq datasets in downstream analyses, including cell clustering, recovery of gene expression, and reconstruction of cell trajectories. scCAN improves downstream analysis and outperforms other imputation methods.
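The adaptive-neighbour idea can be illustrated with the closed-form weights of the clustering-with-adaptive-neighbours (CAN) formulation, on the assumption that scCAN builds on it as its name suggests. For one cell with squared distances d_1 <= d_2 <= ... to the other cells, the k nearest neighbours get weight (d_{k+1} - d_j) / (k * d_{k+1} - (d_1 + ... + d_k)) and all other cells get 0, so the weights are non-negative and sum to 1. This minimal sketch omits scCAN's Laplacian rank constraint (a similarity graph has exactly c connected components when its Laplacian has rank n - c) and its iterative updates; the function names and toy matrix are hypothetical.

```python
def adaptive_neighbor_weights(dists, k):
    """Closed-form adaptive-neighbour weights for a single cell.

    dists: squared distances from this cell to every *other* cell.
    Returns non-negative weights that sum to 1 and are non-zero only for
    (at most) the k nearest cells.
    """
    if len(dists) <= k:
        raise ValueError("need more than k other cells")
    order = sorted(range(len(dists)), key=lambda j: dists[j])
    d_sorted = [dists[j] for j in order]
    d_next = d_sorted[k]                      # distance to the (k+1)-th nearest cell
    denom = k * d_next - sum(d_sorted[:k])
    weights = [0.0] * len(dists)
    for rank, j in enumerate(order[:k]):
        # Fall back to uniform weights if all k+1 distances are tied.
        weights[j] = (d_next - d_sorted[rank]) / denom if denom > 0 else 1.0 / k
    return weights


def impute_with_adaptive_neighbors(X, k=2):
    """Replace zeros in each cell (row of X) by a weighted average over its neighbours."""
    n = len(X)
    imputed = [row[:] for row in X]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        d = [sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in others]
        w = adaptive_neighbor_weights(d, k)
        for g, value in enumerate(X[i]):
            if value == 0:
                imputed[i][g] = sum(wj * X[j][g] for wj, j in zip(w, others))
    return imputed


X = [[5, 3, 0],
     [5, 3, 4],
     [1, 0, 2],
     [6, 3, 4]]
print(impute_with_adaptive_neighbors(X, k=2))
```

Only zero entries are changed; non-zero counts pass through untouched, which mirrors the abstract's focus on dropout zeros.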
12
Zheng W, Min W, Wang S. TsImpute: an accurate two-step imputation method for single-cell RNA-seq data. Bioinformatics 2023; 39:btad731. [PMID: 38039139 PMCID: PMC10724850 DOI: 10.1093/bioinformatics/btad731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 11/22/2023] [Accepted: 11/30/2023] [Indexed: 12/03/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) technology has made it possible to discover gene expression patterns at single-cell resolution. However, owing to technical limitations, scRNA-seq data usually contain excessive zeros, called "dropouts," which may mislead downstream analysis. It is therefore crucial to impute these dropouts to recover the biological information. RESULTS We propose a two-step imputation method called tsImpute for scRNA-seq data. In the first step, tsImpute uses a zero-inflated negative binomial distribution to discriminate dropouts from true zeros and performs an initial imputation by calculating the expected expression level. In the second step, it clusters cells using this modified expression matrix and then performs the final distance-weighted imputation based on the clusters. Numerical results on both simulated and real data show that tsImpute performs favorably in gene expression recovery, cell clustering, and differential expression analysis. AVAILABILITY AND IMPLEMENTATION The R package of tsImpute is available at https://github.com/ZhengWeihuaYNU/tsImpute.
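The first step can be sketched with the zero-inflated negative binomial (ZINB) model, under which P(X = 0) = pi + (1 - pi) * (theta / (theta + mu))^theta for dropout probability pi, mean mu and dispersion theta. The posterior probability that an observed zero is a dropout is then the share of that zero probability contributed by pi. This is a minimal sketch assuming the per-gene ZINB parameters have already been fitted; the 0.5 threshold and the choice to replace dropouts with mu are illustrative and not necessarily what the tsImpute R package does.

```python
def nb_zero_prob(mu, theta):
    """P(X = 0) under a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta


def dropout_posterior(pi, mu, theta):
    """Posterior probability that an observed zero came from the zero-inflation
    (dropout) component rather than from the negative binomial (true zero)."""
    nb_zero = (1.0 - pi) * nb_zero_prob(mu, theta)
    return pi / (pi + nb_zero)


def initial_impute(counts, pi, mu, theta, threshold=0.5):
    """Hypothetical first-pass imputation for one gene with fitted ZINB parameters.

    Zeros judged to be dropouts are replaced by the NB mean mu; zeros judged
    to be true zeros and all non-zero counts are kept.
    """
    p_dropout = dropout_posterior(pi, mu, theta)
    return [float(mu) if x == 0 and p_dropout > threshold else float(x)
            for x in counts]


print(initial_impute([0, 3, 12, 0], pi=0.5, mu=10, theta=2))  # highly expressed gene
print(initial_impute([0, 1, 0, 0], pi=0.1, mu=0.1, theta=2))  # lowly expressed gene
```

For a highly expressed gene a zero is unlikely under the NB component, so it is attributed to dropout; for a lowly expressed gene zeros are expected and are kept as true zeros.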
Affiliation(s)
- Weihua Zheng
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
- Wenwen Min
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
- Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, China
- Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650504, China
13
Meng Y, Wang Y, Xu J, Lu C, Tang X, Peng T, Zhang B, Tian G, Yang J. Drug repositioning based on weighted local information augmented graph neural network. Brief Bioinform 2023; 25:bbad431. [PMID: 38019732 PMCID: PMC10686358 DOI: 10.1093/bib/bbad431] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/26/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 12/01/2023]
Abstract
Drug repositioning, the strategy of redirecting existing drugs to new therapeutic purposes, is pivotal in accelerating drug discovery. While many studies have modeled complex drug-disease associations, they often overlook the relevance between different node embeddings. To address this, we propose a novel weighted local information augmented graph neural network model, termed DRAGNN, for drug repositioning. Specifically, DRAGNN first incorporates a graph attention mechanism that dynamically allocates attention coefficients to heterogeneous drug and disease nodes, making the collection of target-node information more effective. To prevent excessive information from being embedded in a limited vector space, self-node information aggregation is omitted, which emphasizes valuable heterogeneous and homogeneous information. Additionally, average pooling is introduced in neighbor information aggregation to enhance local information while maintaining simplicity. A multi-layer perceptron then generates the final association predictions. The model's effectiveness for drug repositioning is supported by ten repetitions of 10-fold cross-validation on three benchmark datasets. Further validation is provided by analyzing the predicted associations with multiple authoritative data sources, molecular docking experiments and drug-disease network analysis, laying a solid foundation for future drug discovery.
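The two aggregation ideas described above, attention over neighbours that skips the node's own features and plain average pooling, can be sketched as follows. The scoring rule is the standard graph attention (GAT) form, LeakyReLU(a · [h_i || h_j]) followed by a softmax over neighbours; the learned weight matrices, multi-head attention and the way DRAGNN actually combines the two aggregations are not reproduced, and all names and values are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_aggregate(h, i, neighbors, a):
    """GAT-style attention aggregation for node i that skips node i itself.

    h: node feature vectors (treated as already linearly transformed).
    a: attention vector of length 2 * dim, scored against [h_i || h_j].
    """
    nbrs = [j for j in neighbors if j != i]    # self-node information is omitted
    scores = [leaky_relu(sum(w * x for w, x in zip(a, h[i] + h[j]))) for j in nbrs]
    m = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]             # attention coefficients sum to 1
    dim = len(h[i])
    return [sum(alpha * h[j][d] for alpha, j in zip(alphas, nbrs)) for d in range(dim)]


def mean_aggregate(h, i, neighbors):
    """Average pooling over the neighbours of node i (again excluding i)."""
    nbrs = [j for j in neighbors if j != i]
    dim = len(h[i])
    return [sum(h[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # e.g. one drug node and two disease nodes
print(attention_aggregate(h, 0, [0, 1, 2], a=[0.0, 0.0, 1.0, 1.0]))
print(mean_aggregate(h, 0, [0, 1, 2]))
```

With an all-zero attention vector every neighbour gets the same coefficient, and the attention output coincides with average pooling.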
Affiliation(s)
- Yajie Meng
- Center of Applied Mathematics & Interdisciplinary Science, School of Mathematical & Physical Sciences, Wuhan Textile University, No. 1, Yangguang Avenue, Jiangxia District, Wuhan City, Hubei Province 430200, China
- Yi Wang
- Center of Applied Mathematics & Interdisciplinary Science, School of Mathematical & Physical Sciences, Wuhan Textile University, No. 1, Yangguang Avenue, Jiangxia District, Wuhan City, Hubei Province 430200, China
- Junlin Xu
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Road (S), Yuelu District, Changsha, Hunan Province 410082, China
- Changcheng Lu
- College of Computer Science and Electronic Engineering, Hunan University, Lushan Road (S), Yuelu District, Changsha, Hunan Province 410082, China
- Xianfang Tang
- Center of Applied Mathematics & Interdisciplinary Science, School of Mathematical & Physical Sciences, Wuhan Textile University, No. 1, Yangguang Avenue, Jiangxia District, Wuhan City, Hubei Province 430200, China
- Tao Peng
- Center of Applied Mathematics & Interdisciplinary Science, School of Mathematical & Physical Sciences, Wuhan Textile University, No. 1, Yangguang Avenue, Jiangxia District, Wuhan City, Hubei Province 430200, China
- Bengong Zhang
- Center of Applied Mathematics & Interdisciplinary Science, School of Mathematical & Physical Sciences, Wuhan Textile University, No. 1, Yangguang Avenue, Jiangxia District, Wuhan City, Hubei Province 430200, China
- Geng Tian
- Geneis Beijing Co., Ltd, No. 31, New North Road, Laiguanying, Chaoyang District, Beijing 100102, China
- Jialiang Yang
- Geneis Beijing Co., Ltd, No. 31, New North Road, Laiguanying, Chaoyang District, Beijing 100102, China
14
Li Y, Wu M, Ma S, Wu M. ZINBMM: a general mixture model for simultaneous clustering and gene selection using single-cell transcriptomic data. Genome Biol 2023; 24:208. [PMID: 37697330 PMCID: PMC10496184 DOI: 10.1186/s13059-023-03046-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 08/22/2023] [Indexed: 09/13/2023]
Abstract
Clustering is a critical component of single-cell RNA sequencing (scRNA-seq) data analysis and can help reveal cell types and infer cell lineages. Despite considerable successes, few methods are tailored to identifying the cluster-specific genes that contribute to cell heterogeneity, although such genes can deepen biological understanding of that heterogeneity. In this study, we propose a zero-inflated negative binomial mixture model (ZINBMM) that simultaneously achieves effective scRNA-seq data clustering and gene selection. ZINBMM conducts a systematic analysis on raw counts, accommodating both batch effects and dropout events. Simulations and the analysis of five scRNA-seq datasets demonstrate the practical applicability of ZINBMM.
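Clustering with a ZINB mixture rests on an E-step that turns each cell's raw counts into posterior cluster probabilities. The sketch below is a generic version of that step under stated assumptions: genes are treated as independent given the cluster, each cluster has per-gene (pi, mu, theta) parameters, and the values are hypothetical. ZINBMM's batch-effect terms, M-step and gene-selection penalty are not shown.

```python
import math


def zinb_logpmf(x, pi, mu, theta):
    """Log-probability of count x under a zero-inflated negative binomial."""
    log_nb = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
              + theta * math.log(theta / (theta + mu))
              + x * math.log(mu / (theta + mu)))
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return math.log(1.0 - pi) + log_nb


def responsibilities(cell, weights, params):
    """E-step of a ZINB mixture: posterior probability of each cluster for one cell.

    cell:    raw counts, one per gene
    weights: mixing proportions, one per cluster
    params:  per cluster, a list of (pi, mu, theta) tuples, one per gene
    """
    logs = [math.log(w) + sum(zinb_logpmf(x, *p) for x, p in zip(cell, gene_params))
            for w, gene_params in zip(weights, params)]
    m = max(logs)                              # log-sum-exp for numerical stability
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]


low = [(0.1, 1.0, 5.0)] * 3    # hypothetical cluster 0: low mean expression for all genes
high = [(0.1, 20.0, 5.0)] * 3  # hypothetical cluster 1: high mean expression
print(responsibilities([18, 22, 0], weights=[0.5, 0.5], params=[low, high]))
```

The zero count for the third gene is absorbed partly by the zero-inflation term, so a single dropout does not pull the cell out of the high-expression cluster.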
Affiliation(s)
- Yang Li
- Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
- RSS and China-Re Life Joint Lab on Public Health and Risk Management, Renmin University of China, Beijing, China
- Statistical Consulting Center, Renmin University of China, Beijing, China
- Mingcong Wu
- Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
- Statistical Consulting Center, Renmin University of China, Beijing, China
- Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, USA
- Mengyun Wu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
15
Hegarty C, Neto N, Cahill P, Floudas A. Computational approaches in rheumatic diseases - Deciphering complex spatio-temporal cell interactions. Comput Struct Biotechnol J 2023; 21:4009-4020. [PMID: 37649712 PMCID: PMC10462794 DOI: 10.1016/j.csbj.2023.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 08/04/2023] [Accepted: 08/04/2023] [Indexed: 09/01/2023]
Abstract
Inflammatory arthritides, including rheumatoid arthritis (RA) and psoriatic arthritis (PsA), are clinically and immunologically heterogeneous diseases with no identified cure. Chronic inflammation of the synovial tissue leads to loss of joint function that severely impacts the patient's quality of life, eventually resulting in disability and life-threatening comorbidities. Synovial inflammation arises from compounded interactions between immune and stromal cells, influenced by genetic and environmental factors. Deciphering the complexity of the synovial cellular landscape has accelerated, primarily owing to bulk and single-cell RNA sequencing. In particular, the capacity to generate cell-cell interaction networks could reveal previously unappreciated processes leading to disease. However, varied experimental and technological approaches have left the field without a universal nomenclature, which complicates the study of synovial inflammation. While spatial transcriptomic analysis, which combines anatomical information with transcriptomic data from synovial tissue biopsies, promises further insights into disease pathogenesis, in vitro functional assays with single-cell resolution will be required to validate current bioinformatic applications. To translate experimental data into clinical practice, combining clinical and molecular data with machine learning has the potential to enhance patient stratification and to identify individuals at risk of arthritis who would benefit from early therapeutic intervention. This review aims to provide a comprehensive understanding of how computational approaches help decipher the pathogenesis of synovial inflammation, and discusses the impact that further experimental and novel computational tools may have on therapeutic target identification and drug development.
Affiliation(s)
- Ciara Hegarty
- Translational Immunology lab, School of Biotechnology, Dublin City University, Dublin, Ireland
- Nuno Neto
- Trinity Centre for Biomedical Engineering, Trinity College Dublin, Ireland
- Paul Cahill
- Vascular Biology lab, School of Biotechnology, Dublin City University, Dublin, Ireland
- Achilleas Floudas
- Translational Immunology lab, School of Biotechnology, Dublin City University, Dublin, Ireland
16
Pan P, Li J, Wang B, Tan X, Yin H, Han Y, Wang H, Shi X, Li X, Xie C, Chen L, Chen L, Bai Y, Li Z, Tian G. Molecular characterization of colorectal adenoma and colorectal cancer via integrated genomic transcriptomic analysis. Front Oncol 2023; 13:1067849. [PMID: 37546388 PMCID: PMC10401844 DOI: 10.3389/fonc.2023.1067849] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/15/2022] [Accepted: 06/21/2023] [Indexed: 08/08/2023]
Abstract
Introduction Colorectal adenoma can develop into colorectal cancer. Determining the risk of tumorigenesis in colorectal adenoma would be critical for avoiding the development of colorectal cancer; however, genomic features that could help predict the risk of tumorigenesis remain uncertain. Methods In this work, DNA and RNA parallel capture sequencing data covering 519 genes from colorectal adenoma and colorectal cancer samples were collected. The somatic mutation profiles were obtained from DNA sequencing data, and the expression profiles were obtained from RNA sequencing data. Results Despite some similarities between the adenoma samples and the cancer samples, different mutation frequencies, co-occurrences, and mutually exclusive patterns were detected in the mutation profiles of patients with colorectal adenoma and colorectal cancer. Differentially expressed genes were also detected between the two patient groups using RNA sequencing. Finally, two random forest classification models were built, one based on mutation profiles and one based on expression profiles. The models distinguished adenoma and cancer samples with accuracy levels of 81.48% and 100.00%, respectively, showing the potential of the 519-gene panel for monitoring adenoma patients in clinical practice. Conclusion This study revealed molecular characteristics and correlations between colorectal adenoma and colorectal cancer, and it demonstrated that the 519-gene panel may be used for early monitoring of the progression of colorectal adenoma to cancer.
Affiliation(s)
- Peng Pan
- Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Jingnan Li
- Department of Gastroenterology, Peking Union Medical College Hospital, Beijing, China
- Bo Wang
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Xiaoyan Tan
- Department of Gastroenterology, Maoming People's Hospital, Maoming, China
- Hekun Yin
- Department of Gastroenterology, Jiangmen Central Hospital, Jiangmen, China
- Yingmin Han
- Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
- Haobin Wang
- Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
- Xiaoli Shi
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Xiaoshuang Li
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Cuinan Xie
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Longfei Chen
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Lanyou Chen
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Yu Bai
- Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Zhaoshen Li
- Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Geng Tian
- Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
17
Niu Z, Gao X, Xia Z, Zhao S, Sun H, Wang H, Liu M, Kong X, Ma C, Zhu H, Gao H, Liu Q, Yang F, Song X, Lu J, Zhou X. Prediction of small molecule drug-miRNA associations based on GNNs and CNNs. Front Genet 2023; 14:1201934. [PMID: 37323664 PMCID: PMC10268031 DOI: 10.3389/fgene.2023.1201934] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/07/2023] [Accepted: 05/17/2023] [Indexed: 06/17/2023]
Abstract
MicroRNAs (miRNAs) play crucial roles in various biological processes and human diseases and are considered therapeutic targets for small molecules (SMs). Because validating SM-miRNA associations requires time-consuming and expensive biological experiments, there is an urgent need for computational models that predict novel SM-miRNA associations. The rapid development of end-to-end deep learning models and the introduction of ensemble learning ideas provide new solutions. Based on ensemble learning, we integrate graph neural networks (GNNs) and convolutional neural networks (CNNs) into a miRNA and small molecule association prediction model (GCNNMMA). First, we use GNNs to learn the molecular structure graphs of small molecule drugs, while CNNs learn the sequence data of miRNAs. Second, because the black-box nature of deep learning models makes them difficult to analyze and interpret, we introduce attention mechanisms to address this issue. Finally, the neural attention mechanism lets the CNN model, as it learns miRNA sequences, determine the weights of sub-sequences within each miRNA, after which the model predicts the association between miRNAs and small molecule drugs. To evaluate the effectiveness of GCNNMMA, we implement two different cross-validation (CV) methods on two different datasets. Experimental results show that GCNNMMA outperforms the comparison models in cross-validation on both datasets. In a case study, fluorouracil was found to be associated with five different miRNAs among the top 10 predicted associations, and published experimental literature confirms that fluorouracil is a metabolic inhibitor used to treat liver cancer, breast cancer, and other tumors. GCNNMMA is therefore an effective tool for mining disease-relevant relationships between small molecule drugs and miRNAs.
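How attention can weight miRNA sub-sequences is easiest to see on a toy example: one 1-D convolution filter slides over a one-hot encoded sequence, and a softmax over the resulting window scores gives each sub-sequence a weight. This is a minimal sketch only; the filter, query vector and sequence are hypothetical and untrained, and GCNNMMA's real architecture, including its GNN branch for small molecules, is not reproduced.

```python
import math

BASES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def conv1d(x, kernels, width):
    """Valid 1-D convolution: one feature vector per window of `width` bases."""
    feats = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        feats.append([sum(k[p][c] * window[p][c] for p in range(width) for c in range(4))
                      for k in kernels])
    return feats


def attention_pool(feats, query):
    """Softmax attention over sub-sequence features; returns pooled vector and weights."""
    scores = [sum(q * f for q, f in zip(query, vec)) for vec in feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = [sum(w * vec[d] for w, vec in zip(weights, feats)) for d in range(len(query))]
    return pooled, weights


gg_kernel = [[0.0, 0.0, 1.0, 0.0],   # hypothetical filter that responds to "GG"
             [0.0, 0.0, 1.0, 0.0]]
feats = conv1d(one_hot("AUGGC"), [gg_kernel], width=2)
pooled, weights = attention_pool(feats, query=[1.0])
print(feats, [round(w, 3) for w in weights])
```

The window "GG" receives the largest attention weight, which is the kind of per-sub-sequence weighting that makes such a model easier to inspect.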
18
Zhu M, Li C, Lv K, Guo H, Hou R, Tian G, Yang J. MLSpatial: A machine-learning method to reconstruct the spatial distribution of cells from scRNA-seq by extracting spatial features. Comput Biol Med 2023; 159:106873. [PMID: 37105115 DOI: 10.1016/j.compbiomed.2023.106873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/01/2023] [Revised: 03/03/2023] [Accepted: 03/30/2023] [Indexed: 04/29/2023]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) technologies allow us to interrogate the state of an individual cell within its microenvironment. However, cells must be dissociated before sequencing, which makes it difficult to obtain their spatial information. Because the spatial distribution of cells is critical in some settings, such as cancer immunotherapy, we present MLSpatial, a novel computational method that learns the relationship between gene expression patterns and the spatial locations of cells and then predicts the cell-to-cell distance distribution from scRNA-seq data alone. RESULTS We collected a drosophila embryo dataset that contains both fluorescence in situ hybridization (FISH) data and scRNA-seq data. The FISH data provided the spatial positions of 3039 cells and the expression of 84 genes in each cell. The scRNA-seq data contain the expression of 8924 genes in 1297 high-quality cells whose locations are unknown. For comparison, we also collected MERFISH data for 645 osteosarcoma cells with known cell locations and known expression status of 10,050 genes. For each dataset, the cells were randomly divided into a training set and a test set in a 7:3 ratio. In the drosophila embryo FISH data, the cell-to-cell distances extracted by our model corresponded more closely to the real distances (correlation coefficient 0.99) than those of existing methods. In the osteosarcoma data, by contrast, the model captured the spatial relationship between cells with a lower correlation of 0.514 to the real situation. We also applied the model trained on the drosophila embryo FISH data to the drosophila embryo single-cell data, for which the real cell locations are unknown. The reconstructed pseudo-embryo and the real embryo (as shown by the FISH data) had highly similar spatial distributions of gene expression.
CONCLUSION MLSpatial can accurately restore the relative positions of cells from scRNA-seq data; however, its performance depends on the cell type. The trained model might be useful for reconstructing the spatial distribution of single cells from scRNA-seq data alone, provided that the scRNA-seq data and the FISH data share a similar background (i.e., the same tissue with a similar disease background).
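The evaluation described above, a random 7:3 split of cells and a correlation between predicted and real cell-to-cell distances, can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson correlation over all pairwise Euclidean distances is an assumption here, as are the toy coordinates and function names. Because only distances are compared, a prediction that is rotated, shifted or uniformly scaled still scores a perfect correlation.

```python
import math
import random


def pairwise_distances(points):
    """Upper-triangle Euclidean distances between all pairs of cells."""
    n = len(points)
    return [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def train_test_split(items, train_frac=0.7, seed=0):
    """Random 7:3 split of cells into training and test sets."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(items))
    return [items[i] for i in idx[:cut]], [items[i] for i in idx[cut:]]


true_xy = [(0, 0), (1, 0), (0, 2), (3, 1), (2, 2)]
# Hypothetical prediction: same layout, but rotated by 90 degrees, shifted and scaled by 2.
pred_xy = [(-2 * y + 5, 2 * x - 1) for x, y in true_xy]
print(pearson(pairwise_distances(true_xy), pairwise_distances(pred_xy)))
train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```

Scoring relative distances rather than absolute coordinates matches the abstract's framing of MLSpatial as restoring the relative positions of cells.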
Affiliation(s)
- Mengbo Zhu
- Department of Mathematics, Ocean University of China, Qingdao, 266100, China; Geneis Beijing Co., Ltd., Beijing, 100102, China
- Changjun Li
- Department of Mathematics, Ocean University of China, Qingdao, 266100, China
- Kebo Lv
- Department of Mathematics, Ocean University of China, Qingdao, 266100, China
- Hongzhe Guo
- Geneis Beijing Co., Ltd., Beijing, 100102, China
- Rui Hou
- Geneis Beijing Co., Ltd., Beijing, 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, China
- Geng Tian
- Geneis Beijing Co., Ltd., Beijing, 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, China
- Jialiang Yang
- Geneis Beijing Co., Ltd., Beijing, 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, China; Chifeng Municipal Hospital, Chifeng, Inner Mongolia, 024000, China; Academician Workstation, Changsha Medical University, Changsha, 410219, China
19
Huang Z, Wang J, Lu X, Mohd Zain A, Yu G. scGGAN: single-cell RNA-seq imputation by graph-based generative adversarial network. Brief Bioinform 2023; 24:7024714. [PMID: 36733262 DOI: 10.1093/bib/bbad040] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2022] [Revised: 12/21/2022] [Accepted: 01/18/2023] [Indexed: 02/04/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) data typically contain a large number of missing values, which often causes the loss of critical gene signaling information and seriously limits downstream analysis. Deep learning-based imputation methods often handle scRNA-seq data better than shallow ones, but most of them ignore the inherent relations between genes, even though the expression of a gene is often regulated by other genes. It is therefore essential to impute scRNA-seq data by considering regional gene-to-gene relations. We propose a novel model, scGGAN, that imputes scRNA-seq data by learning gene-to-gene relations with Graph Convolutional Networks (GCN) and the global scRNA-seq data distribution with Generative Adversarial Networks (GAN). scGGAN first leverages single-cell and bulk genomics data to explore inherent relations between genes and builds a more compact gene relation network that jointly captures homogeneous and heterogeneous information. It then constructs a GCN-based GAN model that integrates the scRNA-seq data, gene sequencing data and the gene relation network to generate scRNA-seq data, and trains the model through adversarial learning. Finally, it uses data generated by the trained GCN-based GAN model to impute the scRNA-seq data. Experiments on simulated and real scRNA-seq datasets show that scGGAN can effectively identify dropout events, recover biologically meaningful expression, determine subcellular states and types, and improve differential expression analysis and temporal dynamics analysis. Ablation experiments confirm that both the gene relation network and the gene sequence data help the imputation of scRNA-seq data.
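To show how a GCN propagates information along a gene relation network, the sketch below implements one layer of the widely used symmetric-normalised form ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, I adds self-loops and D is the degree matrix of A + I. Whether scGGAN uses exactly this normalisation is an assumption; the adjacency, features and weights below are hypothetical.

```python
import math


def gcn_layer(adj, h, w):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency matrix (e.g. a gene relation network)
    h:   n x f node features (e.g. per-gene expression summaries)
    w:   f x o weight matrix
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = len(h[0])
    # Neighbourhood aggregation with the normalised adjacency.
    agg = [[sum(norm[i][k] * h[k][d] for k in range(n)) for d in range(f)] for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][d] * w[d][o] for d in range(f))) for o in range(len(w[0]))]
            for i in range(n)]


adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]            # genes 0 and 1 are related; gene 2 is isolated
h = [[1.0], [3.0], [-2.0]]
print(gcn_layer(adj, h, w=[[1.0]]))
```

Related genes 0 and 1 end up sharing a smoothed value, while the isolated gene keeps only its own (here negative, then clipped) signal.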
Affiliation(s)
- Zimo Huang
- MEng student at School of Software, Shandong University, China
- Jun Wang
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
- Xudong Lu
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
- Guoxian Yu
- School of Software, Shandong University, China
20
Feng X, Zhang H, Lin H, Long H. Single-cell RNA-seq data analysis based on directed graph neural network. Methods 2023; 211:48-60. [PMID: 36804214 DOI: 10.1016/j.ymeth.2023.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 12/09/2022] [Accepted: 02/13/2023] [Indexed: 02/17/2023]
Abstract
The scale of single-cell RNA sequencing (scRNA-seq) data is surging with the development of high-throughput sequencing technology. However, although single-cell data analysis is a powerful tool, various issues have been reported, such as sequencing sparsity and complex differential patterns in gene expression. Statistical and traditional machine learning methods are inefficient, and their accuracy needs improvement. Conventional deep learning methods cannot directly process non-Euclidean data such as cell graphs. In this study, we developed scDGAE, which combines graph autoencoders and a graph attention network for scRNA-seq analysis on the basis of a directed graph neural network. Directed graph neural networks not only retain the connection properties of the directed graph but also expand the receptive field of the convolution operation. Cosine similarity, median L1 distance, and root-mean-squared error are used to compare the gene imputation performance of scDGAE with that of other methods. Adjusted mutual information, normalized mutual information, completeness score, and Silhouette coefficient are used to compare cell clustering performance. Experimental results show that scDGAE achieves promising performance in gene imputation and cell clustering prediction on four scRNA-seq datasets with gold-standard cell labels. It is also a robust framework that can be applied to general scRNA-seq analyses.
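The three imputation metrics named above can be computed as in the sketch below. The abstract does not say how they are aggregated, so computing the L1 distance per cell and taking the median across cells, and computing cosine similarity and RMSE over the whole flattened matrix, are assumptions; the toy matrices are hypothetical.

```python
import math
import statistics


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def median_l1(true_rows, imputed_rows):
    """Median over cells of the per-cell L1 distance between truth and imputation."""
    return statistics.median([sum(abs(a - b) for a, b in zip(t, p))
                              for t, p in zip(true_rows, imputed_rows)])


def rmse(true_rows, imputed_rows):
    diffs = [a - b for t, p in zip(true_rows, imputed_rows) for a, b in zip(t, p)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def flatten(matrix):
    return [x for row in matrix for x in row]


truth = [[1.0, 2.0], [3.0, 4.0]]      # cells x genes, hypothetical ground truth
imputed = [[1.0, 2.0], [3.0, 6.0]]    # hypothetical imputed values
print(cosine_similarity(flatten(truth), flatten(imputed)),
      median_l1(truth, imputed), rmse(truth, imputed))
```

For the clustering side, scikit-learn's `sklearn.metrics` module provides `adjusted_mutual_info_score`, `normalized_mutual_info_score`, `completeness_score` and `silhouette_score`, which correspond to the four clustering metrics listed in the abstract.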
Affiliation(s)
- Xiang Feng
- College of Information Science Technology, Hainan Normal University, Haikou, Hainan 571158, China
- Hongqi Zhang
- College of Information Science Technology, Hainan Normal University, Haikou, Hainan 571158, China
- Hao Lin
- School of Mathematics and Statistics, Hainan Normal University, Haikou, Hainan 571158, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Haixia Long
- College of Information Science Technology, Hainan Normal University, Haikou, Hainan 571158, China
21
Li B, Jin K, Ou-Yang L, Yan H, Zhang XF. scTSSR2: Imputing Dropout Events for Single-Cell RNA Sequencing Using Fast Two-Side Self-Representation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1445-1456. [PMID: 35476574 DOI: 10.1109/tcbb.2022.3170587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The single-cell RNA sequencing (scRNA-seq) technique has opened a new era by revealing gene expression patterns at single-cell resolution, enabling studies of the heterogeneity and transcriptome dynamics of complex tissues. However, the large proportion of dropout events may hinder downstream analyses, so imputing dropout events is an important step in analyzing scRNA-seq data. We develop scTSSR2, a new imputation method that combines matrix decomposition with the previously developed two-side sparse self-representation, yielding a fast two-side sparse self-representation for imputing dropout events in scRNA-seq data. Comparisons of computational speed and memory usage among imputation methods show that scTSSR2 has distinct advantages in both. Comprehensive downstream experiments show that scTSSR2 outperforms state-of-the-art imputation methods. A user-friendly R package, scTSSR2, is provided to denoise scRNA-seq data and improve data quality.
22
Juan H, Huang H. Quantitative analysis of high‐throughput biological data. WIREs Computational Molecular Science 2023. [DOI: 10.1002/wcms.1658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Affiliation(s)
- Hsueh‐Fen Juan
- Department of Life Science, Institute of Biomedical Electronics and Bioinformatics, and Center for Systems Biology, National Taiwan University, Taipei, Taiwan
- Taiwan AI Labs, Taipei, Taiwan
- Hsuan‐Cheng Huang
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
23
Liu H, Bing P, Zhang M, Tian G, Ma J, Li H, Bao M, He K, He J, He B, Yang J. MNNMDA: Predicting human microbe-disease association via a method to minimize matrix nuclear norm. Comput Struct Biotechnol J 2023; 21:1414-1423. [PMID: 36824227 PMCID: PMC9941872 DOI: 10.1016/j.csbj.2022.12.053] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/26/2022] [Revised: 12/29/2022] [Accepted: 12/30/2022] [Indexed: 01/03/2023]
Abstract
Identifying the potential associations between microbes and diseases is the first step for revealing the pathological mechanisms of microbe-associated diseases. However, traditional culture-based microbial experiments are expensive and time-consuming. Thus, it is critical to prioritize disease-associated microbes by computational methods for further experimental validation. In this study, we proposed a novel method called MNNMDA, to predict microbe-disease associations (MDAs) by applying a Matrix Nuclear Norm method into known microbe and disease data. Specifically, we first calculated Gaussian interaction profile kernel similarity and functional similarity for diseases and microbes. Then we constructed a heterogeneous information network by combining the integrated disease similarity network, the integrated microbe similarity network and the known microbe-disease bipartite network. Finally, we formulated the microbe-disease association prediction problem as a low-rank matrix completion problem, which was solved by minimizing the nuclear norm of a matrix with a few regularization terms. We tested the performances of MNNMDA in three datasets including HMDAD, Disbiome, and Combined Data with small, medium and large sizes respectively. We also compared MNNMDA with 5 state-of-the-art methods including KATZHMDA, LRLSHMDA, NTSHMDA, GATMDA, and KGNMDA, respectively. MNNMDA achieved area under the ROC curves (AUROC) of 0.9536 and 0.9364 respectively on HDMAD and Disbiome, better than the AUCs of compared methods under the 5-fold cross-validation for all microbe-disease associations. It also obtained a relatively good performance with AUROC 0.8858 in the combined data. In addition, MNNMDA was also better than other methods in area under precision and recall curve (AUPR) under the 5-fold cross-validation for all associations, and in both AUROC and AUPR under the 5-fold cross-validation for diseases and the 5-fold cross-validation for microbes. 
Case studies on colon cancer and inflammatory bowel disease (IBD) also validated the effectiveness of MNNMDA. In conclusion, MNNMDA is an effective method for predicting microbe-disease associations. Availability: The code and data for this paper are freely available on GitHub at https://github.com/Haiyan-Liu666/MNNMDA.
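Gaussian interaction profile (GIP) kernel similarity, used in the first step above, is commonly defined on the rows of a binary association matrix as K(i, j) = exp(-γ‖IP(i) - IP(j)‖²), where the bandwidth is γ = γ' / mean(‖IP(i)‖²). The Python sketch below assumes that common formulation with γ' = 1. It is not taken from the MNNMDA code, and both the helper name `gip_kernel` and the toy matrix are hypothetical.

```python
import math

def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a 0/1 association matrix.

    assoc[i] is the interaction profile of entity i (e.g. one microbe's row of
    the known microbe-disease matrix). gamma_prime=1.0 is an assumed default.
    """
    n = len(assoc)
    # Squared Euclidean norm of each interaction profile.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    # Bandwidth normalised by the mean profile norm; guard against an all-zero matrix.
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel

# Hypothetical toy data: 3 microbes (rows) x 4 diseases (columns).
A = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
K_microbe = gip_kernel(A)                                # microbe-microbe similarity
K_disease = gip_kernel([list(col) for col in zip(*A)])   # disease-disease similarity
```

The same function applied to the transposed matrix yields the disease-side kernel. Both kernels are symmetric and have ones on the diagonal.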
Affiliation(s)
- Haiyan Liu
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; College of Information Engineering, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China
- Pingping Bing
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China
- Meijun Zhang
- Geneis Beijing Co., Ltd., Beijing 100102, PR China
- Geng Tian
- Geneis Beijing Co., Ltd., Beijing 100102, PR China
- Jun Ma
- College of Information Engineering, Changsha Medical University, Changsha 410219, PR China
- Haigang Li
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China; School of Pharmacy, Changsha Medical University, Changsha 410219, PR China
- Meihua Bao
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China; School of Pharmacy, Changsha Medical University, Changsha 410219, PR China
- Kunhui He
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China; School of Pharmacy, Changsha Medical University, Changsha 410219, PR China
- Jianjun He
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China; School of Pharmacy, Changsha Medical University, Changsha 410219, PR China; Corresponding authors at: Academician Workstation, Changsha Medical University, Changsha 410219, PR China.
- Binsheng He
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China; School of Pharmacy, Changsha Medical University, Changsha 410219, PR China; Corresponding authors at: Academician Workstation, Changsha Medical University, Changsha 410219, PR China.
- Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha 410219, PR China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, PR China; Geneis Beijing Co., Ltd., Beijing 100102, PR China; School of Pharmacy, Changsha Medical University, Changsha 410219, PR China; Corresponding authors at: Academician Workstation, Changsha Medical University, Changsha 410219, PR China.
24
Feng X, Fang F, Long H, Zeng R, Yao Y. Single-cell RNA-seq data analysis using graph autoencoders and graph attention networks. Front Genet 2022; 13:1003711. [PMID: 36568390 PMCID: PMC9780469 DOI: 10.3389/fgene.2022.1003711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 11/21/2022] [Indexed: 12/13/2022]
Abstract
With the development of high-throughput sequencing technology, the scale of single-cell RNA sequencing (scRNA-seq) data has surged. These data are typically high-dimensional, with high dropout noise and high sparsity. Therefore, gene imputation and cell clustering analysis of scRNA-seq data are increasingly important. Statistical and traditional machine learning methods are inefficient, and their accuracy needs to be improved. Conventional deep learning methods cannot directly process non-Euclidean data, such as cell graphs. In this study, we developed scGAEGAT, a multi-modal model for scRNA-seq analysis that is based on graph neural networks and combines graph autoencoders with graph attention networks. Cosine similarity, median L1 distance, and root-mean-squared error were used to compare the gene imputation performance of scGAEGAT with that of other methods. Furthermore, adjusted mutual information, normalized mutual information, completeness score, and Silhouette coefficient score were used to compare cell clustering performance. Experimental results on four scRNA-seq datasets with gold-standard cell labels demonstrated promising performance of scGAEGAT in gene imputation and cell clustering.
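The three gene-imputation metrics named above are standard and easy to express in code. The minimal Python sketch below assumes that each metric is computed between a flattened vector of held-out true expression values and the corresponding imputed values. The abstract does not say which entries scGAEGAT masks or how values are normalised, and the function names and data are hypothetical.

```python
import math
import statistics

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def median_l1_distance(x, y):
    """Median of the element-wise absolute errors |x_i - y_i|."""
    return statistics.median([abs(a - b) for a, b in zip(x, y)])

def rmse(x, y):
    """Root-mean-squared error between true and imputed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical held-out expression values and their imputed estimates.
true_vals = [3.0, 0.0, 4.0, 1.0]
imputed = [2.0, 0.0, 4.0, 2.0]
scores = {
    "cosine": cosine_similarity(true_vals, imputed),
    "median_l1": median_l1_distance(true_vals, imputed),
    "rmse": rmse(true_vals, imputed),
}
```

Higher cosine similarity, together with lower median L1 distance and RMSE, indicates better imputation. The clustering metrics require cluster labels and are not shown here.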
Affiliation(s)
- Xiang Feng
- College of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Fang Fang
- College of Information Engineering, Hainan Vocational University of Science and Technology, Haikou, Hainan, China
- Haixia Long
- College of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Rao Zeng
- College of Information Science Technology, Hainan Normal University, Haikou, Hainan, China
- Yuhua Yao
- College of Mathematics and Statistics, Hainan Normal University, Haikou, Hainan, China
25
Wang Y, Xiang J, Liu C, Tang M, Hou R, Bao M, Tian G, He J, He B. Drug repositioning for SARS-CoV-2 by Gaussian kernel similarity bilinear matrix factorization. Front Microbiol 2022; 13:1062281. [PMID: 36545200 PMCID: PMC9762482 DOI: 10.3389/fmicb.2022.1062281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2022] [Accepted: 11/21/2022] [Indexed: 12/12/2022]
Abstract
Coronavirus disease 2019 (COVID-19), a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is currently spreading rapidly around the world. Because SARS-CoV-2 seriously threatens human life and health as well as the development of the world economy, it is urgent to identify effective drugs against this virus. However, traditional methods of developing new drugs are costly and time-consuming, which makes drug repositioning a promising direction for this purpose. In this study, we collected known antiviral drugs to form five virus-drug association datasets, and then explored drug repositioning for SARS-CoV-2 by Gaussian kernel similarity bilinear matrix factorization (VDA-GKSBMF). Under 5-fold cross-validation, VDA-GKSBMF achieved area under the curve (AUC) values of 0.8851, 0.8594, 0.8807, 0.8824, and 0.8804 on the five datasets, respectively, which are higher than those of other state-of-the-art algorithms on four of the datasets. Based on known virus-drug association data, we used VDA-GKSBMF to prioritize the top-k candidate antiviral drugs that are most likely to be effective against SARS-CoV-2. Using AutoDock, we confirmed that the top-10 drugs from the five datasets can be molecularly docked with the viral spike protein/human ACE2. Among them, four antiviral drugs, ribavirin, remdesivir, oseltamivir, and zidovudine, have been in clinical trials or are supported by recent literature. The results suggest that VDA-GKSBMF is an effective algorithm for identifying potential antiviral drugs against SARS-CoV-2.
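AUC values such as those reported above can be computed directly from prediction scores using the Mann-Whitney formulation. In this formulation, the AUC is the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as half. The sketch below is a generic AUC computation, not VDA-GKSBMF's evaluation code. The fold splitting of 5-fold cross-validation is not shown, and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    scores: predicted association scores; labels: 1 for known virus-drug
    associations held out in a fold, 0 for unknown pairs. O(n_pos * n_neg),
    acceptable for a sketch but slow for very large candidate sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for pos_score in pos:
        for neg_score in neg:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four candidate virus-drug pairs.
print(roc_auc([0.9, 0.8, 0.35, 0.1], [1, 0, 1, 0]))  # prints 0.75
```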
Affiliation(s)
- Yibai Wang
- School of Information Engineering, Changsha Medical University, Changsha, China
- Ju Xiang
- School of Information Engineering, Changsha Medical University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China; *Correspondence: Ju Xiang
- Cuicui Liu
- School of Information Engineering, Changsha Medical University, Changsha, China
- Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu, China
- Rui Hou
- Geneis (Beijing) Co., Ltd., Beijing, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Meihua Bao
- School of Pharmacy, Changsha Medical University, Changsha, China; Key Laboratory Breeding Base of Hunan Oriented Fundamental and Applied Research of Innovative Pharmaceutics, Changsha Medical University, Changsha, China
- Geng Tian
- Geneis (Beijing) Co., Ltd., Beijing, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jianjun He
- Academician Workstation, Changsha Medical University, Changsha, China; School of Pharmacy, Changsha Medical University, Changsha, China; Key Laboratory Breeding Base of Hunan Oriented Fundamental and Applied Research of Innovative Pharmaceutics, Changsha Medical University, Changsha, China; *Correspondence: Jianjun He
- Binsheng He
- Academician Workstation, Changsha Medical University, Changsha, China; School of Pharmacy, Changsha Medical University, Changsha, China; Key Laboratory Breeding Base of Hunan Oriented Fundamental and Applied Research of Innovative Pharmaceutics, Changsha Medical University, Changsha, China; *Correspondence: Binsheng He
26
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine learning-based methods for RNA data analysis—Volume II. Front Genet 2022; 13:1010089. [DOI: 10.3389/fgene.2022.1010089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022]
27
Yao Y, Lv Y, Tong L, Liang Y, Xi S, Ji B, Zhang G, Li L, Tian G, Tang M, Hu X, Li S, Yang J. ICSDA: a multi-modal deep learning model to predict breast cancer recurrence and metastasis risk by integrating pathological, clinical and gene expression data. Brief Bioinform 2022; 23:6761046. [PMID: 36242564 DOI: 10.1093/bib/bbac448] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/23/2022] [Revised: 07/18/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022]
Abstract
Breast cancer patients often experience recurrence and metastasis after surgery. Predicting the risk of recurrence and metastasis for a breast cancer patient is essential for developing precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model that integrates hematoxylin & eosin (H&E)-stained histopathological images, clinical information and gene expression data. Specifically, we segmented tumor regions in H&E images into image blocks (256 × 256 pixels) and encoded each image block into a 1D feature vector using a deep neural network. Then, an attention module scored each area of the H&E-stained images, and the resulting image features were combined with clinical and gene expression data to predict the risk of recurrence and metastasis for each patient. To test the model, we downloaded all 196 breast cancer samples from The Cancer Genome Atlas for which clinical, gene expression and H&E data were simultaneously available. The samples were then divided into training and testing sets at a ratio of 7:3, with sample distributions kept consistent between the two sets by hierarchical sampling. The multi-modal model achieved an area under the curve of 0.75 on the testing set, better than models based solely on H&E images, sequencing data or clinical data. This study might have clinical significance in identifying high-risk breast cancer patients who may benefit from postoperative adjuvant treatment.
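The attention step described above scores each image block and then combines the block features with clinical and gene expression data. This design resembles attention-based multiple-instance pooling. The NumPy sketch below makes several assumptions: a tanh attention scorer, hypothetical randomly initialised parameters `V` and `w`, arbitrary feature sizes, and plain concatenation for fusion. The abstract does not specify ICSDA's attention module or fusion layer, so the sketch illustrates only the general technique and is not the authors' model.

```python
import numpy as np

def attention_pool(block_features, V, w):
    """Attention-weighted pooling of per-block feature vectors.

    block_features: (n_blocks, d) array, one 1D feature vector per image block.
    V: (h, d) and w: (h,) are attention parameters (learned in practice).
    Returns the (d,) slide-level vector and the (n_blocks,) attention weights.
    """
    # One unnormalised relevance score per block.
    scores = np.tanh(block_features @ V.T) @ w
    # Softmax over blocks, shifted by the max for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Attention-weighted sum of block features gives one vector per slide.
    return weights @ block_features, weights

rng = np.random.default_rng(0)
blocks = rng.normal(size=(5, 8))    # 5 hypothetical blocks with 8-dim features
V = rng.normal(size=(4, 8))         # hypothetical attention parameters
w = rng.normal(size=4)
slide_vec, attn = attention_pool(blocks, V, w)

# Hypothetical fusion by concatenation with encoded clinical and expression features.
clinical = np.array([0.0, 1.0, 0.5])
expression = rng.normal(size=6)
fused = np.concatenate([slide_vec, clinical, expression])
```

In a trained model, `V` and `w` would be learned jointly with the downstream risk classifier. In this sketch they only demonstrate the shapes and the softmax weighting.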
Affiliation(s)
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China; Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China; Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, China
- Yaping Lv
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China; Geneis Beijing Co., Ltd., Beijing 100102, China
- Ling Tong
- Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
- Yuebin Liang
- Geneis Beijing Co., Ltd., Beijing 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Shuxue Xi
- Geneis Beijing Co., Ltd., Beijing 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Binbin Ji
- Geneis Beijing Co., Ltd., Beijing 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Guanglu Zhang
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
- Ling Li
- Basic Courses Department, Zhejiang Shuren University, Hangzhou 310000, China
- Geng Tian
- Geneis Beijing Co., Ltd., Beijing 100102, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, 212013, China
- Xiyue Hu
- Dept. of Colorectal Surgery, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Science, 17 Panjiayuan Nanli, Chaoyang District, Beijing, China, 100021
- Shijun Li
- Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
- Jialiang Yang
- Geneis Beijing Co., Ltd., Beijing 100102, China; Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
28
Nussinov R, Tsai CJ, Jang H. A New View of Activating Mutations in Cancer. Cancer Res 2022; 82:4114-4123. [PMID: 36069825 PMCID: PMC9664134 DOI: 10.1158/0008-5472.can-22-2125] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/30/2022] [Revised: 08/16/2022] [Accepted: 09/01/2022] [Indexed: 12/14/2022]
Abstract
A vast effort has been invested in the identification of driver mutations of cancer. However, recent studies and observations call into question whether the activating mutations or the signal strength are the major determinant of tumor development. The data argue that signal strength determines cell fate, not the mutation that initiated it. In addition to activating mutations, factors that can impact signaling strength include (i) homeostatic mechanisms that can block or enhance the signal, (ii) the types and locations of additional mutations, and (iii) the expression levels of specific isoforms of genes and regulators of proteins in the pathway. Because signal levels are largely decided by chromatin structure, they vary across cell types, states, and time windows. A strong activating mutation can be restricted by low expression, whereas a weaker mutation can be strengthened by high expression. Strong signals can be associated with cell proliferation, but too strong a signal may result in oncogene-induced senescence. Beyond cancer, moderate signal strength in embryonic neural cells may be associated with neurodevelopmental disorders, and moderate signals in aging may be associated with neurodegenerative diseases, like Alzheimer's disease. The challenge for improving patient outcomes therefore lies in determining signaling thresholds and predicting signal strength.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, NCI, Frederick, Maryland
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Chung-Jung Tsai
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, NCI, Frederick, Maryland
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Cancer Innovation Laboratory, NCI, Frederick, Maryland
29
Huang K, Lin B, Liu J, Liu Y, Li J, Tian G, Yang J. Predicting colorectal cancer tumor mutational burden from histopathological images and clinical information using multi-modal deep learning. Bioinformatics 2022; 38:5108-5115. [PMID: 36130268 DOI: 10.1093/bioinformatics/btac641] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/10/2022] [Revised: 08/31/2022] [Accepted: 09/20/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Tumor mutational burden (TMB) is an indicator of the efficacy and prognosis of immune checkpoint therapy in colorectal cancer (CRC). In general, patients with higher TMB values are more likely to benefit from immunotherapy. Although whole-exome sequencing is considered the gold standard for determining TMB, its high cost makes it difficult to apply in clinical practice. There are also a few DNA panel-based methods to estimate TMB; however, their detection cost is also high, and the associated wet-lab experiments usually take days, which emphasizes the need for faster and cheaper alternatives. RESULTS In this study, we propose a multi-modal deep learning model based on a residual network (ResNet) and multi-modal compact bilinear pooling to predict TMB status (i.e. TMB high (TMB_H) or TMB low (TMB_L)) directly from histopathological images and clinical data. We applied the model to CRC data from The Cancer Genome Atlas and compared it with four other popular methods, namely, ResNet18, ResNet50, VGG19 and AlexNet. We tested different TMB thresholds, namely, percentiles of 10%, 14.3%, 15%, 16.3%, 20%, 30% and 50%, to differentiate TMB_H and TMB_L. For the percentile of 14.3% (i.e. TMB value 20) and ResNet18, our model achieved an area under the receiver operating characteristic curve of 0.817 after 5-fold cross-validation, which was better than that of the other compared models. In addition, we found that TMB values were significantly associated with the tumor stage and with N and M stages. Our study shows that deep learning models can predict TMB status from histopathological images and clinical information alone, which merits clinical application.
Affiliation(s)
- Kaimei Huang
- Department of Mathematics, Zhejiang Normal University, Jinhua 321004, China; Department of Sciences, Geneis (Beijing) Co., Ltd, Beijing 100102, China; Department of Sciences, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Binghu Lin
- Department of General Surgery of Third Ward, Xiangyang No.1 People's Hospital, Hubei University of Medicine, Xiangyang 441000, China
- Jinyang Liu
- Department of Sciences, Geneis (Beijing) Co., Ltd, Beijing 100102, China; Department of Sciences, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Yankun Liu
- Cancer Institute, Tangshan People's Hospital, Tangshan 063001, China
- Jingwu Li
- Cancer Institute, Tangshan People's Hospital, Tangshan 063001, China
- Geng Tian
- Department of Sciences, Geneis (Beijing) Co., Ltd, Beijing 100102, China; Department of Sciences, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Jialiang Yang
- Department of Sciences, Geneis (Beijing) Co., Ltd, Beijing 100102, China; Department of Sciences, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
30
Li S, Yang M, Ji L, Fan H. A multi-omics machine learning framework in predicting the recurrence and metastasis of patients with pancreatic adenocarcinoma. Front Microbiol 2022; 13:1032623. [PMID: 36406449 PMCID: PMC9669652 DOI: 10.3389/fmicb.2022.1032623] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Accepted: 10/17/2022] [Indexed: 10/15/2023]
Abstract
Local recurrence and distant metastasis are the main causes of death in patients with pancreatic adenocarcinoma (PDAC). The microbial content associated with PDAC metastasis is still not well characterized. Here, the tissue microbiome was comprehensively compared between metastatic and non-metastatic PDAC patients. We found that the pancreatic tissue microbiome of metastatic patients differed significantly from that of non-metastatic patients. Further, 10 potential bacterial biomarkers (Kurthia, Gulbenkiania, Acetobacterium, Planctomyces, etc.) were identified by differential analysis. Meanwhile, significant differences in the expression patterns of PDAC patients were found across multiple omics (lncRNA, miRNA, and mRNA). The highest accuracy in predicting recurrence or metastasis in PDAC patients was achieved when these 10 bacterial biomarkers were used as features, with an AUC of 0.815. Finally, recurrence and metastasis in PDAC patients were associated with reduced survival, and this association was potentially driven by the 10 biomarkers we identified. Our studies highlight the association of the tissue microbiome with recurrence or metastasis and with survival in patients with pancreatic adenocarcinoma.
Affiliation(s)
- Shenming Li
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Department of Nephrology, Essen University Hospital, University of Duisburg-Essen, Essen, Germany
- Min Yang
- School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan, Anhui, China
- Genesis Beijing Co., Ltd., Beijing, China
- Lei Ji
- Genesis Beijing Co., Ltd., Beijing, China
- Hua Fan
- Department of Hepatobiliary and Pancreaticosplenic Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
31
Zhai S, Li X, Wu Y, Shi X, Ji B, Qiu C. Identifying potential microRNA biomarkers for colon cancer and colorectal cancer through bound nuclear norm regularization. Front Genet 2022; 13:980437. [PMID: 36313468 PMCID: PMC9614659 DOI: 10.3389/fgene.2022.980437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022]
Abstract
Colon cancer and colorectal cancer are two common causes of cancer-related death worldwide. Identifying potential biomarkers for the two cancers can help evaluate their initiation, progression and therapeutic response. In this study, we propose a new microRNA-disease association identification method, BNNRMDA, to discover potential microRNA biomarkers for the two cancers. BNNRMDA combines disease semantic similarity and Gaussian Association Profile Kernel (GAPK) similarity, microRNA functional similarity and GAPK similarity, and the bound nuclear norm regularization model. Compared with five other classical microRNA-disease association identification methods (MIDPE, MIDP, RLSMDA, GRNMF, and LPLNS), BNNRMDA obtains the highest AUC, 0.9071, demonstrating its strong performance in identifying microRNA-disease associations. BNNRMDA is then applied to discover possible microRNA biomarkers for colon cancer and colorectal cancer. The results show that all 73 known microRNAs associated with colon cancer in the HMDD database receive the highest association scores with colon cancer and are ranked in the top 73. Among the 137 known microRNAs associated with colorectal cancer in the HMDD database, 129 receive the highest association scores with colorectal cancer and are ranked in the top 129. In addition, we predict that hsa-miR-103a could be a potential biomarker of colon cancer and that hsa-mir-193b and hsa-mir-7days could be potential biomarkers of colorectal cancer.
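Both MNNMDA and BNNRMDA treat association prediction as low-rank matrix completion regularised by the nuclear norm. A common building block of such solvers is singular value thresholding (SVT), which is the proximal operator of the nuclear norm. The NumPy sketch below shows a soft-impute style iteration built on SVT. It assumes a plain nuclear-norm penalty and replaces BNNRMDA's bounded formulation with simple clipping to [0, 1]. The similarity-matrix construction is omitted, and the matrix and parameter values are hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # soft-threshold every singular value
    return (U * s_shrunk) @ Vt            # same as U @ diag(s_shrunk) @ Vt

def complete_matrix(observed, mask, tau=0.05, n_iter=200, bounds=(0.0, 1.0)):
    """Soft-impute style low-rank completion.

    observed: matrix whose entries are trusted where mask is True.
    bounds: estimates are clipped to [0, 1] as a simple stand-in for a bound
    constraint (assumption; not BNNRMDA's actual optimisation scheme).
    """
    X = np.where(mask, observed, 0.0)
    for _ in range(n_iter):
        filled = np.where(mask, observed, X)      # keep known entries fixed
        X = np.clip(svt(filled, tau), *bounds)    # shrink toward low rank, then bound
    return X

# Hypothetical rank-1 miRNA-by-disease score matrix with two hidden entries.
truth = np.outer([1.0, 0.8, 0.6], [1.0, 0.5, 0.9, 0.7])
mask = np.ones(truth.shape, dtype=bool)
mask[0, 1] = False
mask[2, 3] = False
pred = complete_matrix(truth, mask)
```

A larger `tau` pushes the estimate toward lower rank but also shrinks the recovered scores. This trade-off is why `tau` is kept small relative to the top singular value in this toy example.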
Affiliation(s)
- Shengyong Zhai
- Department of General Surgery, Weifang People’s Hospital, Shandong, China
- Xiaoling Li
- The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China; Heilongjiang Second Cancer Hospital, Harbin, China
- Yan Wu
- Geneis Beijing Co., Ltd., Beijing, China
- Xiaoli Shi
- Geneis Beijing Co., Ltd., Beijing, China
- Binbin Ji
- Geneis Beijing Co., Ltd., Beijing, China
- Chun Qiu
- Department of Oncology, Hainan General Hospital, Haikou, China; *Correspondence: Chun Qiu
32
Su Q, Tan Q, Liu X, Wu L. Prioritizing potential circRNA biomarkers for bladder cancer and bladder urothelial cancer based on an ensemble model. Front Genet 2022; 13:1001608. [PMID: 36186429 PMCID: PMC9521272 DOI: 10.3389/fgene.2022.1001608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 08/15/2022] [Indexed: 12/03/2022]
Abstract
Bladder cancer is the most common cancer of the urinary system, and bladder urothelial cancer accounts for 90% of bladder cancers. Both have high morbidity and mortality rates worldwide. Identifying biomarkers for bladder cancer and bladder urothelial cancer aids in their diagnosis and treatment. circRNAs are considered oncogenes or tumor suppressors in cancers, and they play important roles in cancer occurrence and development. In this manuscript, we developed an ensemble model, CDA-EnRWLRLS, which combines random walk with restart and Laplacian regularized least squares to predict circRNA-disease associations (CDAs), and we used it to screen potential biomarkers for bladder cancer and bladder urothelial cancer. First, we compute disease similarity by combining the semantic similarity and association profile similarity of diseases, and circRNA similarity by combining the functional similarity and association profile similarity of circRNAs. Second, we score each circRNA-disease pair by random walk with restart and by Laplacian regularized least squares, respectively. Third, the circRNA-disease association scores from these models are integrated by soft voting to obtain the final CDAs. Finally, we use CDA-EnRWLRLS to screen potential circRNA biomarkers for bladder cancer and bladder urothelial cancer. Compared with three classical CDA prediction methods (CD-LNLP, DWNN-RLS, and KATZHCDA) and two individual models (CDA-RWR and CDA-LRLS), CDA-EnRWLRLS obtains a better AUC of 0.8654. We predict that circHIPK3 has the highest association with bladder cancer and may be a potential biomarker for it. In addition, circSMARCA5 has the highest association with bladder urothelial cancer and may be a possible biomarker for it.
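The scoring-and-voting pipeline described above can be sketched in a few lines of Python. In the sketch, a random walk with restart (RWR) propagates a disease's known circRNA associations over a circRNA similarity network. The final score is then the soft-voting average of the RWR score and a second model's score. Several details are assumptions. The restart probability of 0.3 and the equal voting weights are hypothetical. The `lrls_scores` list is a placeholder that stands in for the Laplacian regularized least squares model, and CDA-EnRWLRLS may normalise or weight its components differently.

```python
def column_normalize(W):
    """Make each column of a non-negative similarity matrix sum to 1 (transition matrix)."""
    n = len(W)
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)] for i in range(n)]

def random_walk_with_restart(W, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * T p + r * p0 until the L1 change falls below tol.

    W: circRNA-circRNA similarity matrix; seed: 0/1 vector marking circRNAs
    already linked to the disease of interest.
    """
    T = column_normalize(W)
    total = sum(seed)
    p0 = [s / total for s in seed]
    p = p0[:]
    n = len(p)
    for _ in range(max_iter):
        new_p = [(1 - restart) * sum(T[i][j] * p[j] for j in range(n)) + restart * p0[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p

def soft_vote(*score_lists, weights=None):
    """Soft voting: weighted average of per-model scores for each circRNA."""
    k = len(score_lists)
    weights = weights or [1.0 / k] * k
    return [sum(w * s[i] for w, s in zip(weights, score_lists))
            for i in range(len(score_lists[0]))]

# Hypothetical 4-circRNA similarity network; circRNA 0 is a known associate.
W = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
rwr_scores = random_walk_with_restart(W, seed=[1, 0, 0, 0])
lrls_scores = [0.9, 0.6, 0.2, 0.1]  # placeholder for a second model's output
final = soft_vote(rwr_scores, lrls_scores)
ranking = sorted(range(len(final)), key=lambda i: final[i], reverse=True)
```

Because the transition matrix is column-stochastic and the restart vector sums to 1, the RWR scores remain a probability distribution at every iteration.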
33
Lu J, Tan J, Yu X. A Prognostic Ferroptosis-Related lncRNA Model Associated With Immune Infiltration in Colon Cancer. Front Genet 2022; 13:934196. [PMID: 36118850 PMCID: PMC9470855 DOI: 10.3389/fgene.2022.934196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 06/13/2022] [Indexed: 11/28/2022]
Abstract
Colon cancer (CC) is a common malignant tumor worldwide, and ferroptosis plays a vital role in the pathology and progression of CC. Effective prognostic tools are required to guide clinical decision-making in CC. In our study, gene expression and clinical data of CC were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. We identified differentially expressed ferroptosis-related lncRNAs using differential expression and gene co-expression analyses. Then, univariate and multivariate Cox regression analyses were used to identify the effective ferroptosis-related lncRNAs for constructing the prognostic model for CC. Gene set enrichment analysis (GSEA) was conducted to explore functional enrichment. CIBERSORT and single-sample GSEA were performed to investigate the association between our model and the immune microenvironment. Finally, three ferroptosis-related lncRNAs (XXbac-B476C20.9, TP73-AS1, and SNHG15) were identified to construct the prognostic model. Validation showed that our model was effective in predicting the prognosis of CC patients and was also an independent prognostic factor for CC. GSEA showed that several ferroptosis-related pathways were significantly enriched in the low-risk group. Immune infiltration analysis suggested that the level of immune cell infiltration was significantly higher in the high-risk group than in the low-risk group. In summary, we established a prognostic model based on ferroptosis-related lncRNAs, which could guide future laboratory and clinical research on CC.
34
Li L, Qiu W, Lin L, Liu J, Shi X, Shi Y. Predicting recurrence and metastasis risk of endometrial carcinoma via prognostic signatures identified from multi-omics data. Front Oncol 2022; 12:982452. [PMID: 36059678 PMCID: PMC9438970 DOI: 10.3389/fonc.2022.982452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 08/03/2022] [Indexed: 11/13/2022]
Abstract
Objectives: Endometrial carcinoma (EC) is one of the three major gynecological malignancies, and 15% to 20% of patients will experience recurrence and metastasis. Although there are many studies on the prognosis of this cancer, the performance of existing models for evaluating the risk of recurrence and metastasis has yet to be improved. In addition, a comprehensive multi-omics analysis of the prognostic signatures of EC is needed. In this study, we aimed to construct a relatively stable and reliable model for predicting recurrence and metastasis of EC. Such a model will help determine patients' risk levels and choose appropriate adjuvant therapy, thereby avoiding improper treatment and improving patient prognosis. Methods: The mRNA, microRNA (miRNA), long non-coding RNA (lncRNA), copy number variation (CNV) data and clinical information of patients with EC were downloaded from The Cancer Genome Atlas (TCGA). Differential expression analyses were performed between the recurrence or metastasis group and the non-recurrence/metastasis group. Then, we screened potential prognostic markers from each of the four kinds of omics data and established prediction models using three classifiers. Results: We identified differentially expressed mRNAs, lncRNAs, miRNAs and CNVs between the two groups. According to feature selection scores from the random forest algorithm, 275 CNV features, 50 lncRNA features, 150 miRNA features and 150 mRNA features were selected, respectively. The prediction model constructed from lncRNA features using the random forest method showed the best performance, with an area under the curve of 0.763 and an accuracy of 0.819 under 10-fold cross-validation. Conclusion: We developed a computational model using omics information that is able to accurately predict the recurrence and metastasis risk of EC.
Affiliation(s)
- Ling Li
- Department of Gynecological Oncology Surgery, Fujian Cancer Hospital, Fujian Medical University Cancer Hospital, Fuzhou, China
- Wenjing Qiu
- Science System Department, Geneis Beijing Co., Ltd., Beijing, China
- Liang Lin
- Department of Gynecological Oncology Surgery, Fujian Cancer Hospital, Fujian Medical University Cancer Hospital, Fuzhou, China
- Jinyang Liu
- Science System Department, Geneis Beijing Co., Ltd., Beijing, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Xiaoli Shi
- Science System Department, Geneis Beijing Co., Ltd., Beijing, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- *Correspondence: Yi Shi; Xiaoli Shi
- Yi Shi
- Department of Molecular Pathology, Fujian Cancer Hospital, Fujian Medical University Cancer Hospital, Fuzhou, China
35
Li L, Chen F, Liu J, Zhu W, Lin L, Chen L, Shi Y, Lin A, Chen G. Molecular classification grade 3 endometrial endometrioid carcinoma using a next-generation sequencing–based gene panel. Front Oncol 2022; 12:935694. [PMID: 36003784 PMCID: PMC9394115 DOI: 10.3389/fonc.2022.935694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 06/30/2022] [Indexed: 11/13/2022]
Abstract
Over the past two decades, the incidence of endometrial cancer (EC) has been increasing, and molecular biomarkers are needed to predict prognosis and guide treatment. A recent study from The Cancer Genome Atlas recommended molecular profiling of EC to improve diagnosis, prognosis, and therapeutic treatment. In this study, next-generation sequencing (NGS) was performed on 70 cases of G3 endometrioid ECs (EECs) using an 11-gene panel (TP53, MLH1, MSH2, MSH6, PMS2, EPCAM, PIK3CA, CTNNB1, KRAS, PTEN, and POLE) for molecular classification. The classification based on the 11-gene NGS panel identified four molecular subgroups: POLE-ultramutated (n = 20, 28.6%), MSI-H (n = 27, 38.6%), NSMP (n = 13, 18.6%), and TP53mut (n = 10, 14.3%). The NGS method was 98.6% concordant (69 of 70 cases; kappa value 98%) with the classification assessed by immunohistochemistry (IHC). Of the seven patients who died, four had MSI-H tumors, two had TP53mut/p53abn tumors, and one had an NSMP tumor, with an average overall survival (OS) of 14.7 months. The TP53mut subgroup had poor OS rates, whereas the POLE group had a favorable prognosis. Our work suggests that the 11-gene panel is suitable for molecular classification of G3 EECs and for guiding prognosis and treatment decisions.
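The abstract does not spell out the decision order for assigning a tumor to one subgroup. The sketch below assumes the priority order commonly used for EC molecular classification (POLE first, then MSI-H, then TP53, otherwise NSMP); the `PanelCalls` fields are hypothetical summaries of variant calls, not the authors' data model. It also checks that the reported subgroup percentages follow from the counts.

```python
from dataclasses import dataclass


@dataclass
class PanelCalls:
    """Hypothetical per-tumor summary of panel results (not the authors' schema)."""
    pole_pathogenic: bool   # assumed: pathogenic POLE exonuclease-domain variant
    msi_high: bool          # MSI-H / mismatch-repair deficiency
    tp53_mutant: bool       # pathogenic TP53 variant


def molecular_subgroup(calls: PanelCalls) -> str:
    """Assign one subgroup, checking features in a fixed priority order."""
    if calls.pole_pathogenic:
        return "POLE-ultramutated"
    if calls.msi_high:
        return "MSI-H"
    if calls.tp53_mutant:
        return "TP53mut"
    return "NSMP"  # no specific molecular profile


def percentages(counts: dict, total: int) -> dict:
    """Subgroup shares in percent, rounded to one decimal as in the abstract."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```

The fixed order matters for tumors with more than one feature: under this assumption a POLE-mutant tumor that also carries a TP53 variant is still classified as POLE-ultramutated.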
Affiliation(s)
- Ling Li
  - Department of Gynecological Oncology Surgery, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Fangfang Chen
  - Department of Molecular pathology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Jingcheng Liu
  - Department of Pathology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Weifeng Zhu
  - Department of Pathology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Liang Lin
  - Department of Gynecological Oncology Surgery, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Li Chen
  - Department of Gynecological Oncology Surgery, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Yi Shi
  - Department of Molecular pathology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- An Lin
  - Department of Gynecological Oncology Surgery, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Gang Chen
  - Department of Pathology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
  - *Correspondence: Gang Chen

36
Guo Z, Hui Y, Kong F, Lin X. Finding Lung-Cancer-Related lncRNAs Based on Laplacian Regularized Least Squares With Unbalanced Bi-Random Walk. Front Genet 2022; 13:933009. [PMID: 35938010 PMCID: PMC9355720 DOI: 10.3389/fgene.2022.933009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 06/03/2022] [Indexed: 11/13/2022]
Abstract
Lung cancer is one of the leading causes of cancer-related deaths, so identifying its biomarkers is important. A growing number of studies report that long noncoding RNAs (lncRNAs) are closely linked to multiple complex human diseases. Inferring new lncRNA-disease associations helps identify potential biomarkers for lung cancer, deepen understanding of its pathogenesis, design new drugs, and formulate individualized therapeutic options for lung cancer patients. This study developed a computational method (LDA-RLSURW) that integrates Laplacian regularized least squares and unbalanced bi-random walk to discover possible lncRNA biomarkers for lung cancer. First, lncRNA and disease similarities were computed. Second, unbalanced bi-random walk was applied to the lncRNA and disease networks, respectively, to score associations between diseases and lncRNAs. Third, Laplacian regularized least squares was used to compute the association probability of each lncRNA-disease pair from the random walk scores. LDA-RLSURW was compared with 10 classical lncRNA-disease association (LDA) prediction methods and obtained the best AUC, 0.9027, on the lncRNADisease database. We identified the top 30 lncRNAs associated with lung cancers and inferred that the lncRNAs TUG1, PTENP1, and UCA1 may be biomarkers of lung neoplasms, non-small-cell lung cancer, and lung adenocarcinoma (LUAD), respectively.
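To make the "unbalanced bi-random walk" step concrete, here is a simplified sketch, not the LDA-RLSURW implementation: it assumes one walk with restart on the lncRNA network and one on the disease network, run for different (hypothetical) step counts, with the two score matrices averaged. The restart weight `alpha`, the step counts, and the averaging rule are illustrative assumptions, and the Laplacian regularized least squares stage is omitted.

```python
def matmul(a, b):
    """Plain matrix product for lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def row_normalize(m):
    """Turn a similarity matrix into a row-stochastic transition matrix."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def restart_step(alpha, walked, seed):
    """alpha * walked + (1 - alpha) * seed, elementwise (restart to known links)."""
    return [[alpha * w + (1 - alpha) * s for w, s in zip(rw, rs)]
            for rw, rs in zip(walked, seed)]


def unbalanced_bi_random_walk(assoc, lnc_sim, dis_sim,
                              left_steps=2, right_steps=3, alpha=0.8):
    """Score lncRNA-disease pairs; assoc is an lncRNA x disease 0/1 matrix."""
    m_lnc = row_normalize(lnc_sim)
    m_dis = row_normalize(dis_sim)
    left = assoc
    for _ in range(left_steps):      # propagate over the lncRNA network
        left = restart_step(alpha, matmul(m_lnc, left), assoc)
    right = assoc
    for _ in range(right_steps):     # propagate over the disease network
        right = restart_step(alpha, matmul(right, m_dis), assoc)
    # Hypothetical combination rule: average the two walks.
    return [[(a + b) / 2 for a, b in zip(rl, rr)] for rl, rr in zip(left, right)]
```

"Unbalanced" here simply means `left_steps != right_steps`, so each network can be explored to a different depth; an lncRNA with no known associations inherits scores from similar lncRNAs through the left walk.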

37
Lung Cancer Stage Prediction Using Multi-Omics Data. Comput Math Methods Med 2022; 2022:2279044. [PMID: 35880092 PMCID: PMC9308511 DOI: 10.1155/2022/2279044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 06/27/2022] [Indexed: 12/24/2022]
Abstract
Lung cancer is one of the leading causes of cancer death. Patients with early-stage lung cancer can be treated by surgery, while patients with middle- and late-stage disease need chemotherapy or radiotherapy. Accurate staging of lung cancer is therefore crucial for doctors to formulate appropriate treatment plans. In this paper, a random forest algorithm is used as the lung cancer stage prediction model, and its prediction accuracy is compared across three data groups: microbiome, transcriptome, and fused microbiome-transcriptome data. Model performance is measured by indicators such as accuracy (ACC), recall, and precision. The results showed that the fusion of microbial and transcriptomic data gave the highest prediction accuracy, 0.809. The study demonstrates the value of multimodal data and fusion algorithms for accurately determining lung cancer stage, which could aid doctors in the clinic.
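Stage prediction is a multi-class problem, and the abstract does not say how precision and recall were averaged across stages. The sketch below assumes macro averaging (each stage weighted equally) and uses hypothetical stage labels; it only shows how ACC, precision, and recall are derived from true and predicted stages.

```python
def stage_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall over stage labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Define a metric as 0 when its denominator is empty.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": acc,
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
    }
```

For example, with true stages `["I", "I", "II", "III"]` and predictions `["I", "II", "II", "III"]`, accuracy is 0.75 while macro precision and recall are both 5/6, because the single error lowers stage I recall and stage II precision.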

38
Qiu W, Yang J, Wang B, Yang M, Tian G, Wang P, Yang J. Evaluating the Microsatellite Instability of Colorectal Cancer Based on Multimodal Deep Learning Integrating Histopathological and Molecular Data. Front Oncol 2022; 12:925079. [PMID: 35865460 PMCID: PMC9295995 DOI: 10.3389/fonc.2022.925079] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/21/2022] [Accepted: 05/30/2022] [Indexed: 11/14/2022]
Abstract
Microsatellite instability (MSI), an important biomarker for immunotherapy and for the diagnosis of Lynch syndrome, refers to a change in microsatellite (MS) sequence length caused by insertion or deletion during DNA replication. However, traditional wet-lab MSI detection is time-consuming and depends on experimental conditions. In addition, the associations between MSI status and molecules such as mRNA and miRNA have not been comprehensively studied. In this study, we first examined the association between MSI status and several molecular data types, including mRNA, miRNA, lncRNA, DNA methylation, and copy number variation (CNV), using colorectal cancer data from The Cancer Genome Atlas (TCGA). We then developed a novel deep learning framework to predict MSI status from hematoxylin and eosin (H&E) staining images alone, and combined the H&E image with the above molecular data by multimodal compact bilinear pooling. Our results showed significant differences in mRNA, miRNA, and lncRNA between the high microsatellite instability (MSI-H) patient group and the low microsatellite instability or microsatellite stability (MSI-L/MSS) patient group. Using the H&E image alone, MSI status could be predicted with an acceptable area under the curve (AUC) of 0.809 in 5-fold cross-validation. Fusion models integrating the H&E image with a single type of molecular data had higher prediction accuracy than the model using the H&E image alone, with the highest AUC, 0.952, achieved by combining the H&E image with DNA methylation data. However, prediction accuracy decreased when the H&E image was combined with all types of molecular data. In conclusion, deep learning on H&E images can predict the MSI status of colorectal cancer, and its accuracy can be further improved by integrating appropriate molecular data. This study may have practical clinical significance.
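Multimodal compact bilinear (MCB) pooling approximates the outer product of two feature vectors (here, an image embedding and a molecular embedding) by count-sketching each vector and convolving the sketches; MCB implementations do the convolution with FFTs. The sketch below is a minimal illustration, not the authors' network: it uses a direct O(d²) circular convolution for readability, and the toy vectors, output dimension, and seeds are hypothetical.

```python
import random


def sketch_params(dim_in, dim_out, seed):
    """Random hash h: [dim_in] -> [dim_out] and random signs s in {-1, +1}."""
    rng = random.Random(seed)
    h = [rng.randrange(dim_out) for _ in range(dim_in)]
    s = [rng.choice((-1, 1)) for _ in range(dim_in)]
    return h, s


def count_sketch(x, h, s, dim_out):
    """Project x to dim_out dimensions: out[h[i]] += s[i] * x[i]."""
    out = [0.0] * dim_out
    for i, value in enumerate(x):
        out[h[i]] += s[i] * value
    return out


def circular_convolution(a, b):
    """Direct O(d^2) circular convolution; MCB computes the same via FFT."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]


def compact_bilinear(x, y, dim_out=16, seed=0):
    """Approximate the outer product of x and y in dim_out dimensions."""
    hx, sx = sketch_params(len(x), dim_out, seed)
    hy, sy = sketch_params(len(y), dim_out, seed + 1)
    return circular_convolution(count_sketch(x, hx, sx, dim_out),
                                count_sketch(y, hy, sy, dim_out))
```

The convolution of the two sketches equals a count sketch of the full outer product x yᵀ with hash (hx[i] + hy[j]) mod d and sign sx[i]·sy[j], which is why MCB can capture pairwise feature interactions without materializing the outer product.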
Affiliation(s)
- Wenjing Qiu
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, China
  - Science System Department, Geneis Beijing Co., Ltd., Beijing, China
- Jiasheng Yang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, China
- Bing Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, China
- Min Yang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, China
  - Science System Department, Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
  - Science System Department, Geneis Beijing Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Peizhen Wang
  - School of Electrical and Information Engineering, Anhui University of Technology, Maanshan, China
  - *Correspondence: Peizhen Wang, Jialiang Yang
- Jialiang Yang
  - Science System Department, Geneis Beijing Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - *Correspondence: Peizhen Wang, Jialiang Yang

39
Liu G, Li M, Wang H, Lin S, Xu J, Li R, Tang M, Li C. D3K: The Dissimilarity-Density-Dynamic Radius K-means Clustering Algorithm for scRNA-Seq Data. Front Genet 2022; 13:912711. [PMID: 35846121 PMCID: PMC9284269 DOI: 10.3389/fgene.2022.912711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 04/25/2022] [Indexed: 12/02/2022]
Abstract
Single-cell sequencing datasets have always been challenging to cluster because of their high dimensionality and many noisy points, and the traditional K-means algorithm is poorly suited to this type of data. This study therefore proposes a Dissimilarity-Density-Dynamic Radius K-means (D3K) clustering algorithm. The algorithm adds a dynamic radius parameter to the calculation and adjusts this active radius flexibly according to the characteristics of the data, which can eliminate the influence of noise points and optimize the clustering results. The algorithm also computes weights from the dissimilarity density of the dataset, the average contrast of candidate clusters, and the dissimilarity of candidate clusters. This yields a set of high-quality initial center points, removing the randomness with which the K-means algorithm selects its initial centers. Compared with similar algorithms, D3K achieves better clustering on single-cell data, scoring higher than other single-cell clustering algorithms on every clustering index and overcoming the shortcomings of traditional K-means.
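The central idea in the abstract, replacing random K-means initialization with a deterministic choice of dense, mutually distant centers, can be sketched as follows. This is a generic density-aware initialization, not D3K's actual weighting: the score "local density × distance to the nearest chosen center" and the fixed `radius` are assumptions, and D3K's dynamic radius and noise removal are not reproduced.

```python
import math


def density_init(points, k, radius):
    """Pick k initial centres favouring dense points that are far apart.

    Hypothetical weighting: local density * distance to the nearest centre
    already chosen. Isolated (noise) points have density 0 and are never picked.
    """
    density = [sum(1 for q in points if math.dist(p, q) <= radius) - 1
               for p in points]
    first = max(range(len(points)), key=lambda i: density[i])
    centers = [points[first]]
    while len(centers) < k:
        best = max(
            range(len(points)),
            key=lambda i: density[i] * min(math.dist(points[i], c) for c in centers),
        )
        centers.append(points[best])
    return centers


def kmeans(points, centers, max_iter=50):
    """Standard Lloyd iterations starting from the given centres."""
    def nearest(p, cs):
        return min(range(len(cs)), key=lambda j: math.dist(p, cs[j]))

    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return [nearest(p, centers) for p in points], centers
```

Because the initialization is deterministic, repeated runs give the same clustering, and an isolated outlier such as `(50, 50)` in a toy dataset is never selected as a seed.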
Affiliation(s)
- Guoyun Liu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Manzhi Li
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, China
  - *Correspondence: Manzhi Li
- Hongtao Wang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Shijun Lin
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Junlin Xu
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Ruixi Li
  - Geneis Beijing Co., Ltd., Beijing, China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang, China
- Chun Li
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China

40
Xu J, Cui L, Zhuang J, Meng Y, Bing P, He B, Tian G, Kwok Pui C, Wu T, Wang B, Yang J. Evaluating the performance of dropout imputation and clustering methods for single-cell RNA sequencing data. Comput Biol Med 2022; 146:105697. [PMID: 35697529 DOI: 10.1016/j.compbiomed.2022.105697] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/25/2022] [Revised: 05/16/2022] [Accepted: 06/04/2022] [Indexed: 11/03/2022]
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) provide exciting opportunities for transcriptome analysis at single-cell resolution. Clustering individual cells is a key step in scRNA-seq analysis for revealing cell subtypes and inferring cell lineage. Although many dedicated algorithms have been proposed, clustering quality remains a computational challenge for scRNA-seq data, exacerbated by zero counts inflated by various sources of technical noise. To address this challenge, we assessed combinations of nine popular dropout imputation methods and eight clustering methods on a collection of 10 well-annotated scRNA-seq datasets of different sample sizes. Our results show that (i) imputation algorithms typically improve the performance of clustering methods, as well as the quality of data visualization with t-distributed stochastic neighbor embedding (t-SNE); and (ii) the performance of a given combination of imputation and clustering methods varies with dataset size. For example, combining single-cell analysis via expression recovery (SAVER) with Sparse Subspace Clustering (SSC) usually works well on smaller datasets, whereas combining adaptively-thresholded low-rank approximation (ALRA) with single-cell interpretation via multikernel learning (SIMLR) usually achieves the best performance on larger datasets.
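Benchmarks of this kind score each imputation-plus-clustering combination by comparing predicted clusters with the annotated cell types. The abstract does not name the metric; the adjusted Rand index (ARI) is a common choice, so the sketch below, which assumes ARI, shows how it can be computed from two label lists with the standard library.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie adjusted Rand index between two flat clusterings."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label lists must have equal length")
    pair_counts = Counter(zip(labels_true, labels_pred))
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    expected = sum_true * sum_pred / total if total else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Only happens when both partitions are identical and trivial
        # (all cells in one cluster, or every cell alone).
        return 1.0
    return (index - expected) / (max_index - expected)
```

ARI ignores how clusters are named: `[0, 0, 1, 1]` versus `[1, 1, 0, 0]` scores 1.0, while splitting one true cluster in two (`[0, 0, 1, 2]`) gives 4/7, and random assignments score near 0.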
Affiliation(s)
- Junlin Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, 410082, China
- Lingyu Cui
  - College of Life Science, Northeast Forestry University, Harbin, Heilongjiang, 150000, China
- Jujuan Zhuang
  - School of Science, Dalian Maritime University, Dalian, Liaoning, 116026, China
- Yajie Meng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan, 410082, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha, 410219, China
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha, 410219, China
- Geng Tian
  - Geneis Beijing Co., Ltd., Beijing, 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, China
- Choi Kwok Pui
  - Department of Statistics and Data Science, Department of Mathematics, National University of Singapore, Singapore, 117546, Republic of Singapore
- Taoyang Wu
  - School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Bing Wang
  - School of Electrical & Information Engineering, Anhui University of Technology, Anhui, 243002, China
- Jialiang Yang
  - Geneis Beijing Co., Ltd., Beijing, 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, 266000, China

41
Liu X, Yuan P, Li R, Zhang D, An J, Ju J, Liu C, Ren F, Hou R, Li Y, Yang J. Predicting breast cancer recurrence and metastasis risk by integrating color and texture features of histopathological images and machine learning technologies. Comput Biol Med 2022; 146:105569. [DOI: 10.1016/j.compbiomed.2022.105569] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/12/2022] [Revised: 04/24/2022] [Accepted: 04/25/2022] [Indexed: 12/11/2022]

42
Li S, Wang B, Chang M, Hou R, Tian G, Tong L. A Novel Algorithm for Detecting Microsatellite Instability Based on Next-Generation Sequencing Data. Front Oncol 2022; 12:916379. [PMID: 35847873 PMCID: PMC9280483 DOI: 10.3389/fonc.2022.916379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 05/27/2022] [Indexed: 11/25/2022]
Abstract
Objectives Microsatellite instability (MSI) is a condition of genetic hypermutability caused by the spontaneous gain or loss of nucleotides during DNA replication. MSI has proven to be a clinically useful immunotherapy biomarker. The main DNA-based method for MSI detection is polymerase chain reaction (PCR) amplification followed by fragment length analysis, which is costly and laborious. We therefore developed a novel method to detect MSI from next-generation sequencing (NGS) data. Methods We chose six MSI markers. After alignment and read counting, a histogram was plotted showing the read counts at each length for each marker. We then designed an algorithm to find peaks in the generated histograms so that the number of peaks found in NGS data closely resembled that found by the PCR-based method. Results From Chifeng Municipal Hospital, Inner Mongolia, China, we selected nine samples as the training dataset, 101 samples as the validation dataset, and 68 samples as the test dataset. The NGS-based method achieved 100% accuracy on the validation dataset and 98.53% accuracy on the test dataset, with only one false positive. Conclusions Accurate MSI calls were achieved using NGS data, providing MSI detection comparable to the gold-standard PCR-based methods.
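The peak-finding step described in the Methods can be sketched as follows. This is not the published algorithm: the local-maximum rule, the minimum peak height (20% of the tallest bar), the rule "more than one peak means the marker is unstable", and the "at least two unstable markers means MSI-H" cutoff are all hypothetical thresholds chosen for illustration, and marker names are placeholders.

```python
from collections import Counter


def find_peaks(read_lengths, min_fraction=0.2):
    """Return repeat lengths that are local maxima of the read-length histogram.

    A length counts as a peak if its read count is at least min_fraction of the
    tallest bar, exceeds the bar to its left, and is >= the bar to its right
    (so a flat two-bar plateau yields one peak).
    """
    counts = Counter(read_lengths)
    if not counts:
        return []
    top = max(counts.values())
    peaks = []
    for length in range(min(counts), max(counts) + 1):
        c = counts[length]  # Counter returns 0 for lengths with no reads
        if c >= min_fraction * top and c > counts[length - 1] and c >= counts[length + 1]:
            peaks.append(length)
    return peaks


def call_msi(marker_reads, max_stable_peaks=1, min_unstable_markers=2):
    """Classify a sample from per-marker read lengths (thresholds are hypothetical)."""
    unstable = [name for name, reads in marker_reads.items()
                if len(find_peaks(reads)) > max_stable_peaks]
    status = "MSI-H" if len(unstable) >= min_unstable_markers else "MSS"
    return status, unstable
```

The minimum-height filter is what keeps small stutter bars from being counted as extra peaks; in practice such thresholds would be tuned on a training set, as the study did with its nine training samples.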
Affiliation(s)
- Shijun Li
  - Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- Bo Wang
  - Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Miaomiao Chang
  - Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- Rui Hou
  - Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
  - Science Department, Geneis Beijing Co., Ltd., Beijing, China
  - *Correspondence: Geng Tian, Ling Tong
- Ling Tong
  - Pathology Department, Chifeng Municipal Hospital, Chifeng, China
  - *Correspondence: Geng Tian, Ling Tong

43
Niu Y, Wang L, Zhang X, Han Y, Yang C, Bai H, Huang K, Ren C, Tian G, Yin S, Zhao Y, Wang Y, Shi X, Zhang M. Predicting Tumor Mutational Burden From Lung Adenocarcinoma Histopathological Images Using Deep Learning. Front Oncol 2022; 12:927426. [PMID: 35756617 PMCID: PMC9213738 DOI: 10.3389/fonc.2022.927426] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/24/2022] [Accepted: 05/16/2022] [Indexed: 11/25/2022]
Abstract
Tumor mutational burden (TMB) is an important biomarker for tumor immunotherapy and plays an important role in clinical treatment, but the gold-standard measurement of TMB is based on whole exome sequencing (WES), which most hospitals cannot perform because of its high cost, long turnaround time, and operational complexity. To find a better way to evaluate TMB, we divided patients with lung adenocarcinoma (LUAD) in TCGA into two groups according to their TMB values and analyzed the differences in clinical characteristics and gene expression between the two groups. We further explored whether histopathological images can be used to predict TMB status and developed a deep learning model to predict TMB from histopathological images of LUAD. In 5-fold cross-validation, the model's area under the receiver operating characteristic (ROC) curve (AUC) was 0.64. This study showed that deep learning can be used to predict genomic features from histopathological images, although the prediction accuracy was relatively low. The study opens a new way to explore the relationship between genes and phenotypes.
Affiliation(s)
- Yi Niu
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Xiaojie Zhang
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Yu Han
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Chunjie Yang
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Henan Bai
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Geng Tian
  - Geneis Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Shengjie Yin
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Yan Zhao
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China
- Ying Wang
  - Department of Oncology, Inner Mongolia Medical University, Hohhot, China
- Xiaoli Shi
  - Geneis Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Minghui Zhang
  - Department of Oncology, Municipal Hospital of Chifeng, Chifeng, China

44
Ye T, Lin L, Cao L, Huang W, Wei S, Shan Y, Zhang Z. Novel Prognostic Signatures of Hepatocellular Carcinoma Based on Metabolic Pathway Phenotypes. Front Oncol 2022; 12:863266. [PMID: 35677150 PMCID: PMC9168273 DOI: 10.3389/fonc.2022.863266] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/27/2022] [Accepted: 04/06/2022] [Indexed: 12/03/2022]
Abstract
Hepatocellular carcinoma (HCC) is a devastating cancer characterized by aberrant metabolism. In this study, we aimed to assess the role of metabolism in the prognosis of HCC. Ten metabolism-related pathways were identified and used to classify HCC into two clusters: Metabolism_H and Metabolism_L. Compared with Metabolism_L, patients in Metabolism_H had lower survival rates, more TP53 mutations, and more immune infiltration. Moreover, a risk score for predicting overall survival based on eleven differentially expressed metabolic genes was developed with a least absolute shrinkage and selection operator (LASSO)-Cox regression model in The Cancer Genome Atlas (TCGA) dataset and validated in the International Cancer Genome Consortium (ICGC) dataset. Immunohistochemical staining of liver cancer patient specimens also showed that the 11 genes were associated with patient prognosis. Multivariate Cox regression analyses indicated that the metabolic gene-based risk score was also an independent prognostic factor for overall survival. Furthermore, the risk score (AUC = 0.767) outperformed other clinical variables in predicting overall survival. The metabolism-related survival-predictor model may therefore predict overall survival well for HCC patients.
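A LASSO-Cox signature is typically applied as a linear risk score, the sum of each retained gene's coefficient times its expression, with patients then split into high- and low-risk groups. The sketch below assumes that form and a median cutoff (a common convention; the abstract does not state the cutoff). The gene names and coefficients are hypothetical placeholders, not the study's eleven genes.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return {
        sample: sum(coef * genes[gene] for gene, coef in coefficients.items())
        for sample, genes in expression.items()
    }


def split_by_median(scores):
    """Label samples 'high' if their score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}
```

Genes with positive coefficients raise the score (worse predicted survival) and genes with negative coefficients lower it; genes whose LASSO coefficients shrink to zero simply drop out of the sum.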
Affiliation(s)
- Tingbo Ye
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
  - Key Laboratory of Diagnosis and Treatment of Severe Hepato-Pancreatic Diseases of Zhejiang Province, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Leilei Lin
  - Department of Ultrasound, Wenzhou People's Hospital, Wenzhou, China
- Lulu Cao
  - Department of Pathology, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, China
- Weiguo Huang
  - Department of Vascular Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Shengzhe Wei
  - Department of Hand Surgery and Peripheral Neurosurgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Yunfeng Shan
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Zhongjing Zhang
  - Department of Vascular Surgery, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China

45
Xu J, Meng Y, Peng L, Cai L, Tang X, Liang Y, Tian G, Yang J. Computational drug repositioning using similarity constrained weight regularization matrix factorization: A case of COVID-19. J Cell Mol Med 2022; 26:3772-3782. [PMID: 35644992 PMCID: PMC9258716 DOI: 10.1111/jcmm.17412] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/17/2020] [Revised: 02/03/2022] [Accepted: 05/11/2022] [Indexed: 02/06/2023]
Abstract
Amid the COVID-19 crisis, we made a sizeable effort to collect a large number of experimentally validated drug–virus association entries from the literature by text mining and built a human drug–virus association database. To the best of our knowledge, it is the largest publicly available drug–virus database so far. Next, we developed a novel weight regularization matrix factorization approach, termed WRMF, for in silico drug repurposing by integrating three networks: the known drug–virus association network, the drug–drug chemical structure similarity network, and the virus–virus genomic sequence similarity network. Specifically, WRMF adds a weight to each training sample to reduce the influence of negative samples (i.e., drug–virus pairs with no known association). A comparison on the curated drug–virus database shows that WRMF performs better than several state-of-the-art methods. We also used two other public datasets (Cdataset and HMDD V2.0) to assess WRMF's performance. A case study further demonstrated the accuracy and reliability of WRMF in inferring potential drugs for the novel virus. In summary, we offer a useful toolkit comprising a novel drug–virus association database and a powerful method, WRMF, for repurposing potential drugs against new viruses.
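The sample-weighting idea in WRMF, trusting known associations fully while down-weighting unobserved pairs that may be undiscovered positives, can be illustrated with a plain weighted matrix factorization. This sketch is not WRMF itself: it omits the drug and virus similarity-constraint terms, and the rank, negative weight, regularization strength, learning rate, and epoch count are all hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_mf(assoc, rank=2, neg_weight=0.2, lam=0.01, lr=0.05, epochs=500, seed=0):
    """Factorize a drug x virus 0/1 matrix with down-weighted unobserved pairs.

    Minimizes sum_ij w_ij (a_ij - u_i . v_j)^2 + lam * (|U|^2 + |V|^2),
    with w_ij = 1 for known associations and neg_weight for the rest.
    Returns the reconstructed score matrix and the loss after each epoch.
    """
    rng = random.Random(seed)
    m, n = len(assoc), len(assoc[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]

    def weight(i, j):
        return 1.0 if assoc[i][j] else neg_weight

    def loss():
        fit = sum(weight(i, j) * (assoc[i][j] - dot(U[i], V[j])) ** 2
                  for i in range(m) for j in range(n))
        reg = lam * (sum(x * x for row in U for x in row)
                     + sum(x * x for row in V for x in row))
        return fit + reg

    history = [loss()]
    for _ in range(epochs):
        # Full-batch gradients of the weighted, L2-regularized loss.
        grad_u = [[2 * lam * x for x in row] for row in U]
        grad_v = [[2 * lam * x for x in row] for row in V]
        for i in range(m):
            for j in range(n):
                err = weight(i, j) * (dot(U[i], V[j]) - assoc[i][j])
                for r in range(rank):
                    grad_u[i][r] += 2 * err * V[j][r]
                    grad_v[j][r] += 2 * err * U[i][r]
        U = [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(U, grad_u)]
        V = [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(V, grad_v)]
        history.append(loss())
    scores = [[dot(U[i], V[j]) for j in range(n)] for i in range(m)]
    return scores, history
```

Unobserved pairs with relatively high reconstructed scores are the repurposing candidates; a smaller `neg_weight` penalizes those scores less, which is the intuition behind weighting negative samples.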
Affiliation(s)
- Junlin Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yajie Meng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lijun Cai
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xianfang Tang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Geng Tian
  - Geneis Beijing Co., Ltd., Beijing, China

46
Peng L, Yang J, Wang M, Zhou L. Editorial: Machine Learning-Based Methods for RNA Data Analysis. Front Genet 2022; 13:828575. [PMID: 35692815 PMCID: PMC9175173 DOI: 10.3389/fgene.2022.828575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 04/12/2022] [Indexed: 11/13/2022]
Affiliation(s)
- Lihong Peng
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
  - School of Computer, Hunan University of Technology, Zhuzhou, China
- Minxian Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Liqian Zhou
  - College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
  - *Correspondence: Liqian Zhou

47
Liu Y, Huang K, Yang Y, Wu Y, Gao W. Prediction of Tumor Mutation Load in Colorectal Cancer Histopathological Images Based on Deep Learning. Front Oncol 2022; 12:906888. [PMID: 35686098 PMCID: PMC9171017 DOI: 10.3389/fonc.2022.906888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2022] [Accepted: 04/18/2022] [Indexed: 02/05/2023]
Abstract
Colorectal cancer (CRC) is one of the most prevalent malignancies, and immunotherapy can be applied to CRC patients of all ages, although its efficacy is uncertain. Tumor mutational burden (TMB) is important for predicting the effect of immunotherapy. Currently, whole-exome sequencing (WES) is a standard method for measuring TMB, but it is costly and inefficient. A method that assesses TMB without WES is therefore urgently needed to improve immunotherapy outcomes. In this study, we propose DeepHE, a deep learning method based on the Residual Network (ResNet) model. Working on tissue images, DeepHE can efficiently identify and analyze the characteristics of tumor cells in CRC to predict TMB. We used ×40 magnification images, grouped them by patient, and then applied thresholds at the 10th and 20th quantiles, which significantly improved performance. Our model also outperformed multiple other models. In summary, deep learning methods can explore the association between histopathological images and genetic mutations, which will contribute to the precise treatment of CRC patients.
Affiliation(s)
- Yongguang Liu
  - Department of Anorectal Surgery, Weifang People’s Hospital, Weifang, China
- Kaimei Huang
  - Geneis (Beijing) Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Yachao Yang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Wei Gao
  - Department of Internal Medicine-Oncology, Fujian Cancer Hospital and Fujian Medical University Cancer Hospital, Fuzhou, China

48
Xiao C, Dong T, Yang L, Jin L, Lin W, Zhang F, Han Y, Huang Z. Identification of Novel Immune Ferropotosis-Related Genes Associated With Clinical and Prognostic Features in Gastric Cancer. Front Oncol 2022; 12:904304. [PMID: 35664744 PMCID: PMC9157572 DOI: 10.3389/fonc.2022.904304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 04/19/2022] [Indexed: 12/08/2022]
Abstract
Background Gastric cancer (GC) is the fifth most common cancer and the third leading cause of cancer-related death worldwide. Rapid progress in tumor immunology and ferroptosis research has opened new directions for the treatment of gastric cancer, so identifying potential targets and prognostic biomarkers for immunotherapy combined with ferroptosis is urgent. Methods By mining The Cancer Genome Atlas (TCGA), we identified immune-related genes, ferroptosis-related genes, and immune-ferroptosis-related differentially expressed genes (IFR-DEGs). The independent prognostic value of IFR-DEGs was determined by differential expression analysis, prognostic analysis, and univariate and LASSO regression analysis. Then, based on the prognostic risk model, we evaluated the correlations of IFR-DEGs with immune scores and immune checkpoints. We also predicted the drug responses of the high- and low-risk groups. Results A 15-gene prognostic signature was constructed. The high-risk group had a poorer prognosis than the low-risk group. The high-risk group also showed a higher level of Treg infiltration, and its tumor purity, expression of the immune checkpoints PD-1 and CTLA4, and immunity were higher than those of the low-risk group. These results indicate that immune-ferroptosis-related genes might serve as potential biomarkers for predicting the response of stomach adenocarcinoma (STAD) to immune checkpoint inhibitor (ICI) immunotherapy. In addition, the responses of the high- and low-risk groups to small-molecule drugs such as nilotinib, sunitinib, and imatinib were predicted. Conclusion The immune-ferroptosis-related signature (IFRSig) can be regarded as an independent prognostic feature and may estimate overall survival (OS) and clinical treatment response in patients with STAD. IFRSig is also closely correlated with the immune microenvironment. This study provides a new understanding of immune-ferroptosis-related genes in the occurrence and development of STAD.
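The abstract does not spell out how patients are assigned to the high- and low-risk groups. A common convention in LASSO-based prognostic signatures, assumed here rather than taken from the paper, is a linear risk score (the sum of each signature gene's coefficient times its expression) split at the median. The Python sketch below shows that convention; the gene names, coefficients, and helper functions are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def risk_scores(
    expr: Dict[str, Dict[str, float]], coefs: Dict[str, float]
) -> Dict[str, float]:
    """Linear risk score per patient: sum of coefficient * expression over signature genes."""
    return {
        patient: sum(coef * genes[gene] for gene, coef in coefs.items())
        for patient, genes in expr.items()
    }


def split_by_median(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Patients scoring above the median are high-risk; the rest are low-risk (assumed cutoff)."""
    cutoff = median(scores.values())
    high = sorted(p for p, s in scores.items() if s > cutoff)
    low = sorted(p for p, s in scores.items() if s <= cutoff)
    return high, low


if __name__ == "__main__":
    # Hypothetical two-gene signature and four patients.
    coefs = {"GENE_A": 0.8, "GENE_B": -0.5}
    expr = {
        "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
        "P2": {"GENE_A": 0.5, "GENE_B": 2.0},
        "P3": {"GENE_A": 1.0, "GENE_B": 0.0},
        "P4": {"GENE_A": 3.0, "GENE_B": 1.0},
    }
    high, low = split_by_median(risk_scores(expr, coefs))
    print(high, low)  # ['P1', 'P4'] ['P2', 'P3']
```

In a real analysis the coefficients would come from the fitted LASSO (often LASSO-Cox) model, and the cutoff might be the median or an optimized threshold.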
Affiliation(s)
- Chen Xiao
- Department of Gastroenterology, Fuzhou Second Hospital Affiliated to Xiamen University, Fuzhou, China
- Tao Dong
- Department of Digestion, Yidu Central Hospital of Weifang, Weifang, China
- Linhui Yang
- Graduate School of Fujian Medical University, Fuzhou, China
- Liangzi Jin
- Institute of Medical Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Kunming, China
- Weiguo Lin
- Department of Gastroenterology, Fuzhou Second Hospital Affiliated to Xiamen University, Fuzhou, China
- Faqin Zhang
- Department of Gastroenterology, Fuzhou Second Hospital Affiliated to Xiamen University, Fuzhou, China
- Yuanyuan Han
- Institute of Medical Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Kunming, China
- Zhijian Huang
- Department of Breast Surgical Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
49
Lu Q, Chen F, Li Q, Chen L, Tong L, Tian G, Zhou X. A Machine Learning Method to Trace Cancer Primary Lesion Using Microarray-Based Gene Expression Data. Front Oncol 2022; 12:832567. [PMID: 35530331 PMCID: PMC9071249 DOI: 10.3389/fonc.2022.832567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2021] [Accepted: 03/21/2022] [Indexed: 11/17/2022]
Abstract
Cancer of unknown primary site (CUP) is a heterogeneous group of cancers whose tissue of origin remains unknown after detailed investigation by conventional clinical methods. CUP accounts for roughly 3%–5% of all human malignancies. CUP patients are usually treated with broad-spectrum chemotherapy, which often results in a poor prognosis. Recent studies suggest that treatment targeting the primary lesion can significantly improve patient prognosis. An efficient method for accurately detecting the tissue of origin of CUP is therefore urgently needed in clinical cancer research. In this work, we developed a novel framework that uses Extreme Gradient Boosting (XGBoost) to trace the primary site of CUP from microarray-based gene expression data. First, we downloaded the microarray-based gene expression profiles of 59,385 genes for 57,08 samples from The Cancer Genome Atlas (TCGA) and of 6,364 genes for 3,101 samples from the Gene Expression Omnibus (GEO). Each dataset was divided into training and independent test data at a ratio of 4:1. Then, from the training data, we obtained 200 genes from TCGA and 290 genes from GEO to train XGBoost models that identify the primary site of CUP. The overall 5-fold cross-validation accuracies were 96.9% and 95.3% on the TCGA and GEO training datasets, respectively, and the macro-precision on the independent test data reached 96.75% and 98.8% for TCGA and GEO, respectively. These results demonstrate that the XGBoost framework can not only reduce the cost of clinical cancer origin tracing but also achieve high efficiency, which might be useful in clinical practice.
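Macro-precision, the metric reported on the independent test data, is the unweighted mean of per-class precision, so rare primary sites count as much as common ones. The Python sketch below computes it from scratch; the tissue labels are hypothetical, and treating a never-predicted class as precision 0 is an assumption that mirrors scikit-learn's default handling of that ill-defined case.

```python
from typing import List, Sequence


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class precision over all labels seen in y_true or y_pred.

    Precision for class c = (samples correctly predicted as c) / (samples predicted as c).
    A class that is never predicted contributes 0.0 (assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    per_class: List[float] = []
    for c in classes:
        predicted_as_c = [t for t, p in zip(y_true, y_pred) if p == c]
        if predicted_as_c:
            true_positives = sum(1 for t in predicted_as_c if t == c)
            per_class.append(true_positives / len(predicted_as_c))
        else:
            per_class.append(0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical primary-site labels for four test samples.
    y_true = ["lung", "lung", "breast", "colon"]
    y_pred = ["lung", "breast", "breast", "colon"]
    # breast: 1/2, colon: 1/1, lung: 1/1 -> (0.5 + 1.0 + 1.0) / 3
    print(round(macro_precision(y_true, y_pred), 4))  # 0.8333
```

Unlike overall accuracy, which the abstract reports for cross-validation, macro-precision is not dominated by the most frequent tumor types.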
Affiliation(s)
- Qingfeng Lu
- Oncology Department, Daqing Oilfield General Hospital, Daqing, China
- Fengxia Chen
- Department of Thoracic Surgery, Hainan General Hospital, Haikou, China
- Qianyue Li
- Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Chen
- Department of Emergency, Qingdao Eighth People's Hospital, Qingdao, China
- Ling Tong
- Department of Pathology, Chifeng Municipal Hospital, Chifeng Clinical Medical School of Inner Mongolia Medical University, Chifeng, China
- Geng Tian
- Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Xiaohong Zhou
- Second Division of Cancer, Jiamusi Cancer Hospital, Jiamusi, China
50
Liu Q, Luo X, Li J, Wang G. scESI: evolutionary sparse imputation for single-cell transcriptomes from nearest neighbor cells. Brief Bioinform 2022; 23:6580519. [PMID: 35512331 DOI: 10.1093/bib/bbac144] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2022] [Revised: 03/14/2022] [Accepted: 03/31/2022] [Indexed: 02/01/2023]
Abstract
The ubiquitous dropout problem in single-cell RNA sequencing technology introduces substantial noise into gene expression profiles. To address this, we propose an evolutionary sparse imputation (ESI) algorithm for single-cell transcriptomes, which constructs a sparse representation model based on gene regulation relationships between cells. To solve this model, we design an optimization framework based on nondominated sorting genetic algorithms. This framework takes into account both the topological relationships between cells and the diversity of gene expression to iteratively search for the globally optimal solution, thereby learning a Pareto-optimal cell-cell affinity matrix. Finally, we use the learned sparse relationship model between cells to improve data quality and reduce noise. On simulated datasets, scESI performed significantly better than benchmark methods across various metrics. When applied to real scRNA-seq datasets, scESI not only further classified cell types and separated cells more clearly in visualizations but also improved the reconstruction of differentiation trajectories and the identification of differentially expressed genes. In addition, scESI recovered the expression trends of marker genes during stem cell differentiation and can discover new cell types and putative pathways regulating biological processes.
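The final step described above, using a sparse cell-cell affinity matrix to smooth expression and fill dropouts, can be sketched as multiplying the affinity matrix by the expression matrix. In the Python sketch below the affinity matrix is a plain k-nearest-neighbor stand-in (equal weights on each cell and its k closest cells by Euclidean distance); it is an assumption for illustration and does not reproduce the nondominated-sorting genetic optimization that scESI uses to learn the matrix. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_affinity(cells: Matrix, k: int) -> Matrix:
    """Stand-in sparse affinity: each row gives equal weight to the cell itself
    and its k nearest neighbors, so every row sums to 1."""
    n = len(cells)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((euclidean(cells[i], cells[j]), j) for j in range(n) if j != i)
        members = [i] + [j for _, j in others[:k]]
        for j in members:
            affinity[i][j] = 1.0 / len(members)
    return affinity


def impute(cells: Matrix, affinity: Matrix) -> Matrix:
    """Smoothed expression = affinity x cells: each cell becomes a weighted
    average of its neighbors, which fills zeros from similar cells."""
    n_cells, n_genes = len(cells), len(cells[0])
    return [
        [sum(affinity[i][j] * cells[j][g] for j in range(n_cells)) for g in range(n_genes)]
        for i in range(n_cells)
    ]


if __name__ == "__main__":
    # Rows are cells, columns are genes; cell 0 has a dropout in gene 1.
    cells = [[4.0, 0.0], [4.0, 2.0], [0.0, 9.0]]
    print(impute(cells, knn_affinity(cells, k=1)))
    # [[4.0, 1.0], [4.0, 1.0], [2.0, 5.5]]
```

A fixed kNN graph treats all neighbors equally; the point of learning the affinity matrix, as scESI does, is to choose sparser and better-weighted neighbor sets that balance multiple objectives.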
Affiliation(s)
- Qiaoming Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ximei Luo
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China