1
Ren J, Gao Q, Zhou X, Chen L, Guo W, Feng K, Huang T, Cai YD. Identification of key gene expression associated with quality of life after recovery from COVID-19. Med Biol Eng Comput 2024; 62:1031-1048. [PMID: 38123886 DOI: 10.1007/s11517-023-02988-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023]
Abstract
Post-acute sequelae of COVID-19 (PASC) is a persistent complication of severe acute respiratory syndrome coronavirus 2 infection that includes symptoms such as fatigue, cognitive impairment, and respiratory distress. These symptoms severely affect patients' quality of life after recovery from COVID-19. In this study, a group of machine learning algorithms was used to analyze whole-blood RNA-seq data from patients with different PASC levels, in order to identify gene markers associated with PASC and the specific expression patterns of each PASC level. By comparing each patient's quality of life after the acute phase of COVID-19 with that before the disease, samples in the dataset were divided into three groups: "Better," "The Same," and "Worse." Each patient was represented by the expression levels of 58,929 genes. The machine learning-based workflow included six feature-ranking algorithms, incremental feature selection (IFS), and four classification algorithms. The feature-ranking algorithms assessed feature importance, whereas IFS combined with the classification algorithms was used to extract essential genes and to construct efficient classifiers and classification rules. The expression of the top-ranked genes was associated with the immune response to viral infection, which is supported by the published literature. For example, patients with low CCDC18 expression and high CPED1 expression had good quality of life, whereas those with low CDC16 expression had poor quality of life.
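The IFS step described above can be illustrated with a short sketch. It is not the authors' code: it assumes the genes have already been ranked by one of the feature-ranking algorithms, and it swaps in a simple nearest-centroid classifier scored by leave-one-out accuracy (both helper names are hypothetical) in place of the study's four classifiers and their evaluation scheme. IFS itself only needs some way to score each top-k subset of the ranked list.

```python
from math import dist


def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to
    the given feature (column) indices. A hypothetical stand-in classifier."""
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums over every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            vec = [row[f] for f in features]
            if label not in sums:
                sums[label] = [0.0] * len(features)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        # Predict the class whose centroid is closest (Euclidean distance).
        query = [X[i][f] for f in features]
        predicted = min(centroids, key=lambda c: dist(query, centroids[c]))
        correct += predicted == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features,
                                  score=nearest_centroid_loo_accuracy):
    """Score the top-1, top-2, ..., top-n features of a ranked list and return
    (best_k, best_score, scores). Ties go to the smallest k."""
    scores = [score(X, y, ranked_features[:k])
              for k in range(1, len(ranked_features) + 1)]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index], scores


# Toy data: 4 samples x 3 features, already ranked as [0, 1, 2].
X = [[0.1, 5.0, 3.0], [0.2, 1.0, 3.1], [0.9, 4.0, 2.9], [1.0, 2.0, 3.0]]
y = ["Better", "Better", "Worse", "Worse"]
best_k, best_score, scores = incremental_feature_selection(X, y, [0, 1, 2])
print(best_k, best_score)  # 1 1.0
```

Passing the classifier in as `score` keeps the IFS loop independent of the model, which mirrors how the workflow pairs one ranked list with several classification algorithms.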
Affiliation(s)
- JingXin Ren
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
- Qian Gao
- Department of Pharmacy, Shanghai Children's Medical Center, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- XianChao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, 200030, China
- KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, 510507, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
2
Martucci LF, Eichler RA, Silva RN, Costa TJ, Tostes RC, Busatto GF, Seelaender MC, Duarte AJ, Souza HP, Ferro ES. Intracellular peptides in SARS-CoV-2-infected patients. iScience 2023; 26:107542. [PMID: 37636076 PMCID: PMC10448160 DOI: 10.1016/j.isci.2023.107542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 05/29/2023] [Accepted: 08/01/2023] [Indexed: 08/29/2023]
Abstract
Intracellular peptides (InPeps) generated by the orchestrated action of the proteasome and intracellular peptidases have biological and pharmacological significance. Here, the relative plasma concentrations of specific InPeps were compared between 175 patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and 45 non-infected patients; 2,466 unique peptides were identified, of which 67% were InPeps. Semi-quantitative analyses by isotope-labeled electrospray mass spectrometry revealed differences in a specific group of plasma peptides between non-infected individuals and patients infected by SARS-CoV-2. Pathway enrichment of the protein-protein interaction networks, built from the genes encoding the proteins from which the peptides originated, revealed the coronavirus disease/COVID-19 network only in the group of patients fatally infected by SARS-CoV-2. Thus, changes in the relative plasma levels of specific InPeps could be used as a predictive tool for disease outcome.
Affiliation(s)
- Luiz Felipe Martucci
- Department of Pharmacology, Biomedical Sciences Institute, São Paulo 05508-000, Brazil
- Renée N.O. Silva
- Department of Pharmacology, Biomedical Sciences Institute, São Paulo 05508-000, Brazil
- Tiago J. Costa
- Department of Pharmacology, Ribeirao Preto Medical School, Ribeirão Preto 14049-900, Brazil
- Rita C. Tostes
- Department of Pharmacology, Ribeirao Preto Medical School, Ribeirão Preto 14049-900, Brazil
- Geraldo F. Busatto
- Department of Psychiatry, Medical School and Hospital das Clínicas, University of São Paulo, 01246-903 SP, Brazil
- Marilia C.L. Seelaender
- Department of Surgery, Medical School and Hospital das Clínicas, University of São Paulo, 01246-903 SP, Brazil
- Alberto J.S. Duarte
- Department of Pathology, Medical School and Hospital das Clínicas, University of São Paulo, 01246-903 SP, Brazil
- Heraldo P. Souza
- Department of Internal Medicine, Medical School and Hospital das Clínicas, University of São Paulo, 01246-903 SP, Brazil
- Emer S. Ferro
- Department of Pharmacology, Biomedical Sciences Institute, São Paulo 05508-000, Brazil
- Department of Pathology, Medical School and Hospital das Clínicas, University of São Paulo, 01246-903 SP, Brazil
- Department of Internal Medicine, Medical School and Hospital das Clínicas, University of São Paulo, 01246-903 SP, Brazil
3
Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods. BioMed Research International 2023; 2023:5333361. [PMID: 36644165 PMCID: PMC9833906 DOI: 10.1155/2023/5333361] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 09/03/2022] [Revised: 12/15/2022] [Accepted: 12/15/2022] [Indexed: 01/06/2023]
Abstract
Long-term cigarette smoking causes various human diseases, including respiratory disease, cancer, and gastrointestinal (GI) disorders. Smoking-induced alterations in gene expression and alternative splicing are associated with the development of these diseases. This study applied advanced machine learning methods to identify isoforms that play important roles in distinguishing current smokers from former smokers, based on isoform expression profiles of current and former smokers collected in a previous study. These isoforms were treated as features and were first analyzed with Boruta to select features highly correlated with the target variable. The selected features were then evaluated by four feature-ranking algorithms, yielding four feature lists. The incremental feature selection method was applied to each list to obtain the optimal feature subsets and build high-performance classification models. Furthermore, a series of classification rules was extracted from the best-performing decision tree. Finally, the rationality of the mined isoforms (features) and classification rules was verified by reviewing previous research. Features such as isoforms ENST00000464835 (expressed by LRRN3), ENST00000622663 (expressed by SASH1), and ENST00000284311 (expressed by GPR15), as well as the pathways revealed by enrichment analysis (natural killer cell-mediated cytotoxicity and cytokine-cytokine receptor interaction), were highly relevant to smoking response, suggesting the robustness of the analysis pipeline.
4
Jeyananthan P. SARS-CoV-2 Diagnosis Using Transcriptome Data: A Machine Learning Approach. SN Computer Science 2023; 4:218. [PMID: 36844504 PMCID: PMC9936926 DOI: 10.1007/s42979-023-01703-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 01/24/2023] [Indexed: 05/02/2023]
Abstract
The SARS-CoV-2 pandemic is currently a major problem for the whole world. The health community is struggling to protect the public and countries from its spread, which resurges from time to time in successive waves. Even vaccination does not seem to prevent this spread. Accurate and timely identification of infected people is therefore essential to control the spread. So far, polymerase chain reaction (PCR) and rapid antigen tests have been widely used for this identification, despite their drawbacks; false-negative cases are the main threat in this scenario. To avoid these problems, this study uses machine learning techniques to build a classification model with higher accuracy to distinguish COVID-19 cases from non-COVID individuals. Transcriptome data of SARS-CoV-2 patients and controls are used for this stratification with three different feature selection algorithms and seven classification models. Differentially expressed genes (DEGs) between these two groups are also studied and used in this classification. The results show that mutual information (or DEGs) combined with naïve Bayes (or SVM) gives the best accuracy (0.98 ± 0.04) among these methods.
Supplementary Information: The online version contains supplementary material available at 10.1007/s42979-023-01703-6.
5
Das B. An implementation of a hybrid method based on machine learning to identify biomarkers in the Covid-19 diagnosis using DNA sequences. Chemometrics and Intelligent Laboratory Systems: An International Journal Sponsored by the Chemometrics Society 2022; 230:104680. [PMID: 36213553 PMCID: PMC9528020 DOI: 10.1016/j.chemolab.2022.104680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
Some people are more vulnerable to the coronavirus even though they have no chronic disease and are not in a high-risk age group for Covid-19. To explain this, some experts focus on the person's immune system, while others think that the patient's genetic history may play a role. Detecting the coronavirus from DNA signals as early as possible is critical for determining the relationship between Covid-19 and genes, which in turn would reveal how variations in Covid-19-associated genes affect the severe course of the disease. In this study, a novel intelligent computational approach is proposed to identify the coronavirus from nucleotide signals for the first time. The proposed method uses a multilayered feature extraction structure that combines an entropy-based mapping technique, the Discrete Wavelet Transform (DWT), a statistical feature extractor, and Singular Value Decomposition (SVD) to extract the most effective features. Then, 94 distinctive features are selected with the ReliefF technique. Support vector machine (SVM) and k-nearest neighbors (k-NN) are chosen as classifiers. The method achieved its highest classification accuracy, 98.84%, with the SVM classifier for detecting Covid-19 from DNA signals. The proposed method is ready to be tested on a different database for the diagnosis of Covid-19 using RNA or other signals.
Affiliation(s)
- Bihter Das
- Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey
6
Li H, Wang D, Zhou X, Ding S, Guo W, Zhang S, Li Z, Huang T, Cai YD. Characterization of spleen and lymph node cell types via CITE-seq and machine learning methods. Front Mol Neurosci 2022; 15:1033159. [PMID: 36311013 PMCID: PMC9608858 DOI: 10.3389/fnmol.2022.1033159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 09/26/2022] [Indexed: 11/13/2022]
Abstract
The spleen and lymph nodes are important functional organs of the human immune system. Identifying the cell types of the spleen and lymph nodes helps in understanding the mechanisms of the immune system. However, these cell types are highly diverse in the human body. Therefore, in this study, we employed a series of machine learning algorithms to computationally analyze the cell types of the spleen and lymph nodes based on single-cell CITE-seq sequencing data. A total of 28,211 cells (training vs. test = 14,435 vs. 13,776) covering 24 cell types were collected for this study. The training dataset was analyzed by Boruta and then by minimum redundancy maximum relevance (mRMR), resulting in an mRMR feature list. This list was fed into the incremental feature selection (IFS) method, incorporating four classification algorithms (deep forest, random forest, K-nearest neighbor, and decision tree). Some essential features were discovered, and the deep forest with its optimal features achieved the best performance. A group of related proteins (CD4, TCRb, CD103, CD43, and CD23) and genes (Nkg7 and Thy1) contributing to the classification of spleen and lymph node cell types were analyzed. Furthermore, the classification rules yielded by the decision tree were also provided and analyzed. These findings may provide helpful information for deepening our understanding of the diversity of cell types.
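The mRMR ranking step can be illustrated with a minimal greedy sketch. This is an illustration under stated assumptions, not the study's implementation: it uses the mutual-information difference (MID) form of mRMR, assumes the feature values have already been discretized, and the function names are hypothetical. Boruta, which would run first to filter out irrelevant features, is omitted.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
    return sum((nxy / n) * log(nxy * n / (count_a[x] * count_b[y]))
               for (x, y), nxy in joint.items())


def mrmr_rank(columns, labels):
    """Greedy mRMR ranking with the difference (MID) criterion.

    `columns` maps feature name -> list of discrete values. At each step the
    feature maximizing I(f; class) - mean_s I(f; s), over the already-ranked
    features s, is appended to the ranking."""
    relevance = {f: mutual_information(v, labels) for f, v in columns.items()}
    ranking = []
    remaining = sorted(columns)  # sorted so ties resolve deterministically

    def mid_score(f):
        if not ranking:
            return relevance[f]
        redundancy = sum(mutual_information(columns[f], columns[s]) for s in ranking)
        return relevance[f] - redundancy / len(ranking)

    while remaining:
        best = max(remaining, key=mid_score)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy discretized data: B duplicates A, C is unrelated to the class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "A": [0, 0, 0, 1, 1, 1, 1, 1],
    "B": [0, 0, 0, 1, 1, 1, 1, 1],
    "C": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr_rank(columns, labels))  # ['A', 'C', 'B']
```

In the toy data, B is as relevant as A but is pushed to the end once A is selected, because its redundancy with A outweighs its relevance; this is the behavior that distinguishes mRMR from ranking by relevance alone.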
Affiliation(s)
- Hao Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Deling Wang
- State Key Laboratory of Oncology in South China, Department of Radiology, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Xianchao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Ding
- School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Institutes for Biological Sciences (SIBS), Shanghai Jiao Tong University School of Medicine (SJTUSM), Chinese Academy of Sciences (CAS), Shanghai, China
- Shiqi Zhang
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Zhandong Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- *Correspondence: Tao Huang
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Yu-Dong Cai
7
Jian F, Huang F, Zhang YH, Huang T, Cai YD. Identifying anal and cervical tumorigenesis-associated methylation signaling with machine learning methods. Front Oncol 2022; 12:998032. [PMID: 36249027 PMCID: PMC9557006 DOI: 10.3389/fonc.2022.998032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022]
Abstract
Cervical and anal carcinomas are neoplastic diseases that progress through various intraepithelial neoplasia stages. The underlying mechanisms of cancer initiation and progression have not been fully revealed. DNA methylation has been shown to be aberrantly regulated during tumorigenesis in anal and cervical carcinoma, highlighting the important role of DNA methylation signals as biomarkers for distinguishing cancer stages in the clinic. In this research, several machine learning methods were used to analyze the methylation profiles of anal and cervical carcinoma samples, which were divided into three classes representing different stages of tumor progression. Advanced feature selection methods, including Boruta, LASSO, LightGBM, and MCFS, were used to select methylation features that are highly correlated with cancer progression. Some methylation probes, including cg01550828 and its corresponding gene RNF168, have been reported to be associated with human papillomavirus-related anal cancer. As for biomarkers of cervical carcinoma, cg27012396 and its functional gene HDAC4 were confirmed to regulate glycolysis and the survival of hypoxic tumor cells in cervical carcinoma. Furthermore, we developed effective classifiers for identifying various tumor stages and derived classification rules that reflect the quantitative impact of methylation on tumorigenesis. The current study identified methylation signals associated with the development of cervical and anal carcinoma at qualitative and quantitative levels using advanced machine learning methods.
Affiliation(s)
- Fangfang Jian
- Department of Obstetrics & Gynecology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- FeiMing Huang
- School of Life Sciences, Shanghai University, Shanghai, China
- Yu-Hang Zhang
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
8
Network-Based Data Analysis Reveals Ion Channel-Related Gene Features in COVID-19: A Bioinformatic Approach. Biochem Genet 2022; 61:471-505. [PMID: 36104591 PMCID: PMC9473477 DOI: 10.1007/s10528-022-10280-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 09/01/2022] [Indexed: 11/02/2022]
Abstract
Coronavirus disease 2019 (COVID-19) seriously threatens human health and has spread worldwide. Although several treatments for COVID-19 exist, its control is currently suboptimal, so novel treatment strategies are needed. Ion channels are located on the membranes of all excitable cells and many intracellular organelles, are key components of various biological processes, and are therefore of interest as potential drug targets. This study aimed to reveal the relevant molecular features of ion channel genes in COVID-19 through bioinformatic analyses. The RNA-sequencing data of patients with COVID-19 and healthy subjects (GSE152418 and GSE171110 datasets) were obtained from the Gene Expression Omnibus (GEO) database. Ion channel genes were selected from the HUGO Gene Nomenclature Committee (HGNC) database. The data were processed in RStudio using the corresponding R packages to identify ion channel-associated differentially expressed genes (DEGs). Based on the DEGs, Gene Ontology (GO) functional and pathway enrichment analyses were performed using the Enrichr web tool. The STRING database was used to generate a protein-protein interaction (PPI) network, and Cytoscape with the cytoHubba plug-in was used to screen for hub genes in the PPI network. Transcription factor (TF)-DEG, DEG-microRNA (miRNA), and DEG-disease association networks were constructed using the NetworkAnalyst web tool. Finally, the hub genes were treated as drug targets and subjected to enrichment analysis against DSigDB using the Enrichr web tool to identify potential therapeutic agents for COVID-19. A total of 29 ion channel-associated DEGs were identified. GO functional analysis showed that the DEGs were integral components of the plasma membrane and were mainly involved in inorganic cation transmembrane transport and ion channel activity.
Pathway analysis showed that the DEGs were mainly involved in the nicotine addiction, calcium regulation in the cardiac cell, and neuronal system pathways. The top 10 hub genes screened based on the PPI network were KCNA2, KCNJ4, CACNA1A, CACNA1E, NALCN, KCNA5, CACNA2D1, TRPC1, TRPM3, and KCNN3. The TF-DEG and DEG-miRNA networks revealed significant TFs (FOXC1, GATA2, HINFP, USF2, JUN, and NFKB1) and miRNAs (hsa-mir-146a-5p, hsa-mir-27a-3p, hsa-mir-335-5p, hsa-let-7b-5p, and hsa-mir-129-2-3p). Gene-disease association network analysis revealed that the DEGs were closely associated with intellectual disability and cerebellar ataxia. Drug-target enrichment analysis showed that the relevant drugs targeting the hub genes CACNA2D1, CACNA1A, CACNA1E, KCNA2, and KCNA5 were gabapentin, gabapentin enacarbil, pregabalin, guanidine hydrochloride, and 4-aminopyridine. The results of this study provide a valuable basis for exploring the mechanisms of ion channel genes in COVID-19 and clues for developing therapeutic strategies for COVID-19.
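The abstract does not say which cytoHubba scoring method was used to select the top 10 hub genes. Purely as an illustration, the simplest hub score, node degree in an undirected PPI network, can be computed as below; the function name and the edge list (placeholder gene IDs G1 to G4) are hypothetical and are not data from the study.

```python
from collections import defaultdict


def rank_hubs_by_degree(edges, top_n=10):
    """Rank nodes of an undirected interaction network by degree, i.e. the
    number of distinct interaction partners. Duplicate edges and self-loops
    are ignored; ties are broken alphabetically."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-interaction does not add a partner
        partners[a].add(b)
        partners[b].add(a)
    ranked = sorted(partners, key=lambda gene: (-len(partners[gene]), gene))
    return [(gene, len(partners[gene])) for gene in ranked[:top_n]]


# Hypothetical toy network with placeholder gene IDs.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G4", "G4"), ("G2", "G1")]
print(rank_hubs_by_degree(edges, top_n=3))  # [('G1', 3), ('G2', 2), ('G3', 2)]
```

cytoHubba also offers other topological scores (for example, Maximal Clique Centrality); degree is used here only because it is the easiest to check by hand.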
9
Li H, Huang F, Liao H, Li Z, Feng K, Huang T, Cai YD. Identification of COVID-19-Specific Immune Markers Using a Machine Learning Method. Front Mol Biosci 2022; 9:952626. [PMID: 35928229 PMCID: PMC9344575 DOI: 10.3389/fmolb.2022.952626] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/25/2022] [Accepted: 06/21/2022] [Indexed: 01/08/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is closely linked to the immune system. Human resistance to COVID-19 infection comprises two stages: the first is immune defense, and the second is extensive inflammation. The immune defense phase is further divided into innate and adaptive immunity. These two stages involve various immune cells, including CD4+ T cells, CD8+ T cells, monocytes, dendritic cells, B cells, and natural killer cells. Together, these cells make up a complex and unique immune response to COVID-19 that sets it apart from other respiratory infectious diseases. In the present study, we identified cell markers for differentiating COVID-19 from common inflammatory responses, non-COVID-19 severe respiratory diseases, and healthy populations, based on single-cell gene expression profiles of six immune cell types, using the Boruta and mRMR feature selection methods. Some features, such as IFI44L in B cells, S100A8 in monocytes, and NCR2 in natural killer cells, are involved in the innate immune response to COVID-19. Other features, such as ZFP36L2 in CD4+ T cells, can regulate the inflammatory process of COVID-19. Subsequently, the IFS method was used with two classification algorithms to determine the best feature subsets and classifiers for the six immune cell types. Furthermore, we established quantitative rules for distinguishing disease status. The results of this study can provide theoretical support for a more in-depth investigation of COVID-19 pathogenesis and intervention strategies.
Affiliation(s)
- Hao Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Feiming Huang
- School of Life Sciences, Shanghai University, Shanghai, China
- Huiping Liao
- Ophthalmology and Optometry Medical School, Shandong University of Traditional Chinese Medicine, Jinan, China
- Zhandong Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- *Correspondence: Tao Huang; Yu-Dong Cai
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
10
Sun Y, Zhang Q, Yang Q, Yao M, Xu F, Chen W. Screening of Gene Expression Markers for Corona Virus Disease 2019 Through Boruta_MCFS Feature Selection. Front Public Health 2022; 10:901602. [PMID: 35812497 PMCID: PMC9258782 DOI: 10.3389/fpubh.2022.901602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 05/17/2022] [Indexed: 11/25/2022]
Abstract
Since SARS-CoV-2 was first reported in Wuhan, China, in December 2019, Corona Virus Disease 2019 (COVID-19) has spread into a global pandemic. Accurate diagnosis of COVID-19 is central to preventing the disease. However, owing to the limitations of detection technology, test results cannot be entirely free of false positives or false negatives. Improving the precision of test results requires the identification of more biomarkers for COVID-19. Based on the expression data of COVID-19-positive and -negative samples, we first screened feature genes using the ReliefF, minimum redundancy maximum relevance (mRMR), and Boruta_MCFS methods. Thereafter, 36 optimal feature genes were selected through the incremental feature selection method based on the random forest classifier, and their enriched biological functions and signaling pathways were revealed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses. A protein-protein interaction network analysis was also performed on these feature genes, and the enriched biological functions and signaling pathways of its main submodules were analyzed. In addition, dimensionality reduction analysis was used to verify whether these 36 feature genes could effectively distinguish positive samples from negative ones. Based on these results, we infer that the 36 feature genes selected via Boruta_MCFS could be regarded as biomarkers of COVID-19.
Affiliation(s)
- Yanbao Sun
- Department of Radiology, Affiliated Hospital of Jiaxing University, Jiaxing, China
- Qi Zhang
- Department of Respiration in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- Qi Yang
- Department of Respiration in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- Ming Yao
- Center for Pain Medicine in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- Fang Xu
- The Xiuzhou Kang'an Hospital of Jiaxing, Jiaxing, China
- Wenyu Chen
- Department of Respiration in Affiliated Hospital of Jiaxing University/The First Hospital of Jiaxing, Jiaxing, China
- *Correspondence: Wenyu Chen
11
Li Z, Huang F, Chen L, Huang T, Cai YD. Identifying In Vitro Cultured Human Hepatocytes Markers with Machine Learning Methods Based on Single-Cell RNA-Seq Data. Front Bioeng Biotechnol 2022; 10:916309. [PMID: 35706505 PMCID: PMC9189284 DOI: 10.3389/fbioe.2022.916309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 05/11/2022] [Indexed: 01/12/2023]
Abstract
Cell transplantation is an effective method for compensating for the loss of liver function and improving patient survival. However, because hepatocytes cultivated in vitro undergo diverse developmental processes and have diverse physiological features, obtaining hepatocytes that function properly in vivo is difficult. In the present study, we perform an advanced computational analysis of single-cell transcriptional profiles to resolve the heterogeneity of the hepatocyte differentiation process in vitro and to mine biomarkers at different periods of differentiation. We obtained a compact set of effective classification features with the Boruta method and ranked them using the Max-Relevance and Min-Redundancy method. Some key genes were identified during the in vitro culture of hepatocytes, including CD147, which not only regulates terminally differentiated cells in the liver but also affects cell differentiation. PPIA, which encodes a CD147 ligand, also appeared in the identified gene list, and the combination of the two proteins mediated multiple biological pathways. Other genes, such as TMSB10, TMEM176B, and CD63, which are involved in the maturation and differentiation of hepatocytes and help different hepatic cell types perform their roles, were also identified. Then, several classifiers were trained and evaluated using three classification algorithms (random forest, k-nearest neighbor, and decision tree) and the incremental feature selection method to obtain optimal classifiers and optimal feature subsets. The best random forest classifier, with a Matthews correlation coefficient of 0.940, was constructed to distinguish different hepatic cell types. Finally, classification rules were created to quantitatively describe hepatic cell types.
In summary, this study provided potential targets for cell transplantation-based treatment strategies for liver disease by elucidating the process and mechanism of hepatocyte development at both qualitative and quantitative levels.
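The 0.940 value above is a Matthews correlation coefficient (MCC) computed over several hepatic cell types, so it is a multiclass MCC. A minimal sketch of one common K-class generalization (Gorodkin's R_K, the form scikit-learn's `matthews_corrcoef` uses for multiclass labels) follows; the abstract does not state which formulation the study used, so treat this as an assumption, and the function name is hypothetical. With two classes it reduces to the familiar binary MCC.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """K-class Matthews correlation coefficient computed from label counts.
    Returns 0.0 when the denominator is zero (e.g. every prediction is the
    same class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified
    t_k = Counter(y_true)                             # true occurrences per class
    p_k = Counter(y_pred)                             # predictions per class
    classes = set(t_k) | set(p_k)
    numerator = c * s - sum(p_k[k] * t_k[k] for k in classes)
    var_pred = s * s - sum(p_k[k] ** 2 for k in classes)
    var_true = s * s - sum(t_k[k] ** 2 for k in classes)
    denominator = sqrt(var_pred * var_true)
    return numerator / denominator if denominator else 0.0


# Binary check: TP=1, FN=1, TN=2, FP=0 gives (1*2 - 0*1) / sqrt(1*2*2*3), about 0.577.
print(round(multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0]), 3))  # 0.577
```

Unlike plain accuracy, this score stays near 0 for a classifier that always predicts the majority cell type, which is why MCC is a common choice for imbalanced multiclass evaluations like this one.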
Affiliation(s)
- ZhanDong Li
- College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- FeiMing Huang
- School of Life Sciences, Shanghai University, Shanghai, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
12
Chen L, Mei Z, Guo W, Ding S, Huang T, Cai YD. Recognition of Immune Cell Markers of COVID-19 Severity with Machine Learning Methods. BioMed Research International 2022; 2022:6089242. [PMID: 35528178 PMCID: PMC9073549 DOI: 10.1155/2022/6089242] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2022] [Accepted: 04/11/2022] [Indexed: 01/08/2023]
Abstract
COVID-19 is hypothesized to be linked to the host's excessive inflammatory immune response to SARS-CoV-2 infection, which is regarded as a major factor in disease severity and mortality. Numerous immune cells play key roles in regulating the immune response, and gene expression analysis in these cells could be a useful method for studying disease states, assessing immune responses, and detecting biomarkers. Here, we developed a machine learning procedure to find biomarkers that discriminate disease severity in individual immune cell types (B cells, CD4+ cells, CD8+ cells, monocytes, and NK cells) using single-cell gene expression profiles of COVID-19. The gene features of each profile were first filtered and ranked using the Boruta feature selection method and mRMR, and the resulting ranked feature lists were then fed into the incremental feature selection method to determine the optimal number of features with the decision tree and random forest algorithms. Meanwhile, we extracted the classification rules for each cell type from the optimal decision tree classifiers. The best gene sets discovered in this study were analyzed by GO and KEGG pathway enrichment, and some important biomarkers, such as TLR2, ITK, CX3CR1, IL1B, and PRDM1, were validated by recent literature. The findings reveal that the optimal gene sets for each cell type can accurately classify COVID-19 disease severity and provide insight into the molecular mechanisms involved in disease progression.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Zi Mei
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
- Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200031, China
- ShiJian Ding
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China