1
Shi H, Cao X. Potential Targets Related to Skin Aging: Based on eQTL and GWAS Datasets. Clin Cosmet Investig Dermatol 2025; 18:677-686. [PMID: 40144806 PMCID: PMC11937646 DOI: 10.2147/ccid.s508946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Accepted: 03/05/2025] [Indexed: 03/28/2025]
Abstract
Background Skin aging (SG) has an important impact on various body systems, and certain SG markers can not only help with early diagnosis but also provide new ideas for pathophysiological research and treatment strategies. Objective To identify target genes related to SG using bioinformatics and to provide ideas for skin anti-aging. Methods Differentially expressed genes (DEGs) related to SG were screened using transcriptome data from GEO datasets (GSE85358 and GSE670988). Based on eQTL and GWAS datasets, Mendelian randomization (MR) analysis was applied to identify associations between gene expression and SG. Aging skin-related important genes (AS-IGs) were then obtained from these two steps, and functional and pathway analyses were performed to explore the potential mechanisms of AS-IGs in SG. Finally, CIBERSORT was used to assess the infiltration of immune cells related to SG. Results Seven AS-IGs were selected by intersecting 612 DEGs with 399 eQTL genes. Enrichment analysis showed that 60 GO terms may be involved in SG, such as fatty-acyl-CoA metabolic process, while the enriched KEGG pathways were mainly involved in mechanisms related to fatty acid metabolism, energy generation, and inflammation regulation. The CIBERSORT evaluation showed that resting NK cells were the main infiltrating cells. Conclusion AS-IGs may play important roles in SG. These molecules involve multiple systems and mechanisms in the body, such as immune, metabolic, and neuroendocrine function.
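The selection step described in this abstract (intersecting the DEG list with the eQTL genes) can be sketched as a plain set intersection. This is only an illustration: the gene symbols are hypothetical placeholders rather than the genes reported in the study, and `select_candidates` is an assumed helper name.

```python
def select_candidates(degs, eqtl_genes):
    """Return the genes present in both lists, sorted for stable output."""
    return sorted(set(degs) & set(eqtl_genes))


# Hypothetical placeholder symbols, not the genes reported in the study.
degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
eqtl_genes = ["GENE_C", "GENE_E", "GENE_A"]

candidates = select_candidates(degs, eqtl_genes)
print(candidates)
```

Sorting the result keeps the output reproducible, since Python sets have no guaranteed order.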
Affiliation(s)
- Hanping Shi
- School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, 330006, People’s Republic of China
- Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang, Jiangxi, 330006, People’s Republic of China
- Xianwei Cao
- School of Public Health, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, 330006, People’s Republic of China
- Jiangxi Provincial Key Laboratory of Disease Prevention and Public Health, Nanchang University, Nanchang, Jiangxi, 330006, People’s Republic of China
- Department of Dermatology, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, People’s Republic of China
2
Zhang W, Shi H, Peng J. A diagnostic model for sepsis using an integrated machine learning framework approach and its therapeutic drug discovery. BMC Infect Dis 2025; 25:219. [PMID: 39953444 PMCID: PMC11827343 DOI: 10.1186/s12879-025-10616-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Accepted: 02/07/2025] [Indexed: 02/17/2025]
Abstract
BACKGROUND Sepsis remains a life-threatening condition in intensive care units (ICU) with high morbidity and mortality rates. Some biomarkers commonly used in the clinic neither rise rapidly and specifically nor decline rapidly after effective treatment. Machine learning has shown great potential in the early diagnosis, subtype analysis, accurate treatment and prognosis evaluation of sepsis. METHODS Gene expression matrices from GSE13904 and GSE26440 were combined for model training after quality control and standardization. The intersection genes were then obtained by crossing the screened differentially expressed genes (DEGs) with the most strongly correlated module genes from WGCNA. A diagnostic model was built with 113 combined machine learning algorithms. The CIBERSORT algorithm was then used to analyze the relationship between changes in core gene expression and the immune response in sepsis. A nomogram, DCA and CIC were constructed to further verify the reliability of the diagnostic model. Potential molecular compounds interacting with the key genes were searched for in the Traditional Chinese Medicine Active Compound Library (TCMACL). RESULTS We screened 405 DEGs, including 334 up-regulated and 71 down-regulated genes. A total of 308 candidate genes were obtained by intersecting the MEturquoise module genes from WGCNA with the DEGs for subsequent machine learning analysis. GO and KEGG enrichment analysis showed that sepsis was mainly related to immune response and bacterial infection. Then 113 combined machine learning algorithms were applied to construct a diagnostic model, which screened 22 hub genes. Four key genes (CD177, GNLY, ANKRD22, and IFIT1) were obtained through further analysis of the PPI network constructed from the 22 hub genes. Subsequently, the diagnostic model was shown to have good predictive value by the nomogram, DCA and CIC. Finally, molecular compounds (Dieckol, Grosvenorine and Tellimagrandin II) were screened out as potential drugs. CONCLUSION The 113 combined machine learning algorithms screened out four key genes that can distinguish sepsis patients. In addition, potential therapeutic compounds interacting with the key genes were screened out by molecular docking.
Affiliation(s)
- Wuping Zhang
- Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, No.152 Aiguo Road, Nanchang, Jiangxi Province, 330006, China
- Hanping Shi
- Jiangxi Provincial Key Laboratory of Preventive Medicine, School of Public Health, Nanchang University, Nanchang, Jiangxi, 330031, China
- Jie Peng
- Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, No.152 Aiguo Road, Nanchang, Jiangxi Province, 330006, China.
3
Wu J, Wang L, Cui Y, Liu C, Ding W, Ren S, Dong R, Zhang J. Development of a Quality Evaluation Method for Allii Macrostemonis Bulbus Based on Solid-Phase Extraction-High-Performance Liquid Chromatography-Evaporative Light Scattering Detection Chromatographic Fingerprinting, Chemometrics, and Quantitative Analysis of Multi-Components via a Single-Marker Method. Molecules 2024; 29:4600. [PMID: 39407530 PMCID: PMC11478197 DOI: 10.3390/molecules29194600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 08/31/2024] [Accepted: 09/25/2024] [Indexed: 10/20/2024]
Abstract
As a traditional Chinese medicine (TCM), Allii Macrostemonis Bulbus (AMB) is a key herb for the treatment of thoracic paralytic cardiac pain, but its quality evaluation method has not yet been fully clarified. In this study, chromatographic fingerprints of AMB were developed using solid-phase extraction-high-performance liquid chromatography-evaporative light scattering detection (SPE-HPLC-ELSD) to evaluate the quality of AMB from various origins and processing methods. This was achieved by employing chemical pattern recognition techniques and verifying the feasibility and applicability of the quality evaluation of AMB through the quantitative analysis of multi-components via a single-marker (QAMS) method. Through the analysis of the fingerprints of 18 batches of AMB, 30 common peaks were screened, and 6 components (adenosine, syringin, macrostemonoside T, macrostemonoside A, macrostemonoside U, and macrostemonoside V) were identified. Moreover, three differential markers (macrostemonoside A, macrostemonoside T, and macrostemonoside U) were screened out using chemometrics techniques, including principal component analysis (PCA), hierarchical cluster analysis (HCA), and orthogonal partial least squares discriminant analysis (OPLS-DA). Subsequently, a QAMS method was established for macrostemonoside T and macrostemonoside U using macrostemonoside A as an internal reference. The results demonstrate the method's accuracy, reproducibility, and stability, rendering it suitable for the quality evaluation of AMB. This study provides a theoretical basis for drug quality control and the discovery of quality markers for AMB.
Affiliation(s)
- Jianfa Wu
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Lulu Wang
- School of Medicine, Changchun Sci-Tech University, Changchun 130600, China;
- Ying Cui
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Chang Liu
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Weixing Ding
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Shen Ren
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Jilin Provincial International Joint Research Center for the Development and Utilization of Authentic Medicinal Materials, Changchun 130600, China
- Rui Dong
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Jilin Provincial International Joint Research Center for the Development and Utilization of Authentic Medicinal Materials, Changchun 130600, China
- Jing Zhang
- College of Chinese Medicinal Materials, Jilin Agricultural University, Changchun 130118, China; (J.W.); (Y.C.); (C.L.); (W.D.); (S.R.)
- Jilin Provincial International Joint Research Center for the Development and Utilization of Authentic Medicinal Materials, Changchun 130600, China
4
Wang J, Wang TG, Yuan S, Li F. Accurate identification of single-cell types via correntropy-based Sparse PCA combining hypergraph and fusion similarity. J Appl Stat 2024; 52:356-380. [PMID: 39926175 PMCID: PMC11800351 DOI: 10.1080/02664763.2024.2369955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 06/11/2024] [Indexed: 02/11/2025]
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) technology enables researchers to gain deep insights into cellular heterogeneity. However, the high dimensionality and noise of scRNA-seq data pose significant challenges to clustering. Therefore, we propose a new single-cell type identification method, called CHLSPCA, to address these challenges. In this model, we innovatively combine correntropy with PCA to address the noise and outliers inherent in scRNA-seq data. Meanwhile, we integrate the hypergraph into the model to extract more valuable information from the local structure of the original data. Subsequently, to capture crucial similarity information not considered by the PCA model, we employ the Gaussian kernel function and the Euclidean metric to mine the similarity information between cells, and incorporate this information into the model as the similarity constraint. Furthermore, the principal components (PCs) of PCA are very dense. A new sparse constraint is introduced into the model to gain sparse PCs. Finally, based on the principal direction matrix learned from CHLSPCA, we conduct extensive downstream analyses on real scRNA-seq datasets. The experimental results show that CHLSPCA performs better than many popular clustering methods and is expected to promote the understanding of cellular heterogeneity in scRNA-seq data analysis and support biomedical research.
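The cell-to-cell similarity mentioned in this abstract (a Gaussian kernel over Euclidean distances) can be illustrated with a short sketch. The abstract does not give the exact kernel form or bandwidth, so the formula exp(-||x - y||^2 / (2 * sigma^2)), the `sigma` parameter, and the toy expression vectors below are all assumptions made for illustration, not the CHLSPCA implementation.

```python
import math


def gaussian_similarity(cells, sigma=1.0):
    """Pairwise Gaussian-kernel similarity from squared Euclidean distances."""
    n = len(cells)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between expression vectors i and j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(cells[i], cells[j]))
            sim[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return sim


# Toy expression vectors (hypothetical): two nearby cells and one distant cell.
cells = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
S = gaussian_similarity(cells, sigma=1.0)
```

Nearby cells receive a similarity close to 1 and distant cells a similarity close to 0, which is the property a similarity constraint of this kind relies on.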
Affiliation(s)
- Juan Wang
- School of Computer Science, Qufu Normal University, Rizhao, People’s Republic of China
- Tai-Ge Wang
- School of Computer Science, Qufu Normal University, Rizhao, People’s Republic of China
- Shasha Yuan
- School of Computer Science, Qufu Normal University, Rizhao, People’s Republic of China
- Feng Li
- School of Computer Science, Qufu Normal University, Rizhao, People’s Republic of China
5
Liu X, Chen Z, Wang X, Luo W, Yang F. Quality Assessment and Classification of Codonopsis Radix Based on Fingerprints and Chemometrics. Molecules 2023; 28:5127. [PMID: 37446787 DOI: 10.3390/molecules28135127] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/05/2023] [Revised: 06/27/2023] [Accepted: 06/27/2023] [Indexed: 07/15/2023]
Abstract
In China, Codonopsis Radix (CR) is frequently consumed both as food and medicine. Here, a comprehensive strategy based on fingerprinting and chemometric approaches was developed to explore the influence of origin, storage time and kneading processing on the quality of CR. Firstly, high-performance liquid chromatography with diode array detection was used to obtain the fingerprints of 35 batches of CR from six different origins and 33 batches of CR with varying storage times or kneading procedures. Secondly, chemometric methods including similarity analysis (SA), principal component analysis (PCA), hierarchical clustering analysis (HCA), and two-way orthogonal partial least squares discriminant analysis (O2PLS-DA) were used to evaluate the differences in the chemical components of CR so as to identify its source and reflect its quality. Moreover, 13 and 16 major compounds were identified as marker compounds for the discrimination of CR from different origins, storage time and kneading processing, respectively. Furthermore, the relative content of the marker components and the exact content of lobetyolin were measured, indicating that the contents of these components vary significantly among CR samples. Meanwhile, the chemical components of CR were identified using mass spectrometry. According to our findings, the quality of CR from Gansu was the best, followed by Shanxi and then Sichuan. The quality of CR from Chongqing and Guizhou was poor. At the same time, the quality of CR was best when it was kneaded and stored for 0 years, indicating that the traditional kneading process of CR is of great significance. In conclusion, HPLC fingerprinting in conjunction with chemical pattern recognition and component content determination can be employed to differentiate the raw materials of different CR samples. It is also a reliable, comprehensive and prospective method for quality control and evaluation of CR.
Affiliation(s)
- Xuxia Liu
- School of Pharmacy, Gansu University of Traditional Chinese Medicine, Lanzhou 730013, China
- Zhengjun Chen
- School of Pharmacy, Gansu University of Traditional Chinese Medicine, Lanzhou 730013, China
- Xin Wang
- School of Pharmacy, Gansu University of Traditional Chinese Medicine, Lanzhou 730013, China
- Wenrong Luo
- Gansu Provincial Hospital of Chinese Medicine, Lanzhou 730050, China
- Fude Yang
- School of Pharmacy, Gansu University of Traditional Chinese Medicine, Lanzhou 730013, China
6
Li J, Li L, You P, Wei Y, Xu B. Towards artificial intelligence to multi-omics characterization of tumor heterogeneity in esophageal cancer. Semin Cancer Biol 2023; 91:35-49. [PMID: 36868394 DOI: 10.1016/j.semcancer.2023.02.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 01/07/2023] [Revised: 02/21/2023] [Accepted: 02/28/2023] [Indexed: 03/05/2023]
Abstract
Esophageal cancer is a unique and complex malignancy with substantial tumor heterogeneity: at the cellular level, tumors are composed of tumor and stromal cellular components; at the genetic level, they comprise genetically distinct tumor clones; at the phenotypic level, cells in distinct microenvironmental niches acquire diverse phenotypic features. This heterogeneity affects almost every process of esophageal cancer progression, from onset to metastasis and recurrence. Intertumoral and intratumoral heterogeneity are major obstacles in the treatment of esophageal cancer, but they also offer the potential to manipulate the heterogeneity itself as a new therapeutic strategy. The high-dimensional, multi-faceted characterization of the genomics, epigenomics, transcriptomics, proteomics, metabonomics, etc. of esophageal cancer has opened novel horizons for dissecting tumor heterogeneity. Artificial intelligence, especially machine learning and deep learning algorithms, is able to make decisive interpretations of data from multi-omics layers. To date, artificial intelligence has emerged as a promising computational tool for analyzing and dissecting esophageal patient-specific multi-omics data. This review provides a comprehensive overview of tumor heterogeneity from a multi-omics perspective. In particular, we discuss the novel techniques of single-cell sequencing and spatial transcriptomics, which have revolutionized our understanding of the cell compositions of esophageal cancer and allowed us to determine novel cell types. We focus on the latest advances in artificial intelligence in integrating multi-omics data of esophageal cancer. Artificial intelligence-based computational tools for multi-omics data integration play a key role in tumor heterogeneity assessment, which will potentially boost the development of precision oncology in esophageal cancer.
Affiliation(s)
- Junyu Li
- Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China; Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Lin Li
- Department of Thoracic Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Peimeng You
- Nanchang University, Department of Radiation Oncology, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China
- Yiping Wei
- Department of Thoracic Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China.
- Bin Xu
- Jiangxi Health Committee Key (JHCK) Laboratory of Tumor Metastasis, Jiangxi Cancer Hospital, Nanchang 330029, Jiangxi, China.
7
He G, Wang H, Liu S, Zhang B. CSMVC: A Multiview Method for Multivariate Time-Series Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:13425-13437. [PMID: 34469322 DOI: 10.1109/tcyb.2021.3083592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Multivariate time-series (MTS) clustering is a fundamental technique in data mining with a wide range of real-world applications. To date, though some approaches have been developed, they suffer from various drawbacks, such as high computational cost or loss of information. Most existing approaches are single-view methods without considering the benefits of mutual-support multiple views. Moreover, due to its data structure, MTS data cannot be handled well by most multiview clustering methods. Toward this end, we propose a consistent and specific non-negative matrix factorization-based multiview clustering (CSMVC) method for MTS clustering. The proposed method constructs a multilayer graph to represent the original MTS data and generates multiple views with a subspace technique. The obtained multiview data are processed through a novel non-negative matrix factorization (NMF) method, which can explore the view-consistent and view-specific information simultaneously. Furthermore, an alternating optimization scheme is proposed to solve the corresponding optimization problem. We conduct extensive experiments on 13 benchmark datasets and the results demonstrate the superiority of our proposed method against other state-of-the-art algorithms under a wide range of evaluation metrics.
8
Wang Y, Guan T, Zhou G, Zhao H, Gao J. SOJNMF: Identifying Multidimensional Molecular Regulatory Modules by Sparse Orthogonality-Regularized Joint Non-Negative Matrix Factorization Algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3695-3703. [PMID: 34546925 DOI: 10.1109/tcbb.2021.3114146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Cancer is not only a very aggressive but also a very diverse disease. Recent advances in high-throughput omics technologies have given biomedical researchers more opportunities to study the multi-level biological regulatory mechanisms of cancer. However, there are few methods for exploring the underlying mechanism of cancer by identifying multidimensional molecular regulatory modules from multidimensional omics data. In this paper, we propose a sparse orthogonality-regularized joint non-negative matrix factorization (SOJNMF) algorithm that can integratively analyze multidimensional omics data. This method can not only identify multidimensional molecular regulatory modules but also reduce the overlap rate of features among the multidimensional modules while ensuring the sparsity of the coefficient matrix after decomposition. Gene expression data, miRNA expression data and gene methylation data of liver cancer are integratively analyzed with the SOJNMF algorithm, yielding 238 multidimensional molecular regulatory modules. The results of a permutation test indicate that the different omics features within these modules are statistically significantly correlated. Meanwhile, the results of functional enrichment analysis show that these multidimensional modules are significantly related to the underlying mechanism of the occurrence and development of liver cancer.
9
Shetta O, Niranjan M, Dasmahapatra S. Convex Multi-View Clustering Via Robust Low Rank Approximation With Application to Multi-Omic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3340-3352. [PMID: 34705655 DOI: 10.1109/tcbb.2021.3122961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Recent advances in high throughput technologies have made large amounts of biomedical omics data accessible to the scientific community. Single omic data clustering has proved its impact in the biomedical and biological research fields. Multi-omic data clustering and multi-omic data integration techniques have shown improved clustering performance and biological insight. Cancer subtype clustering is an important task in the medical field to be able to identify a suitable treatment procedure and prognosis for cancer patients. State of the art multi-view clustering methods are based on non-convex objectives which only guarantee non-global solutions that are high in computational complexity. Only a few convex multi-view methods are present. However, their models do not take into account the intrinsic manifold structure of the data. In this paper, we introduce a convex graph regularized multi-view clustering method that is robust to outliers. We compare our algorithm to state of the art convex and non-convex multi-view and single view clustering methods, and show its superiority in clustering cancer subtypes on publicly available cancer genomic datasets from the TCGA repository. We also show our method's better ability to potentially discover cancer subtypes compared to other state of the art multi-view methods.
10
Arya N, Saha S. Generative Incomplete Multi-View Prognosis Predictor for Breast Cancer: GIMPP. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2252-2263. [PMID: 34143737 DOI: 10.1109/tcbb.2021.3090458] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
In today's digital world, we are equipped with modern computer-based data collection sources and feature extraction methods, which increase the availability of multi-view data and the corresponding research. Multi-view prediction models form a mainstream research direction in the healthcare and bioinformatics domain. While these models are designed with the assumption that no data are missing for any view, in the real world certain views often do not have the same number of samples, resulting in an incomplete multi-view dataset. Studies performed on such datasets are termed incomplete multi-view clustering or prediction. Here, we develop a two-stage generative incomplete multi-view prediction model named GIMPP to address the missing view problem of breast cancer prognosis prediction by explicitly generating the missing data. The first stage incorporates the multi-view encoder networks and the bi-modal attention scheme to learn common latent space representations by leveraging complementary knowledge between different views. The second stage generates missing view data using view-specific generative adversarial networks conditioned on the shared representations and encoded features given by other views. Experimental results on the TCGA-BRCA and METABRIC datasets demonstrate the usefulness of the developed method over state-of-the-art methods.
11
Zhang LX, Yan H, Liu Y, Xu J, Song J, Yu DJ. Enhancing Characteristic Gene Selection and Tumor Classification by the Robust Laplacian Supervised Discriminative Sparse PCA. J Chem Inf Model 2022; 62:1794-1807. [PMID: 35353532 DOI: 10.1021/acs.jcim.1c01403] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/08/2023]
Abstract
Characteristic gene selection and tumor classification of gene expression data play major roles in genomic research. Because gene expression data have a small sample size and high dimensionality, it is common practice to perform dimensionality reduction before applying machine learning-based methods to the expression data. In this context, classical principal component analysis (PCA) and its improved versions have been widely used. Recently, methods based on supervised discriminative sparse PCA have been developed to improve the performance of data dimensionality reduction. However, such methods still have limitations: most of them do not combine robustness to outliers and noise, label information, sparsity, and the capture of intrinsic geometrical structures in one objective function. To address this drawback, in this study, we propose a novel PCA-based method, known as the robust Laplacian supervised discriminative sparse PCA, termed RLSDSPCA, which enforces the L2,1 norm on the error function and incorporates the graph Laplacian into supervised discriminative sparse PCA. To evaluate the efficacy of the proposed RLSDSPCA, we applied it to characteristic gene selection and tumor classification using gene expression data. The results demonstrate that the proposed RLSDSPCA method, when used in combination with other related methods, can effectively identify new pathogenic genes associated with diseases. In addition, RLSDSPCA achieved the best performance compared with state-of-the-art methods on tumor classification in terms of major performance metrics. The codes and data sets used in the study are freely available at http://csbio.njust.edu.cn/bioinf/rlsdspca/.
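For readers unfamiliar with the L2,1 norm mentioned in this abstract, the sketch below computes it as the sum of the Euclidean norms of the rows of a matrix and contrasts it with the squared Frobenius norm. The row-wise convention, the function names, and the toy residual matrix are assumptions for illustration; the abstract does not state which convention RLSDSPCA uses.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (L2) norms of the rows of `matrix`."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_norm_squared(matrix):
    """Sum of squared entries, the usual least-squares error term."""
    return sum(v * v for row in matrix for v in row)


# Toy residual matrix (hypothetical): the last row plays the role of an outlier.
residual = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]

l21 = l21_norm(residual)
fro2 = frobenius_norm_squared(residual)
print(l21, fro2)
```

Because each row enters the L2,1 norm through its unsquared length, a single large row contributes linearly rather than quadratically, which is why this norm is commonly used to reduce sensitivity to outliers.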
Affiliation(s)
- Lu-Xing Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- He Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Jian Xu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
12
Qiao C, Hu XY, Xiao L, Calhoun VD, Wang YP. A deep autoencoder with sparse and graph Laplacian regularization for characterizing dynamic functional connectivity during brain development. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
13
Kosvyra A, Ntzioni E, Chouvarda I. Network analysis with biological data of cancer patients: A scoping review. J Biomed Inform 2021; 120:103873. [PMID: 34298154 DOI: 10.1016/j.jbi.2021.103873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/08/2020] [Revised: 06/30/2021] [Accepted: 07/18/2021] [Indexed: 12/25/2022]
Abstract
BACKGROUND & OBJECTIVE Network Analysis (NA) is a mathematical method that allows exploring relations between units and representing them as a graph. Although NA was initially associated with the social sciences, over the past two decades it has been introduced into bioinformatics. The recent growth in the use of networks in biological data analysis reveals the need to investigate this area further. In this work, we attempt to identify how NA is used with biological data, and specifically: (a) what types of data are used and whether they are integrated or not, (b) what the purpose of the analysis is, predictive or descriptive, and (c) what the outcome of such analyses is, specifically in cancer. METHODS & MATERIALS The literature review was conducted on two databases, PubMed & IEEE, and was restricted to journal articles of the last decade (January 2010 - December 2019). At a first level, all articles were screened by title and abstract, and at a second level the screening was conducted by reading the full-text article, following the predefined inclusion & exclusion criteria, leading to 131 articles of interest. A table was created with the information of interest and was used for the classification of the articles. The articles were initially classified into analysis studies and studies that propose a new algorithm or methodology. Each of these categories was further screened by the following clustering criteria: (a) data used, (b) study purpose, (c) study outcome. Specifically for the studies proposing a new algorithm, the novelty presented in each one was identified. RESULTS & CONCLUSIONS In the past five years, researchers have focused on creating new algorithms and methodologies to enhance this field. The classification of the articles revealed that only 25% of the analyses integrate multi-omics data, although 50% of the new algorithms developed follow this integrative direction. Moreover, only 20% of the analyses and 10% of the newly developed methodologies have a predictive purpose. Regarding the results of the works reviewed, 75% of the studies focus on identifying gene signatures, prognostic or not. In conclusion, this review revealed the need for deploying predictive and multi-omics integrative algorithms and methodologies that can be used to enhance cancer diagnosis, prognosis and treatment.
Affiliation(s)
- A Kosvyra
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece.
- E Ntzioni
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- I Chouvarda
- Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
14
Liu JX, Cui Z, Gao YL, Kong XZ. WGRCMF: A Weighted Graph Regularized Collaborative Matrix Factorization Method for Predicting Novel LncRNA-Disease Associations. IEEE J Biomed Health Inform 2021; 25:257-265. [PMID: 32287024 DOI: 10.1109/jbhi.2020.2985703] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/10/2022]
Abstract
In recent years, many human diseases have been determined to be associated with certain lncRNAs. Only a small percentage of all lncRNA-disease associations (LDAs) have been discovered by researchers. Predicting novel LDAs is time-consuming and costly. It is crucial to propose a method that can effectively identify potential LDAs to solve this problem based on the available datasets. Although some current methods can effectively predict potential LDAs, the prediction accuracy needs to be improved, and there are few known associations. Moreover, there are notable errors in the method of constructing the network and the bipartite graph, which interfere with the final results. A weighted graph regularized collaborative matrix factorization (WGRCMF) method is proposed to predict novel LDAs. We introduce the graph regularization terms into the collaborative matrix factorization. Considering that manifold learning can recover low-dimensional manifold structures from high-dimensional sampled data, we can find low-dimensional manifolds in high-dimensional space. In addition, a weight matrix is also introduced into the method, the significance of which is to prevent unknown associations from contributing to the final prediction matrix. Finally, the prediction accuracy of this method is better than those of other methods. In several cancer cases, we implemented the corresponding simulation experiments. According to the experimental results, the proposed method is feasible and effective.