1. Moslemi A, Ahmadian A. Dual regularized subspace learning using adaptive graph learning and rank constraint: Unsupervised feature selection on gene expression microarray datasets. Comput Biol Med 2023; 167:107659. [PMID: 37950946 DOI: 10.1016/j.compbiomed.2023.107659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/13/2023] [Accepted: 10/31/2023] [Indexed: 11/13/2023]
Abstract
High-dimensional problems have increasingly drawn attention in gene selection and analysis. To add insult to injury, usually the number of features is greater than number of samples in microarray gene dataset which leads to an ill-posed underdetermined equation system. Poor performance and high computational time for learning algorithms are consequences of redundant features in high-dimensional data. Feature selection is a noteworthy pre-processing method to ameliorate the curse of dimensionality with aim of maximum relevancy and minimum redundancy information preservation. Likewise, unsupervised feature selection has been important since collecting labels for data is expensive. In this paper, we develop a novel robust unsupervised feature selection to select discriminative subset of features for unlabeled data based on rank constrained and dual regularized nonnegative matrix factorization. The major focus of the proposed technique is to discard redundant features while keeping the informative features. Proposed feature selection technique consists of nonnegative matrix factorization to decompose the data into feature weight matrix and representation matrix, inner product norm as regularization for both feature weight matrix and representation matrix, adaptive structure learning to preserve local information and Schatten-p norm as rank constraint. To demonstrate the effectiveness of the proposed method, numerical studies are conducted on six benchmark microarray datasets. The results show that the proposed technique outperforms eight state-of-art unsupervised feature selection techniques in terms of clustering accuracy and normalized mutual information.
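The abstract above reports clustering accuracy and normalized mutual information (NMI) as its evaluation metrics. As a rough illustration of the second metric only, the sketch below computes NMI between two labelings in plain Python. The function name `normalized_mutual_information` and the normalization by the geometric mean of the two entropies are assumptions made here; the paper may use a different normalization.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI between two labelings, normalized by sqrt(H(true) * H(pred)).

    The geometric-mean normalization is an assumption of this sketch.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # Mutual information: sum over co-occurring label pairs of
    # p(t, p) * log(p(t, p) / (p(t) * p(p))).
    mutual_info = 0.0
    for (t, p), c in count_joint.items():
        mutual_info += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: at least one labeling has a single cluster.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)
```

NMI is invariant to how clusters are named, so a clustering that matches the reference up to a relabeling scores 1, while a clustering that is statistically independent of the reference scores 0.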
Affiliation(s)
- Amir Moslemi: Imaging Research and Physical Sciences, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Arash Ahmadian: Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
2. Moslemi A, Bidar M, Ahmadian A. Subspace learning using structure learning and non-convex regularization: Hybrid technique with mushroom reproduction optimization in gene selection. Comput Biol Med 2023; 164:107309. [PMID: 37536092 DOI: 10.1016/j.compbiomed.2023.107309] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Revised: 07/26/2023] [Accepted: 07/28/2023] [Indexed: 08/05/2023]
Abstract
Gene selection as a problem with high dimensions has drawn considerable attention in machine learning and computational biology over the past decade. In the field of gene selection in cancer datasets, different types of feature selection techniques in terms of strategy (filter, wrapper and embedded) and label information (supervised, unsupervised, and semi-supervised) have been developed. However, using hybrid feature selection can still improve the performance. In this paper, we propose a hybrid feature selection based on filter and wrapper strategies. In the filter-phase, we develop an unsupervised features selection based on non-convex regularized non-negative matrix factorization and structure learning, which we deem NCNMFSL. In the wrapper-phase, for the first time, mushroom reproduction optimization (MRO) is leveraged to obtain the most informative features subset. In this hybrid feature selection method, irrelevant features are filtered-out through NCNMFSL, and most discriminative features are selected by MRO. To show the effectiveness and proficiency of the proposed method, numerical experiments are conducted on Breast, Heart, Colon, Leukemia, Prostate, Tox-171 and GLI-85 benchmark datasets. SVM and decision tree classifiers are leveraged to analyze proposed technique and top accuracy are 0.97, 0.84, 0.98, 0.95, 0.98, 0.87 and 0.85 for Breast, Heart, Colon, Leukemia, Prostate, Tox-171 and GLI-85, respectively. The computational results show the effectiveness of the proposed method in comparison with state-of-art feature selection techniques.
Affiliation(s)
- Amir Moslemi: Department of Physics, Ryerson University, Toronto, ON, Canada
- Mahdi Bidar: Department of Computer Science, University of Regina, Regina, Canada
- Arash Ahmadian: Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
3. Chen M, Gong M, Li X. Feature Weighted Non-Negative Matrix Factorization. IEEE Transactions on Cybernetics 2023; 53:1093-1105. [PMID: 34437084 DOI: 10.1109/tcyb.2021.3100067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Non-negative matrix factorization (NMF) is one of the most popular techniques for data representation and clustering and has been widely used in machine learning and data analysis. NMF concentrates the features of each sample into a vector and approximates it by the linear combination of basis vectors, such that the low-dimensional representations are achieved. However, in real-world applications, the features usually have different importance. To exploit the discriminative features, some methods project the samples into the subspace with a transformation matrix, which disturbs the original feature attributes and neglects the diversity of samples. To alleviate the above problems, we propose the feature weighted NMF (FNMF) in this article. The salient properties of FNMF can be summarized as three-fold: 1) it learns the weights of features adaptively according to their importance; 2) it utilizes multiple feature weighting components to preserve the diversity; and 3) it can be solved efficiently with the suggested optimization algorithm. The performance on synthetic and real-world datasets demonstrates that the proposed method obtains the state-of-the-art performance.
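The abstract above builds on plain NMF, which approximates a nonnegative data matrix by the product of two nonnegative factors. The sketch below shows only that baseline, using the standard multiplicative updates for the squared Frobenius objective; it is not FNMF and contains none of the feature weighting the article proposes. The function names, the fixed iteration count, the random initialization and the small constant `eps` are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xij - aij) ** 2 for xr, ar in zip(x, wh) for xij, aij in zip(xr, ar))


def nmf(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative m-by-n matrix X as W (m-by-rank) times H (rank-by-n)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    # Strictly positive random initialization (an assumption of this sketch).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, both factors stay nonnegative without any explicit projection step. Calling `nmf(x, rank, n_iter=0)` returns the initial factors, which makes it easy to compare `frobenius_error` before and after the updates.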
4. Wang L, Chen H, Peng B, Li T, Yin T. Robust multi-label feature selection with shared coupled and dynamic graph regularization. Appl Intell 2022. [DOI: 10.1007/s10489-022-04343-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
5. Ouyang D, Miao R, Wang J, Liu X, Xie S, Ai N, Dang Q, Liang Y. Predicting Multiple Types of Associations Between miRNAs and Diseases Based on Graph Regularized Weighted Tensor Decomposition. Front Bioeng Biotechnol 2022; 10:911769. [PMID: 35910021 PMCID: PMC9335924 DOI: 10.3389/fbioe.2022.911769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/03/2022] [Accepted: 05/04/2022] [Indexed: 11/23/2022]
Abstract
Many studies have indicated miRNAs lead to the occurrence and development of diseases through a variety of underlying mechanisms. Meanwhile, computational models can save time, minimize cost, and discover potential associations on a large scale. However, most existing computational models based on a matrix or tensor decomposition cannot recover positive samples well. Moreover, the high noise of biological similarity networks and how to preserve these similarity relationships in low-dimensional space are also challenges. To this end, we propose a novel computational framework, called WeightTDAIGN, to identify potential multiple types of miRNA–disease associations. WeightTDAIGN can recover positive samples well and improve prediction performance by weighting positive samples. WeightTDAIGN integrates more auxiliary information related to miRNAs and diseases into the tensor decomposition framework, focuses on learning low-rank tensor space, and constrains projection matrices by using the L2,1 norm to reduce the impact of redundant information on the model. In addition, WeightTDAIGN can preserve the local structure information in the biological similarity network by introducing graph Laplacian regularization. Our experimental results show that the sparser datasets, the more satisfactory performance of WeightTDAIGN can be obtained. Also, the results of case studies further illustrate that WeightTDAIGN can accurately predict the associations of miRNA–disease-type.
Affiliation(s)
- Dong Ouyang: Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Rui Miao: Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Jianjun Wang: School of Mathematics and Statistics, Southwest University, Chongqing, China
- Xiaoying Liu: Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, China
- Shengli Xie: Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou, China
- Ning Ai: Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Qi Dang: Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Yong Liang: Peng Cheng Laboratory, Shenzhen, China
- *Correspondence: Yong Liang,
6. Saberi-Movahed F, Mohammadifard M, Mehrpooya A, Rezaei-Ravari M, Berahmand K, Rostami M, Karami S, Najafzadeh M, Hajinezhad D, Jamshidi M, Abedi F, Mohammadifard M, Farbod E, Safavi F, Dorvash M, Mottaghi-Dastjerdi N, Vahedi S, Eftekhari M, Saberi-Movahed F, Alinejad-Rokny H, Band SS, Tavassoly I. Decoding clinical biomarker space of COVID-19: Exploring matrix factorization-based feature selection methods. Comput Biol Med 2022; 146:105426. [PMID: 35569336 PMCID: PMC8979841 DOI: 10.1016/j.compbiomed.2022.105426] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/04/2021] [Revised: 03/01/2022] [Accepted: 03/18/2022] [Indexed: 02/06/2023]
Abstract
One of the most critical challenges in managing complex diseases like COVID-19 is to establish an intelligent triage system that can optimize the clinical decision-making at the time of a global pandemic. The clinical presentation and patients' characteristics are usually utilized to identify those patients who need more critical care. However, the clinical evidence shows an unmet need to determine more accurate and optimal clinical biomarkers to triage patients under a condition like the COVID-19 crisis. Here we have presented a machine learning approach to find a group of clinical indicators from the blood tests of a set of COVID-19 patients that are predictive of poor prognosis and morbidity. Our approach consists of two interconnected schemes: Feature Selection and Prognosis Classification. The former is based on different Matrix Factorization (MF)-based methods, and the latter is performed using Random Forest algorithm. Our model reveals that Arterial Blood Gas (ABG) O2 Saturation and C-Reactive Protein (CRP) are the most important clinical biomarkers determining the poor prognosis in these patients. Our approach paves the path of building quantitative and optimized clinical management systems for COVID-19 and similar diseases.
Affiliation(s)
- Adel Mehrpooya: School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Kamal Berahmand: School of Computer Science, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Australia
- Mehrdad Rostami: Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Oulu, Finland
- Saeed Karami: Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Mohammad Najafzadeh: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Mina Jamshidi: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Farshid Abedi: Infectious Diseases Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Elnaz Farbod: Baruch College, City University of New York, New York, USA
- Farinaz Safavi: Neuroimmunology and Neurovirology Branch, National Institute of Neurological Disorders and Stroke, National Institute of Health, Bethesda, MD, USA
- Mohammadreza Dorvash: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Viewbank, VIC, Australia
- Negar Mottaghi-Dastjerdi: Department of Pharmacognosy and Pharmaceutical Biotechnology, School of Pharmacy, Iran University of Medical Sciences, Tehran, Iran
- Mahdi Eftekhari: Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
- Farid Saberi-Movahed (corresponding author): Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Hamid Alinejad-Rokny: BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- Shahab S. Band: Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin, 64002, Taiwan
- Iman Tavassoly (corresponding author): Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
7. Sparse and low-dimensional representation with maximum entropy adaptive graph for feature selection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.02.038] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
8. Rodríguez-Domínguez U, Dalmau O. Symmetric nonnegative matrix factorization with elastic-net regularized block-wise weighted representation for clustering. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01062-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
9. Mokhtia M, Eftekhari M, Saberi-Movahed F. Dual-manifold regularized regression models for feature selection based on hesitant fuzzy correlation. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107308] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
10. Dinh DT, Huynh VN, Sriboonchitta S. Clustering mixed numerical and categorical data with missing values. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.04.076] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 10/21/2022]
11. Saberi-Movahed F, Mohammadifard M, Mehrpooya A, Rezaei-Ravari M, Berahmand K, Rostami M, Karami S, Najafzadeh M, Hajinezhad D, Jamshidi M, Abedi F, Mohammadifard M, Farbod E, Safavi F, Dorvash M, Vahedi S, Eftekhari M, Saberi-Movahed F, Tavassoly I. Decoding Clinical Biomarker Space of COVID-19: Exploring Matrix Factorization-based Feature Selection Methods. medRxiv: The Preprint Server for Health Sciences 2021:2021.07.07.21259699. [PMID: 34268522 PMCID: PMC8282111 DOI: 10.1101/2021.07.07.21259699] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/16/2022]
Abstract
One of the most critical challenges in managing complex diseases like COVID-19 is to establish an intelligent triage system that can optimize the clinical decision-making at the time of a global pandemic. The clinical presentation and patients’ characteristics are usually utilized to identify those patients who need more critical care. However, the clinical evidence shows an unmet need to determine more accurate and optimal clinical biomarkers to triage patients under a condition like the COVID-19 crisis. Here we have presented a machine learning approach to find a group of clinical indicators from the blood tests of a set of COVID-19 patients that are predictive of poor prognosis and morbidity. Our approach consists of two interconnected schemes: Feature Selection and Prognosis Classification. The former is based on different Matrix Factorization (MF)-based methods, and the latter is performed using Random Forest algorithm. Our model reveals that Arterial Blood Gas (ABG) O2 Saturation and C-Reactive Protein (CRP) are the most important clinical biomarkers determining the poor prognosis in these patients. Our approach paves the path of building quantitative and optimized clinical management systems for COVID-19 and similar diseases.
Affiliation(s)
- Adel Mehrpooya: School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Kamal Berahmand: School of Computer Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Saeed Karami: Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Mohammad Najafzadeh: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Mina Jamshidi: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Farshid Abedi: Infectious Diseases Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Elnaz Farbod: Baruch College, City University of New York, New York, USA
- Farinaz Safavi: Neuroimmunology and Neurovirology Branch, National Institute of Neurological Disorders and Stroke, National Institute of Health, Bethesda, Maryland, USA
- Mohammadreza Dorvash: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Viewbank, VIC, Australia
- Mahdi Eftekhari: Department of Computer Engineering, University of Kerman, Kerman, Iran
- Farid Saberi-Movahed: Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Iman Tavassoly: Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
12. Li S, Li W, Hu J, Li Y. Semi-supervised bi-orthogonal constraints dual-graph regularized NMF for subspace clustering. Appl Intell 2021. [DOI: 10.1007/s10489-021-02522-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
13. Pulgar FJ, Charte F, Rivera AJ, del Jesus MJ. ClEnDAE: A classifier based on ensembles with built-in dimensionality reduction through denoising autoencoders. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.02.060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
14. Xue C, Zhang T, Xiao D. Output-Related and -Unrelated Fault Monitoring with an Improvement Prototype Knockoff Filter and Feature Selection Based on Laplacian Eigen Maps and Sparse Regression. ACS Omega 2021; 6:10828-10839. [PMID: 34056237 PMCID: PMC8153765 DOI: 10.1021/acsomega.1c00506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/28/2021] [Accepted: 04/06/2021] [Indexed: 06/12/2023]
Abstract
In the process industry, fault monitoring related to output is an important step to ensure product quality and improve economic benefits. In order to distinguish the influence of input variables on the output more accurately, this paper introduces a subalgorithm of fault-unrelated block partition into the prototype knockoff filter (PKF) algorithm for its improvement. The improved PKF algorithm can divide the input data into three blocks: fault-unrelated block, output-related block, and output-unrelated block. Removing the data of fault-unrelated blocks can greatly reduce the difficulty of fault monitoring. This paper proposes a feature selection based on the Laplacian Eigen maps and sparse regression algorithm for output-unrelated blocks. The algorithm has the ability to detect faults caused by variables with small contribution to variance and proves the descent of the algorithm from a theoretical point of view. The output relation block is monitored by the Broyden-Fletcher-Goldfarb-Shanno method. Finally, the effectiveness of the proposed fault detection method is verified by the recognized Eastman process data in Tennessee.
Affiliation(s)
- Cuiping Xue: College of Science, Northeastern University, Shenyang 110819, China
- Tie Zhang: College of Science, Northeastern University, Shenyang 110819, China
- Dong Xiao: College of Information Science and Engineering and Liaoning Key Laboratory of Intelligent Diagnosis and Safety for Metallurgical Industry, Northeastern University, Shenyang 110819, China
15. Laohakiat S, Sa-ing V. An incremental density-based clustering framework using fuzzy local clustering. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.052] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
16. Yu L, Cao F, Gao XZ, Liu J, Liang J. k-Mnv-Rep: A k-type clustering algorithm for matrix-object data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.06.071] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
17. Clustering and supervised response for XACML policy evaluation and management. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
18. Fletcher S, Verma B, Zhang M. A non-specialized ensemble classifier using multi-objective optimization. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.05.029] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/14/2023]
19
|
|
20. Hu J, Li Y, Gao W, Zhang P. Robust multi-label feature selection with dual-graph regularization. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106126] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
21. Meng Y, Shang R, Shang F, Jiao L, Yang S, Stolkin R. Semi-Supervised Graph Regularized Deep NMF With Bi-Orthogonal Constraints for Data Representation. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:3245-3258. [PMID: 31603802 DOI: 10.1109/tnnls.2019.2939637] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 06/10/2023]
Abstract
Semi-supervised non-negative matrix factorization (NMF) exploits the strengths of NMF in effectively learning local information contained in data and is also able to achieve effective learning when only a small fraction of data is labeled. NMF is particularly useful for dimensionality reduction of high-dimensional data. However, the mapping between the low-dimensional representation, learned by semi-supervised NMF, and the original high-dimensional data contains complex hierarchical and structural information, which is hard to extract by using only single-layer clustering methods. Therefore, in this article, we propose a new deep learning method, called semi-supervised graph regularized deep NMF with bi-orthogonal constraints (SGDNMF). SGDNMF learns a representation from the hidden layers of a deep network for clustering, which contains varied and unknown attributes. Bi-orthogonal constraints on two factor matrices are introduced into our SGDNMF model, which can make the solution unique and improve clustering performance. This improves the effect of dimensionality reduction because it only requires a small fraction of data to be labeled. In addition, SGDNMF incorporates dual-hypergraph Laplacian regularization, which can reinforce high-order relationships in both data and feature spaces and fully retain the intrinsic geometric structure of the original data. This article presents the details of the SGDNMF algorithm, including the objective function and the iterative updating rules. Empirical experiments on four different data sets demonstrate state-of-the-art performance of SGDNMF in comparison with six other prominent algorithms.
22. Ye Q, Zhang X, Sun Y. Dual Global Structure Preservation Based Supervised Feature Selection. Neural Process Lett 2020. [DOI: 10.1007/s11063-020-10225-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
23. Do H, Cheon MS, Kim SB. Graph Structured Sparse Subset Selection. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.12.086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
24. Shang R, Song J, Jiao L, Li Y. Double feature selection algorithm based on low-rank sparse non-negative matrix factorization. Int J Mach Learn Cyb 2020. [DOI: 10.1007/s13042-020-01079-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
25. Machine learning integrated credibilistic semi supervised clustering for categorical data. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105871] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
26. Shang R, Xu K, Shang F, Jiao L. Sparse and low-redundant subspace learning-based dual-graph regularized robust feature selection. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.07.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 11/25/2022]
27. Supervised feature selection by constituting a basis for the original space of features and matrix factorization. Int J Mach Learn Cyb 2019. [DOI: 10.1007/s13042-019-01046-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
28
|
|
29
|
|
30
|
|
31. Unsupervised feature selection based on kernel fisher discriminant analysis and regression learning. Mach Learn 2018. [DOI: 10.1007/s10994-018-5765-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/27/2022]