101. Masood N, Farooq H. EEG electrodes selection for emotion recognition independent of stimulus presentation paradigms. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-201779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Most electroencephalography (EEG) based emotion recognition systems rely on a single stimulus to evoke emotions. EEG data is mostly recorded with a large number of electrodes, which can lead to data redundancy and longer experimental setup time. The question of whether a configuration with fewer electrodes is common across different stimulus presentation paradigms remains unanswered. There are publicly available datasets for EEG-based recognition of human emotional states. However, since this work focuses on classifying emotions while subjects experience different stimuli, new experiments are needed. With these issues in mind, this work presents a novel experimental study that records EEG data for three different human emotional states evoked with four different stimulus presentation paradigms. A methodology based on an iterative Genetic Algorithm in combination with majority voting is used to obtain a configuration with a reduced number of EEG electrodes while keeping the loss of classification accuracy to a minimum. The results obtained are comparable with recent studies. Stimulus-independent configurations with fewer electrodes lead to low computational complexity as well as reduced setup time for future EEG-based smart systems for emotion recognition.
Affiliation(s)
- Naveen Masood
  - Electrical Engineering Department, Bahria University, Karachi, Pakistan
- Humera Farooq
  - Computer Science Department, Bahria University, Karachi, Pakistan
|
102. Bhadra T, Bandyopadhyay S. Supervised feature selection using integration of densest subgraph finding with floating forward–backward search. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.02.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022]
|
103. Feature Ranking and Differential Evolution for Feature Selection in Brushless DC Motor Fault Diagnosis. Symmetry (Basel) 2021. [DOI: 10.3390/sym13071291] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022] [Open Access]
Abstract
A fault diagnosis system that can recognize many different faults is inevitably complex, so improving the performance of such systems has attracted much research interest. This article proposes a system of feature ranking and differential evolution for feature selection in brushless DC (BLDC) motor fault diagnosis. First, this study used the Hilbert–Huang transform (HHT) to extract the features of four different types of BLDC motor Hall signal. When there is a fault, the symmetry of the Hall signal is affected. Second, we used feature selection based on a distance discriminant (FSDD) to calculate feature factors, which are based on the category separability of the features, and to select the features that have a positive correlation with the types. The features were entered sequentially into two supervised classifiers, a backpropagation neural network (BPNN) and linear discriminant analysis (LDA), and the identification results were then evaluated. The feature input for the classifier was derived from the FSDD, and we then optimized the feature rank using differential evolution (DE). Finally, the results were verified in a simulation of the BLDC motor’s operating environment with the same features by adding noise at appropriate signal-to-noise ratios. The identification system obtained an accuracy rate of 96% with 14 features. Additionally, the experimental results show that the proposed system is robust to noise: the accuracy rate is 92.04% even when 20 dB of white Gaussian noise is added to the signal. Moreover, compared with systems built on the discrete wavelet transform (DWT) and a variety of classifiers, the proposed system achieves higher accuracy with fewer features.
|
104. Gu X, Guo J. A feature subset selection algorithm based on equal interval division and three-way interaction information. Soft Comput 2021. [DOI: 10.1007/s00500-021-05800-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
|
105. Boubekki A, Kampffmeyer M, Brefeld U, Jenssen R. Joint optimization of an autoencoder for clustering and embedding. Mach Learn 2021. [DOI: 10.1007/s10994-021-06015-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
Abstract
Deep embedded clustering has become a dominating approach to unsupervised categorization of objects with deep neural networks. The optimization of the most popular methods alternates between the training of a deep autoencoder and a k-means clustering of the autoencoder’s embedding. The diachronic setting, however, prevents the former from benefiting from valuable information acquired by the latter. In this paper, we present an alternative where the autoencoder and the clustering are learned simultaneously. This is achieved by providing novel theoretical insight, where we show that the objective function of a certain class of Gaussian mixture models (GMMs) can naturally be rephrased as the loss function of a one-hidden-layer autoencoder, thus inheriting the built-in clustering capabilities of the GMM. That simple neural network, referred to as the clustering module, can be integrated into a deep autoencoder, resulting in a deep clustering model able to jointly learn a clustering and an embedding. Experiments confirm the equivalence between the clustering module and Gaussian mixture models. Further evaluations affirm the empirical relevance of our deep architecture, as it outperforms related baselines on several data sets.
|
106. Jiang Y, Yin S, Dong J, Kaynak O. A Review on Soft Sensors for Monitoring, Control, and Optimization of Industrial Processes. IEEE Sensors Journal 2021; 21:12868-12881. [DOI: 10.1109/jsen.2020.3033153] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Indexed: 09/03/2024]
|
107. Rahnavard A, Chatterjee S, Sayoldin B, Crandall KA, Tekola-Ayele F, Mallick H. Omics community detection using multi-resolution clustering. Bioinformatics 2021; 37:3588-3594. [PMID: 33974004 PMCID: PMC8545346 DOI: 10.1093/bioinformatics/btab317] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/02/2021] [Revised: 03/23/2021] [Accepted: 04/26/2021] [Indexed: 12/26/2022] [Open Access]
Abstract
Motivation
The discovery of biologically interpretable and clinically actionable communities in heterogeneous omics data is a necessary first step towards deriving mechanistic insights into complex biological phenomena. Here we present a novel clustering approach, omeClust, for community detection in omics profiles by simultaneously incorporating similarities among measurements and the overall complex structure of the data.
Results
We show that omeClust outperforms published methods in inferring the true community structure as measured by both sensitivity and misclassification rate on simulated datasets. We further validated omeClust in diverse, multiple omics datasets, revealing new communities and functionally related groups in microbial strains, cell line gene expression patterns, and fetal genomic variation. We also derived enrichment scores attributable to putatively meaningful biological factors in these datasets that can serve as hypothesis generators facilitating new sets of testable hypotheses.
Availability
omeClust is open-source software, and the implementation is available online at http://github.com/omicsEye/omeClust.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ali Rahnavard
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
- Suvo Chatterjee
  - Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- Bahar Sayoldin
  - School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
- Keith A Crandall
  - Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
- Fasil Tekola-Ayele
  - Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
- Himel Mallick
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
|
108. Medeiros J, Couceiro R, Duarte G, Durães J, Castelhano J, Duarte C, Castelo-Branco M, Madeira H, de Carvalho P, Teixeira C. Can EEG Be Adopted as a Neuroscience Reference for Assessing Software Programmers' Cognitive Load? Sensors 2021; 21:s21072338. [PMID: 33801660 PMCID: PMC8037053 DOI: 10.3390/s21072338] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/25/2021] [Revised: 03/20/2021] [Accepted: 03/25/2021] [Indexed: 11/16/2022]
Abstract
An emergent research area in software engineering and software reliability is the use of wearable biosensors to monitor the cognitive state of software developers during software development tasks. The goal is to gather physiologic manifestations that can be linked to error-prone scenarios related to programmers’ cognitive states. In this paper we investigate whether electroencephalography (EEG) can be applied to accurately identify programmers’ cognitive load associated with the comprehension of code with different complexity levels. To this end, a controlled experiment involving 26 programmers was carried out. We found that features related to Theta, Alpha, and Beta brain waves have the highest discriminative power, allowing the identification of code lines demanding higher mental effort. The EEG results reveal evidence of mental effort saturation as code complexity increases. Conversely, the classic software complexity metrics do not accurately represent the mental effort involved in code comprehension. Finally, EEG is proposed as a reference. In particular, the combination of EEG with eye tracking information allows for an accurate identification of code lines that correspond to peaks of cognitive load, providing a reference to help in the future evaluation of the space and time accuracy of programmers’ cognitive state monitored using wearable devices compatible with software development activities.
Affiliation(s)
- Júlio Medeiros
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
- Ricardo Couceiro
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
- Gonçalo Duarte
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
- João Durães
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
  - Coimbra Polytechnic - ISEC, R. Pedro Nunes, P-3030-199 Coimbra, Portugal
- João Castelhano
  - ICNAS-Institute of Nuclear Sciences Applied to Health, University of Coimbra, P-3000-548 Coimbra, Portugal
  - CIBIT-Coimbra Institute for Biomedical Imaging and Translational Research, University of Coimbra, P-3000-548 Coimbra, Portugal
- Catarina Duarte
  - ICNAS-Institute of Nuclear Sciences Applied to Health, University of Coimbra, P-3000-548 Coimbra, Portugal
  - CIBIT-Coimbra Institute for Biomedical Imaging and Translational Research, University of Coimbra, P-3000-548 Coimbra, Portugal
- Miguel Castelo-Branco
  - ICNAS-Institute of Nuclear Sciences Applied to Health, University of Coimbra, P-3000-548 Coimbra, Portugal
  - CIBIT-Coimbra Institute for Biomedical Imaging and Translational Research, University of Coimbra, P-3000-548 Coimbra, Portugal
- Henrique Madeira
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
- Paulo de Carvalho
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
- César Teixeira
  - Department of Informatics Engineering, CISUC-Centre for Informatics and Systems of the University of Coimbra, University of Coimbra, P-3030-790 Coimbra, Portugal
|
109. Wang J, Zhang H, Wang J, Pu Y, Pal NR. Feature Selection Using a Neural Network With Group Lasso Regularization and Controlled Redundancy. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:1110-1123. [PMID: 32396104 DOI: 10.1109/tnnls.2020.2980383] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 06/11/2023]
Abstract
We propose a neural network-based feature selection (FS) scheme that can control the level of redundancy in the selected features by integrating two penalties into a single objective function. The Group Lasso penalty aims to produce sparsity in features in a grouped manner. The redundancy-control penalty, which is defined based on a measure of dependence among features, is utilized to control the level of redundancy among the selected features. Both penalty terms involve the L2,1-norm of the weight matrix between the input and hidden layers. These penalty terms are nonsmooth at the origin, and hence, one simple but efficient smoothing technique is employed to overcome this issue. The monotonicity and convergence of the proposed algorithm are specified and proved under suitable assumptions. Then, extensive experiments are conducted on both artificial and real data sets. Empirical results explicitly demonstrate the ability of the proposed FS scheme and its effectiveness in controlling redundancy. The empirical simulations are observed to be consistent with the theoretical results.
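For reference, the L2,1-norm mentioned in this abstract is commonly defined as the sum of the Euclidean norms of the rows of a matrix. The notation below is assumed for illustration and is not taken from the paper: $W \in \mathbb{R}^{n \times h}$ is the input-to-hidden weight matrix, row $w_i$ holds the weights leaving input feature $i$, and $\varepsilon$ is a small smoothing constant. The smoothed form is one common way to remove the nonsmoothness at the origin, not necessarily the technique the authors use.

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{n} \|w_i\|_2
          \;=\; \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{h} w_{ij}^{2}},
\qquad
\|w_i\|_2 \;\approx\; \sqrt{\|w_i\|_2^{2} + \varepsilon^{2}}
\quad (\varepsilon > 0 \text{ small}).
```

Under this convention, driving an entire row $w_i$ to zero disconnects feature $i$ from the network, which is how a group-wise penalty of this kind can act as feature selection.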
|
110. Albashish D, Hammouri AI, Braik M, Atwan J, Sahran S. Binary biogeography-based optimization based SVM-RFE for feature selection. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.107026] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Indexed: 12/17/2022]
|
111
|
|
112. Multi-modal brain image fusion based on multi-level edge-preserving filtering. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102280] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/21/2022]
|
113. Wang L, Chen P, Chen S, Sun M. A novel approach to fully representing the diversity in conditional dependencies for learning Bayesian network classifier. Intell Data Anal 2021. [DOI: 10.3233/ida-194959] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
Bayesian network classifiers (BNCs) have proved their effectiveness and efficiency in the supervised learning framework. Numerous variations of the conditional independence assumption have been proposed to address the NP-hard problem of learning the structure of a BNC. However, researchers focus on identifying conditional dependence rather than conditional independence, and information-theoretic criteria cannot identify the diversity in conditional (in)dependencies for different instances. In this paper, the maximum correlation criterion and the minimum dependence criterion are introduced to sort attributes and identify conditional independencies, respectively. A heuristic search strategy is applied to find a possible global solution that achieves a trade-off between significant dependency relationships and the independence assumption. Our extensive experimental evaluation on widely used benchmark data sets reveals that the proposed algorithm achieves competitive classification performance compared to state-of-the-art single model learners (e.g., TAN, KDB, KNN and SVM) and ensemble learners (e.g., ATAN and AODE).
Affiliation(s)
- Limin Wang
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
- Peng Chen
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
- Shenglei Chen
  - School of Economics, Nanjing Audit University, Nanjing, Jiangsu, China
- Minghui Sun
  - College of Computer Science and Technology, Jilin University, Changchun, Jilin, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China
|
114. Wang Q, Ding Z, Tao Z, Gao Q, Fu Y. Generative Partial Multi-View Clustering With Adaptive Fusion and Cycle Consistency. IEEE Transactions on Image Processing 2021; 30:1771-1783. [PMID: 33417549 DOI: 10.1109/tip.2020.3048626] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 06/12/2023]
Abstract
With the rapid development of data collection sources and feature extraction methods, multi-view data have become easy to obtain and have received increasing research attention in recent years. Within this area, multi-view clustering (MVC) forms a mainstream research direction and is widely used in data analysis. However, existing MVC methods mainly assume that each sample appears in all the views, without considering the incomplete view case due to data corruption, sensor failure, equipment malfunction, etc. In this study, we design and build a generative partial multi-view clustering model with adaptive fusion and cycle consistency, named GP-MVC, to solve the incomplete multi-view problem by explicitly generating the data of missing views. The main idea of GP-MVC is two-fold. First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the shared cluster structure across multiple views. Second, view-specific generative adversarial networks with multi-view cycle consistency are developed to generate the missing data of one view conditioned on the shared representation given by other views. These two steps reinforce each other: the learned common representation facilitates data imputation, and the generated data further explore the view consistency. Moreover, a weighted adaptive fusion scheme is implemented to exploit the complementary information among different views. Experimental results on four benchmark datasets are provided to show the effectiveness of the proposed GP-MVC over the state-of-the-art methods.
|
115. Zhu X, Li J, Li HD, Xie M, Wang J. Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell. Front Genet 2020; 11:604790. [PMID: 33384718 PMCID: PMC7770236 DOI: 10.3389/fgene.2020.604790] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/10/2020] [Accepted: 11/23/2020] [Indexed: 01/23/2023] [Open Access]
Abstract
Clustering is an efficient way to analyze single-cell RNA sequencing data. It is commonly used to identify cell types, which can help in understanding cell differentiation processes. However, different single-cell clustering methods can produce different clustering results, sometimes with conflicting conclusions, so biologists often fail to obtain the right clustering results and to interpret their biological significance. The cluster ensemble strategy can be an effective solution to this problem. As graph partitioning-based clustering methods are good at clustering single-cell data, we developed Sc-GPE, a novel cluster ensemble method combining five single-cell graph partitioning-based clustering methods: SNN-cliq, PhenoGraph, SC3, SSNN-Louvain, and MPGS-Louvain. In Sc-GPE, a consensus matrix is constructed from the five clustering solutions by calculating the probability that each pair of cells is assigned to the same cluster. This solves a problem of the hypergraph-based ensemble approach, in which each individual clustering method assigns its own cluster labels and it is difficult to find the corresponding cluster labels across all methods. Then, to account for the different importance of each method in the clustering ensemble, a weighted consensus matrix is constructed using an importance score strategy. Finally, hierarchical clustering is performed on the weighted consensus matrix to cluster cells. To evaluate the performance, we compared Sc-GPE with the individual clustering methods and the state-of-the-art SAME-clustering on 12 single-cell RNA-seq datasets. The results show that Sc-GPE obtained the best average performance and achieved the highest NMI and ARI values on five datasets.
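The consensus-matrix idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `consensus_matrix`, the toy label vectors, and the normalized-weight scheme are all hypothetical, and equal weights stand in for the paper's importance scores, which the abstract does not define.

```python
from typing import List, Optional, Sequence


def consensus_matrix(
    labelings: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Return an n x n matrix whose (i, j) entry is the weighted fraction
    of clusterings that place cells i and j in the same cluster.

    Assumption: weights are normalized to sum to 1; equal weights are used
    when none are given (a stand-in for method-specific importance scores).
    """
    n_cells = len(labelings[0])
    if weights is None:
        weights = [1.0] * len(labelings)
    total = float(sum(weights))

    matrix = [[0.0] * n_cells for _ in range(n_cells)]
    for labels, weight in zip(labelings, weights):
        share = weight / total
        for i in range(n_cells):
            for j in range(n_cells):
                # Only label equality within one method matters, so the
                # methods never need to agree on label names.
                if labels[i] == labels[j]:
                    matrix[i][j] += share
    return matrix


# Three hypothetical clusterings of four cells; the second uses swapped
# label names but describes the same partition as the first.
toy_labelings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
toy_consensus = consensus_matrix(toy_labelings)
```

Because each entry depends only on whether two cells share a label within a single method, the matrix is unaffected by how each method names its clusters. A hierarchical clustering step could then be run on one minus this matrix as a distance.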
Affiliation(s)
- Xiaoshu Zhu
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, China
  - Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Jian Li
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Hong-Dong Li
  - Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Miao Xie
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Jianxin Wang
  - Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
|
116. Relevance assignation feature selection method based on mutual information for machine learning. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106439] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
|
117. Amma NGB, Selvakumar S, Velusamy RL. A Statistical Approach for Detection of Denial of Service Attacks in Computer Networks. IEEE Transactions on Network and Service Management 2020. [DOI: 10.1109/tnsm.2020.3022799] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]
|
118. M Salem OA, Liu F, Sherif AS, Zhang W, Chen X. Feature selection based on fuzzy joint mutual information maximization. Mathematical Biosciences and Engineering 2020; 18:305-327. [PMID: 33525093 DOI: 10.3934/mbe.2021016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/12/2023]
Abstract
Real-world applications now handle huge amounts of data, often with high-dimensional feature spaces. These datasets are a significant challenge for classification systems. Unfortunately, most of the features present are irrelevant or redundant, which makes these systems inefficient and inaccurate. For this reason, many feature selection (FS) methods based on information theory have been introduced to improve classification performance. However, current methods have limitations in dealing with continuous features, estimating redundancy relations, and considering outer-class information. To overcome these limitations, this paper presents a new FS method, called Fuzzy Joint Mutual Information Maximization (FJMIM). The effectiveness of the proposed method is verified through an experimental comparison with nine conventional and state-of-the-art feature selection methods. Based on 13 benchmark datasets, experimental results confirm that the proposed method leads to a promising improvement in classification performance and feature selection stability.
Affiliation(s)
- Omar A M Salem
  - School of Computer Science, Wuhan University, Wuhan 430072, China
  - Faculty of Computers and Informatics, Suez Canal University, Ismailia 41522, Egypt
- Feng Liu
  - School of Computer Science, Wuhan University, Wuhan 430072, China
- Ahmed Sobhy Sherif
  - Faculty of Computers and Informatics, Suez Canal University, Ismailia 41522, Egypt
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xi Chen
  - School of Computer Science, Wuhan University, Wuhan 430072, China
|
119. Wang Q, Lian H, Sun G, Gao Q, Jiao L. iCmSC: Incomplete Cross-Modal Subspace Clustering. IEEE Transactions on Image Processing 2020; 30:305-317. [PMID: 33186106 DOI: 10.1109/tip.2020.3036717] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Cross-modal clustering aims to cluster highly similar cross-modal data into one group while separating dissimilar data. Although promising cross-modal methods have been developed in recent years, existing state-of-the-art methods cannot effectively capture the correlations between cross-modal data when the data are incomplete, which can gravely degrade clustering performance. To tackle this scenario, we propose a novel incomplete cross-modal clustering method that integrates canonical correlation analysis and exclusive representation, named incomplete Cross-modal Subspace Clustering (iCmSC). To learn a consistent subspace representation among incomplete cross-modal data, we maximize the intrinsic correlations among different modalities by deep canonical correlation analysis (DCCA), while an exclusive self-expression layer is proposed after the output layers of DCCA. We exploit an l1,2-norm regularization in the learned subspace to make the learned representation more discriminative, which makes samples from different clusters mutually exclusive and samples within the same cluster attractive to each other. Meanwhile, decoding networks are employed to reconstruct the feature representation and further preserve the structural information among the original cross-modal data. Finally, we demonstrate the effectiveness of the proposed iCmSC via extensive experiments, which show that iCmSC achieves consistently large improvements over state-of-the-art methods.
|
120. Liu X, Zhang G, Zhang Z. A novel hybrid feature selection and modified KNN prediction model for coal and gas outbursts. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-200937] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Selecting the influencing factors of coal and gas outbursts is of great significance for identifying the most discriminative features and improving the prediction performance of a classifier. This paper presents an effective hybrid feature selection and modified outburst classifier framework that aims to solve existing coal and gas outburst prediction problems. First, a measurement standard based on the maximum information coefficient (MIC) is employed to identify a wide range of correlations between two variables. Second, based on a ranking procedure using the non-dominated sorting genetic algorithm (NSGA-II), the maximum relevance minimum redundancy (MRMR) algorithm is performed to find a candidate feature set whose members are highly related to the class label and uncorrelated with each other. Third, random forest (RF) is employed to search the candidate feature set for the optimal feature subset that influences the classification performance for coal and gas outbursts. Finally, an improved classifier model is proposed that combines a gradient boosting decision tree (GBDT) and k-nearest neighbor (KNN) for outburst prediction. In the modified classifier model, the GBDT is used to assign different weights to features, and the weighted features are then input into the KNN to verify the effectiveness of the proposed method on a coal and gas outburst dataset. The experimental results show that the proposed scheme is effective in terms of the number of features and prediction accuracy when compared with other related state-of-the-art prediction models based on feature selection for coal and gas outbursts.
Affiliation(s)
- Xuning Liu
  - School of Mechanical Electronic & Information Engineering, China University of Mining and Technology, Beijing, China
  - Department of Computer Engineering, Shijiazhuang University, Shijiazhuang, China
- Guoying Zhang
  - School of Mechanical Electronic & Information Engineering, China University of Mining and Technology, Beijing, China
- Zixian Zhang
  - School of Mechanical Electronic & Information Engineering, China University of Mining and Technology, Beijing, China
  - School of Foreign University, Liaocheng University, Liaocheng, China
|
121
|
|
122. Community detection in complex networks using network embedding and gravitational search algorithm. J Intell Inf Syst 2020. [DOI: 10.1007/s10844-020-00625-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/23/2022]
|
123
|
Paul D, Das S. A Bayesian non‐parametric approach for automatic clustering with feature weighting. Stat (Int Stat Inst) 2020. [DOI: 10.1002/sta4.306] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Debolina Paul
- Indian Statistical Institute Kolkata West Bengal 700108 India
| | - Swagatam Das
- Electronics and Communication Sciences Unit Indian Statistical Institute Kolkata West Bengal 700108 India
| |
|
124
|
Guillén Perales A, Liébana-Cabanillas F, Sánchez-Fernández J, Herrera LJ. Assessing university students' perception of academic quality using machine learning. APPLIED COMPUTING AND INFORMATICS 2020. [DOI: 10.1108/aci-06-2020-0003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Purpose
The aim of this research is to assess the influence of the underlying service quality variable, usually related to university students' perception of the educational experience. Another aspect analysed in this work is the development of a procedure to determine which variables are more significant to assess students' satisfaction.
Design/methodology/approach
To achieve both goals, a twofold methodology was adopted. In the first phase of research, service quality was assessed with data gathered from 580 students, in a process involving the adaptation of the SERVQUAL scale through a multi-objective optimization methodology. In the second phase of research, results obtained from students were compared with those obtained from the teaching staff at the university.
Findings
Results from the analysis revealed the most significant service quality dimensions from the students' viewpoint according to the scores that they provided. Comparison of the results with the teaching staff showed noticeable differences when assessing academic quality.
Originality/value
Significant conclusions can be drawn from the theoretical review of the empirical evidence obtained through this study, helping with the practical design and implementation of quality strategies in higher education, especially with regard to university education.
|
125
|
Kamimura R. Cost-conscious mutual information maximization for improving collective interpretation of multi-layered neural networks. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.04.127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
126
|
Guo J, Jin M, Chen Y, Liu J. An embedded gene selection method using knockoffs optimizing neural network. BMC Bioinformatics 2020; 21:414. [PMID: 32962627 PMCID: PMC7510330 DOI: 10.1186/s12859-020-03717-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2020] [Accepted: 08/19/2020] [Indexed: 11/30/2022] Open
Abstract
Background Gene selection refers to finding a small subset of discriminant genes from gene expression profiles. How to effectively select genes that affect specific phenotypic traits is an important research problem in biology. Neural networks fit nonlinear data well and can capture features automatically and flexibly. In this work, we propose an embedded gene selection method using a neural network. The important genes are obtained by calculating the weight coefficients after training is completed. To address the black-box nature of neural networks and make the training results more interpretable, we use the idea of knockoffs to construct knockoff feature genes of the original feature genes. This method not only makes the feature genes compete with each other, but also makes each feature gene compete with its knockoff feature gene. This approach can help to select the key genes that affect the decision-making of neural networks. Results We use maize carotenoids, tocopherol methyltransferase, raffinose family oligosaccharides and human breast cancer datasets for verification and analysis. Conclusions The experimental results demonstrate that the knockoffs optimizing neural network method has a better detection effect than the other existing algorithms, especially when processing nonlinear gene expression and phenotype data.
Affiliation(s)
- Juncheng Guo
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 10049, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, 10049, China
| | - Min Jin
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yuanyuan Chen
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| | - Jianxiao Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China; National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
| |
|
127
|
Research on Image Adaptive Enhancement Algorithm under Low Light in License Plate Recognition System. Symmetry (Basel) 2020. [DOI: 10.3390/sym12091552] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The traffic block port monitors and manages road traffic by photographing and recording motor vehicles. However, due to complex factors such as shooting angle, lighting conditions, and environmental background, the recognition rate of license plates is not high enough. High light and low light under complex lighting conditions are symmetric problems. This paper analyzes the low-light problem in detail and proposes an image adaptive enhancement algorithm for low-light conditions. The algorithm mainly includes four modules. Among them, the fast image classification module uses a deep and separable convolutional neural network to classify low-light images into daytime and nighttime low-light images, greatly reducing the computational burden while preserving classification accuracy. The image enhancement module feeds the classified images into two different image enhancement algorithms, following a divide-and-conquer approach, and the image quality evaluation module adopts a weighted comprehensive evaluation index. The final experiment shows that the comprehensive evaluation indexes are all greater than 0.83, which can improve the subsequent recognition of vehicle face and license plate.
|
128
|
|
129
|
Tan H, Wang G, Wang W, Zhang Z. Feature selection based on distance correlation: a filter algorithm. J Appl Stat 2020; 49:411-426. [DOI: 10.1080/02664763.2020.1815672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Affiliation(s)
- Hongwei Tan
- School of Computer and Information Science, Southwest University, Chongqing, People's Republic of China
- School of Mathematics and Statistics, GuiZhou University of Finance and Economics, Guiyang, People's Republic of China
| | - Guodong Wang
- School of Computer and Information Science, Southwest University, Chongqing, People's Republic of China
| | - Wendong Wang
- School of Computer and Information Science, Southwest University, Chongqing, People's Republic of China
| | - Zili Zhang
- School of Computer and Information Science, Southwest University, Chongqing, People's Republic of China
- School of Information Technology, Deakin University, Geelong, Australia
| |
|
130
|
Herrera LJ, Todero Peixoto CJ, Baños O, Carceller JM, Carrillo F, Guillén A. Composition Classification of Ultra-High Energy Cosmic Rays. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E998. [PMID: 33286767 PMCID: PMC7597327 DOI: 10.3390/e22090998] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Revised: 09/01/2020] [Accepted: 09/04/2020] [Indexed: 11/17/2022]
Abstract
The study of cosmic rays remains one of the most challenging research fields in physics. Among the many questions still open in this area, determining the type of primary for each event remains one of the most important. Cosmic ray observatories have been trying to answer this question for at least six decades, but have not yet succeeded. The main obstacle is the impossibility of directly detecting high energy primary events, which makes it necessary to use Monte Carlo models and simulations to characterize the generated particle cascades. This work presents the results attained using a simulated dataset provided by the Monte Carlo code CORSIKA, a simulator of the interactions of high energy particles with the atmosphere, which result in a cascade of secondary particles extending for a few kilometers (in diameter) at ground level. Using this simulated data, a set of machine learning classifiers has been designed and trained, and their computational cost and effectiveness compared, when classifying the type of primary under ideal measuring conditions. Additionally, a feature selection algorithm has made it possible to identify the relevance of the considered features. The results confirm the importance, for this problem, of separating the electromagnetic and muonic components in the measured signal data. The obtained results are quite encouraging and open new lines of work for future, more restrictive simulations.
Affiliation(s)
- Luis Javier Herrera
- Computer Architecture and Technology Department, University of Granada, 18071 Granada, Spain
| | | | - Oresti Baños
- Computer Architecture and Technology Department, University of Granada, 18071 Granada, Spain
| | - Juan Miguel Carceller
- Theoretical and Cosmos Physics Department, University of Granada, 18071 Granada, Spain
| | - Francisco Carrillo
- Computer Architecture and Technology Department, University of Granada, 18071 Granada, Spain
| | - Alberto Guillén
- Computer Architecture and Technology Department, University of Granada, 18071 Granada, Spain
| |
|
131
|
Nie F, Wu D, Wang R, Li X. Self-Weighted Clustering With Adaptive Neighbors. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3428-3441. [PMID: 32011264 DOI: 10.1109/tnnls.2019.2944565] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Many modern clustering models can be divided into two separate steps, i.e., constructing a similarity graph (SG) over the samples and assigning each sample to its corresponding cluster based on the SG. Learning a reasonable SG has therefore become a hot topic in the clustering field, and many previous works have focused on constructing a better SG. However, most of them rely on the idealized assumption that all features are equally important, which does not hold in practical applications. To alleviate this problem, this article proposes a self-weighted clustering with adaptive neighbors (SWCAN) model that can assign weights to different features, learn an SG, and partition samples into clusters simultaneously. In experiments, we observe that SWCAN assigns reasonable weights to different features and outperforms the compared clustering models on synthetic and practical data sets.
|
132
|
DeYoreo M, Reiter JP. Bayesian Mixture Modeling for Multivariate Conditional Distributions. JOURNAL OF STATISTICAL THEORY AND PRACTICE 2020. [DOI: 10.1007/s42519-020-00109-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
133
|
Spatial Spectral Band Selection for Enhanced Hyperspectral Remote Sensing Classification Applications. J Imaging 2020; 6:jimaging6090087. [PMID: 34460744 PMCID: PMC8321067 DOI: 10.3390/jimaging6090087] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2020] [Revised: 08/18/2020] [Accepted: 08/28/2020] [Indexed: 11/16/2022] Open
Abstract
Despite the numerous band selection (BS) algorithms reported in the field, most if not all have exhibited maximal accuracy when more spectral bands are utilized for classification. This apparently disagrees with the theoretical model of the ‘curse of dimensionality’ phenomenon, without apparent explanation. If it were true, then BS would be deemed an academic piece of research without real benefits to practical applications. This paper presents a spatial spectral mutual information (SSMI) BS scheme that utilizes a spatial feature extraction technique as a preprocessing step, followed by the clustering of the mutual information (MI) of spectral bands to enhance the efficiency of the BS. Through the SSMI BS scheme, a sharp ‘bell’-shaped accuracy-dimensionality characteristic that peaks at about 20 bands has been observed for the very first time. The performance of the proposed SSMI BS scheme has been validated on 6 hyperspectral imaging (HSI) datasets (Indian Pines, Botswana, Barrax, Pavia University, Salinas, and Kennedy Space Center (KSC)), and its classification accuracy is shown to be approximately 10% better than that of seven state-of-the-art BS schemes (Saliency, HyperBS, SLN, OCF, FDPC, ISSC, and Convolution Neural Network (CNN)). The present result confirms that a highly efficient BS scheme is essential for observing and validating the Hughes phenomenon in the analysis of HSI data. Experiments also show that the classification accuracy can be affected by as much as approximately 10% when a single ‘crucial’ band is included or left out of the classification.
|
134
|
Wang Y, Wang D, Pang W, Miao C, Tan AH, Zhou Y. A systematic density-based clustering method using anchor points. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
135
|
|
136
|
Kamimura R. Minimum interpretation by autoencoder-based serial and enhanced mutual information production. APPL INTELL 2020. [DOI: 10.1007/s10489-019-01619-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
137
|
Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics 2020; 112:4370-4384. [PMID: 32717320 DOI: 10.1016/j.ygeno.2020.07.027] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Revised: 06/22/2020] [Accepted: 07/14/2020] [Indexed: 01/19/2023]
Abstract
In the past decades, the rapid growth of computer and database technologies has led to the rapid growth of large-scale medical datasets. On the other hand, medical applications with high-dimensional datasets that require high speed and accuracy are rapidly increasing. One dimensionality reduction approach is feature selection, which can increase the accuracy of disease diagnosis and reduce its computational complexity. In this paper, a novel PSO-based multi-objective feature selection method is proposed. The proposed method consists of three main phases. In the first phase, the original features are represented as a graph. In the next phase, feature centralities for all nodes in the graph are calculated, and finally, in the third phase, an improved PSO-based search process is used for the final feature selection. The results on five medical datasets indicate that the proposed method improves on previous related methods in terms of efficiency and effectiveness.
|
138
|
Abstract
In image-based medical decision-making, different modalities of medical images of a given organ of a patient are captured. Each of these images represents a modality that renders the examined organ differently, leading to different observations of a given phenomenon (such as stroke). The accurate analysis of each of these modalities promotes the detection of more appropriate medical decisions. Multimodal medical imaging is a research field concerned with the development of robust algorithms that enable the fusion of image information acquired by different sets of modalities. In this paper, a novel multimodal medical image fusion algorithm is proposed for a wide range of medical diagnostic problems. It is based on the application of a boundary measured pulse-coupled neural network fusion strategy and an energy attribute fusion strategy in a non-subsampled shearlet transform domain. Our algorithm was validated on a dataset covering several diseases, namely glioma, Alzheimer’s, and metastatic bronchogenic carcinoma, which contains more than 100 image pairs. Qualitative and quantitative evaluation verifies that the proposed algorithm outperforms most of the current algorithms, providing important ideas for medical diagnosis.
|
139
|
Feature selection via normative fuzzy information weight with application into tumor classification. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106299] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
140
|
Uzma, Al-Obeidat F, Tubaishat A, Shah B, Halim Z. Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05101-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
|
141
|
Abstract
Soft Sensors (SSs) are inferential models used in many industrial fields. They allow for real-time estimation of hard-to-measure variables as a function of available data obtained from online sensors. SSs are generally built from industries' historical databases through data-driven approaches. A critical issue in SS design concerns the selection of input variables among those available in a candidate dataset. In the case of industrial processes, the number of candidate inputs can be very large, making the design computationally demanding and leading to poorly performing models. An input selection procedure is then necessary. The most widely used input selection approaches for SS design are addressed in this work and classified with their benefits and drawbacks to guide the designer through this step.
|
142
|
Wang R, Zhu Y, Chang CC, Peng Q. Privacy-preserving high-dimensional data publishing for classification. Comput Secur 2020. [DOI: 10.1016/j.cose.2020.101785] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
143
|
Yamada T, Prügel‐Bennett A, Thornton B. Learning features from georeferenced seafloor imagery with location guided autoencoders. J FIELD ROBOT 2020. [DOI: 10.1002/rob.21961] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Takaki Yamada
- Centre for In Situ and Remote Intelligent Sensing, Faculty of Engineering and Physical Science, School of Engineering University of Southampton Southampton UK
| | - Adam Prügel‐Bennett
- Faculty of Engineering and Physical Science, School of Electronics and Computer Science University of Southampton Southampton UK
| | - Blair Thornton
- Centre for In Situ and Remote Intelligent Sensing, Faculty of Engineering and Physical Science, School of Engineering University of Southampton Southampton UK
- Institute of Industrial Science The University of Tokyo Tokyo Japan
| |
|
144
|
A Novel Accurate and Fast Converging Deep Learning-Based Model for Electrical Energy Consumption Forecasting in a Smart Grid. ENERGIES 2020. [DOI: 10.3390/en13092244] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Energy consumption forecasting is of prime importance for the restructured environment of energy management in the electricity market. Accurate energy consumption forecasting is essential for efficient energy management in the smart grid (SG); however, the energy consumption pattern is non-linear with a high level of uncertainty and volatility. Forecasting such complex patterns requires accurate and fast forecasting models. In this paper, a novel hybrid electrical energy consumption forecasting model is proposed based on a deep learning model known as the factored conditional restricted Boltzmann machine (FCRBM). The deep learning-based FCRBM model uses a rectified linear unit (ReLU) activation function and a multivariate autoregressive technique for network training. The proposed model predicts future electrical energy consumption for efficient energy management in the SG. It is a hybrid model comprising four modules: (i) data processing and features selection module, (ii) deep learning-based FCRBM forecasting module, (iii) genetic wind driven optimization (GWDO) algorithm-based optimization module, and (iv) utilization module. The proposed hybrid model, called FS-FCRBM-GWDO, is tested and evaluated on real power grid data from the USA in terms of four performance metrics: mean absolute percentage deviation (MAPD), variance, correlation coefficient, and convergence rate. Simulation results show that the proposed hybrid FS-FCRBM-GWDO model outperforms existing models, such as the accurate fast converging short-term load forecasting (AFC-STLF) model, the mutual information-modified enhanced differential evolution algorithm-artificial neural network (MI-mEDE-ANN)-based model, the features selection-ANN (FS-ANN)-based model, and the Bi-level model, in terms of forecast accuracy and convergence rate.
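For readers unfamiliar with the MAPD metric named in this abstract, the following is a minimal sketch of its usual textbook definition: the mean of |actual - forecast| / |actual|, expressed as a percentage. The function name `mapd` and the sample numbers are illustrative assumptions and do not come from the paper.

```python
def mapd(actual, forecast):
    """Mean absolute percentage deviation between two equally long sequences.

    Assumption: every actual value is non-zero, since each deviation is
    divided by the corresponding actual value.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical hourly loads (arbitrary units) and their forecasts.
print(mapd([100.0, 200.0], [110.0, 180.0]))
```

Both forecasts in the sample are off by 10% of the actual load, so the metric evaluates to roughly 10.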
|
145
|
Bai X, Zhu L, Liang C, Li J, Nie X, Chang X. Multi-view feature selection via Nonnegative Structured Graph Learning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.044] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
146
|
Dong B, Jian S, Zuo K. CDE++: Learning Categorical Data Embedding by Enhancing Heterogeneous Feature Value Coupling Relationships. ENTROPY (BASEL, SWITZERLAND) 2020; 22:e22040391. [PMID: 33286165 PMCID: PMC7516865 DOI: 10.3390/e22040391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/04/2020] [Revised: 03/21/2020] [Accepted: 03/27/2020] [Indexed: 06/12/2023]
Abstract
Categorical data are ubiquitous in machine learning tasks, and the representation of categorical data plays an important role in learning performance. The heterogeneous coupling relationships between features and feature values reflect the characteristics of real-world categorical data, which need to be captured in the representations. The paper proposes an enhanced categorical data embedding method, i.e., CDE++, which captures the heterogeneous feature value coupling relationships in the representations. Based on information theory and the hierarchical couplings defined in our previous work CDE (Categorical Data Embedding by learning hierarchical value coupling), CDE++ adopts mutual information and margin entropy to capture feature couplings and designs a hybrid clustering strategy to capture multiple types of feature value clusters. Moreover, an autoencoder is used to learn non-linear couplings between features and value clusters. The categorical data embeddings generated by CDE++ are low-dimensional numerical vectors that can be applied directly to clustering and classification, and they achieve the best performance compared with other categorical representation learning methods. Parameter sensitivity and scalability tests are also conducted to demonstrate the superiority of CDE++.
|
147
|
Rajab M, Wang D. Practical Challenges and Recommendations of Filter Methods for Feature Selection. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1142/s0219649220400195] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Feature selection, the process of identifying relevant features to be incorporated into a proposed model, is one of the significant steps of the learning process. It removes noise from the data to increase learning performance while reducing computational complexity. The literature review indicated that most previous studies had focused on improving overall classifier performance or reducing the costs associated with training time when building the classifiers. However, in this era of big data, there is an urgent need to deal with more complex issues that make feature selection, especially with filter-based methods, more challenging in terms of dimensionality, data structures, data format, availability of domain experts, data sparsity, and result discrepancies, among others. Filter methods identify the informative features of a given dataset to establish various predictive models using mathematical models. This paper takes a new route in an attempt to pinpoint recent practical challenges associated with filter methods and discusses potential areas of development to yield better performance. Several practical recommendations, based on recent studies, are made to overcome the identified challenges and make the feature selection process simpler and more efficient.
Affiliation(s)
- Mohammed Rajab
- Department of Computer Science, The University of Sheffield, Sheffield, UK
| | - Dennis Wang
- Department of Computer Science, The University of Sheffield, Sheffield, UK
- Sheffield Institute for Translational Neuroscience, Sheffield, UK
- NIHR Sheffield Biomedical Research Centre, Sheffield, UK
| |
|
148
|
Using weighted k-means to identify Chinese leading venture capital firms incorporating with centrality measures. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2019.102083] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
149
|
Single-Cell Clustering Based on Shared Nearest Neighbor and Graph Partitioning. Interdiscip Sci 2020; 12:117-130. [PMID: 32086753 DOI: 10.1007/s12539-019-00357-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2019] [Revised: 12/23/2019] [Accepted: 12/26/2019] [Indexed: 12/22/2022]
Abstract
Clustering of single-cell RNA sequencing (scRNA-seq) data enables the discovery of cell subtypes, which is helpful for understanding and analyzing disease processes. Determining the weight of edges is an essential component of graph-based clustering methods. While several graph-based clustering algorithms for scRNA-seq data have been proposed, they are generally based on k-nearest neighbor (KNN) and shared nearest neighbor (SNN) graphs without considering the structure information of the graph. Here, to improve clustering accuracy, we present a novel method for single-cell clustering, called structural shared nearest neighbor-Louvain (SSNN-Louvain), which integrates the structure information of the graph with module detection. In SSNN-Louvain, based on the distance between a node and its shared nearest neighbors, the weight of an edge is defined by introducing the ratio of the number of shared nearest neighbors to the number of nearest neighbors, thus integrating the structure information of the graph. Then, a modified Louvain community detection algorithm is proposed and applied to identify modules in the graph. Essentially, each community represents a subtype of cells. It is worth mentioning that the proposed method integrates the advantages of both the SNN graph and community detection without the need to tune any parameter other than the number of neighbors. To test the performance of SSNN-Louvain, we compare it on 16 real datasets with five existing methods: nonnegative matrix factorization, single-cell interpretation via multi-kernel learning, SNN-Cliq, Seurat and PhenoGraph. The experimental results show that our approach achieves the best average performance on these datasets.
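The ratio-based edge weighting described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the SSNN-Louvain definition: it keeps only the ratio of shared nearest neighbors to the neighborhood size k, omits the distance term and the Louvain step, and the names `knn_sets` and `snn_ratio_weights` are hypothetical.

```python
from math import dist


def knn_sets(points, k):
    """For every point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_ratio_weights(points, k):
    """Weight each pair (i, j), i < j, by |kNN(i) & kNN(j)| / k.

    Pairs that share no nearest neighbor get no edge at all.
    """
    nbrs = knn_sets(points, k)
    weights = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared:
                weights[(i, j)] = shared / k
    return weights


# Two well-separated toy groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_weights = snn_ratio_weights(toy_points, k=2)
print(toy_weights)
```

With k = 2, edges appear only inside each toy group, which is the kind of structure a community detection step could then pick up.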
|
150
|
Cheng L, Wang Y, Ma X. An end-to-end distance measuring for mixed data based on deep relevance learning. INTELL DATA ANAL 2020. [DOI: 10.3233/ida-184399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Li Cheng
- Science and Technology on Parallel and Distributed Laboratory College of Computer, National University of Defense Technology, Changsha, Hunan, China
| | - Yijie Wang
- Science and Technology on Parallel and Distributed Laboratory College of Computer, National University of Defense Technology, Changsha, Hunan, China
| | - Xingkong Ma
- College of Computer, National University of Defense Technology, Changsha, Hunan, China
| |
|