151
|
Nayak SK, Rout PK, Jagadev AK, Swarnkar T. Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES 2020. [DOI: 10.1016/j.jksuci.2017.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/19/2022]
|
152
|
Siddiqui N, Chan RHM. Multimodal hand gesture recognition using single IMU and acoustic measurements at wrist. PLoS One 2020; 15:e0227039. [PMID: 31929544 PMCID: PMC6957149 DOI: 10.1371/journal.pone.0227039] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/16/2019] [Accepted: 12/10/2019] [Indexed: 11/30/2022] Open
Abstract
To facilitate hand gesture recognition, we investigated the use of acoustic signals with an accelerometer and gyroscope at the human wrist. As a proof-of-concept, the prototype consisted of 10 microphone units in contact with the skin placed around the wrist along with an inertial measurement unit (IMU). The gesture recognition performance was evaluated through the identification of 13 gestures used in daily life. The optimal area for acoustic sensor placement at the wrist was examined using the minimum redundancy and maximum relevance feature selection algorithm. We recruited 10 subjects to perform over 10 trials for each set of hand gestures. The accuracy was 75% for a general model with the top 25 features selected, and the intra-subject average classification accuracy was over 80% with the same features using one microphone unit at the mid-anterior wrist and an IMU. These results indicate that acoustic signatures from the human wrist can aid IMU sensing for hand gesture recognition, and the selection of a few common features for all subjects could help with building a general model. The proposed multimodal framework helps address the single IMU sensing bottleneck for hand gestures during arm movement and/or locomotion.
Affiliation(s)
- Nabeel Siddiqui
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Rosa H. M. Chan
  - Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
  - * E-mail:
|
153
|
Feature selection with Symmetrical Complementary Coefficient for quantifying feature interactions. APPL INTELL 2020. [DOI: 10.1007/s10489-019-01518-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
|
154
|
Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7302551 DOI: 10.1007/978-3-030-50426-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Feature selection is an important preprocessing step in pattern recognition. In this paper, we present a new information-theoretic feature selection approach for two-class classification problems, named minimum Distribution Similarity with Removed Redundancy (mDSRR). Unlike previous methods, which use mutual information and greedy iteration with a loss function to rank the features, we rank features according to the similarity of their distributions in the two classes, measured by relative entropy, and then remove highly redundant features from the sorted feature subsets. Experimental results on datasets from a variety of fields with different classifiers highlight the value of mDSRR for selecting feature subsets, especially small ones. mDSRR is also shown to outperform other state-of-the-art methods in most cases. In addition, we observed that mutual information may not be a good criterion for selecting the initial feature in methods with subsequent iterations.
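The ranking step described in this abstract can be illustrated with a minimal sketch. It assumes discrete feature values, class labels 0 and 1, and add-alpha smoothing; the function names `relative_entropy` and `rank_by_class_divergence` are hypothetical, and the sketch covers neither the redundancy-removal step nor the authors' actual implementation.

```python
import math
from collections import Counter


def relative_entropy(p_samples, q_samples, alpha=1.0):
    """KL(P || Q) in nats between two empirical discrete distributions.

    alpha is an add-alpha smoothing constant, an assumption made here so
    that values unseen in one class do not produce infinities.
    """
    support = sorted(set(p_samples) | set(q_samples))
    p_counts, q_counts = Counter(p_samples), Counter(q_samples)
    p_total = len(p_samples) + alpha * len(support)
    q_total = len(q_samples) + alpha * len(support)
    kl = 0.0
    for value in support:
        p = (p_counts[value] + alpha) / p_total
        q = (q_counts[value] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def rank_by_class_divergence(rows, labels):
    """Return feature indices, most class-separating (largest KL) first."""
    scores = []
    for j in range(len(rows[0])):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        scores.append((relative_entropy(pos, neg), j))
    # Sort by descending divergence; ties are broken by feature index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

A feature whose distribution is identical in both classes scores zero and lands at the end of the ranking, which matches the idea of preferring features with minimum distribution similarity.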
|
155
|
|
156
|
Sun K, Tian P, Qi H, Ma F, Yang G. An Improved Normalized Mutual Information Variable Selection Algorithm for Neural Network-Based Soft Sensors. SENSORS 2019; 19:s19245368. [PMID: 31817459 PMCID: PMC6960561 DOI: 10.3390/s19245368] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/01/2019] [Revised: 11/24/2019] [Accepted: 12/02/2019] [Indexed: 11/28/2022]
Abstract
In this paper, normalized mutual information feature selection (NMIFS) and tabu search (TS) are integrated to develop a new variable selection algorithm for soft sensors. NMIFS is applied to select influential variables contributing to the output variable and avoids selecting redundant variables by calculating mutual information (MI). A TS-based strategy is designed to prevent NMIFS from falling into a locally optimal solution. The proposed algorithm performs variable selection by combining entropy information and MI with the validation error of artificial neural networks (ANNs); therefore, it has advantages over previous MI-based variable selection algorithms. Several simulation datasets with different scales, correlations and noise parameters are used to demonstrate the performance of the proposed algorithm. A set of actual production data from a power plant is also used to check the performance of these algorithms. The experiments showed that the developed variable selection algorithm achieves better model accuracy with fewer selected variables than other state-of-the-art methods. The application of this algorithm to soft sensors can achieve reliable results.
Affiliation(s)
- Kai Sun
  - School of Electrical Engineering and Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
  - Correspondence: (K.S.); (G.Y.); Tel.: +86-15269190537 (K.S.); +86-13651869523 (G.Y.)
- Pengxin Tian
  - School of Electrical Engineering and Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Huanning Qi
  - School of Electrical Engineering and Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Fengying Ma
  - School of Electrical Engineering and Automation, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Genke Yang
  - Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
  - Ningbo Artificial Intelligence Institute, Shanghai Jiao Tong University, Ningbo 315000, China
  - Correspondence: (K.S.); (G.Y.); Tel.: +86-15269190537 (K.S.); +86-13651869523 (G.Y.)
|
157
|
QUADRIVEN: A Framework for Qualitative Taxi Demand Prediction Based on Time-Variant Online Social Network Data Analysis. SENSORS 2019; 19:s19224882. [PMID: 31717423 PMCID: PMC6891530 DOI: 10.3390/s19224882] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/03/2019] [Revised: 10/29/2019] [Accepted: 11/06/2019] [Indexed: 11/16/2022]
Abstract
Road traffic pollution is one of the key factors affecting urban air quality. There is a consensus in the community that the efficient use of public transport is the most effective solution. In that sense, much effort has been made in the data mining discipline to come up with solutions able to anticipate taxi demands in a city. This helps to optimize the trips made by such an important urban means of transport. However, most of the existing solutions in the literature define the taxi demand prediction as a regression problem based on historical taxi records. This causes serious limitations with respect to the required data to operate and the interpretability of the prediction outcome. In this paper, we introduce QUADRIVEN (QUalitative tAxi Demand pRediction based on tIme-Variant onlinE social Network data analysis), a novel approach to deal with the taxi demand prediction problem based on human-generated data widely available on online social networks. The result of the prediction is defined on the basis of categorical labels that allow obtaining a semantically-enriched output. Finally, this proposal was tested with different models in a large urban area, showing quite promising results with an F1 score above 0.8.
|
158
|
Gu X, Guo J, Xiao L, Ming T, Li C. A Feature Selection Algorithm Based on Equal Interval Division and Minimal-Redundancy–Maximal-Relevance. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10144-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
|
159
|
Cheng L, Wang Y, Ma X. A Neural Probabilistic outlier detection method for categorical data. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.07.069] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
|
160
|
Jin D, Li R, Xu J. Multiscale Community Detection in Functional Brain Networks Constructed Using Dynamic Time Warping. IEEE Trans Neural Syst Rehabil Eng 2019; 28:52-61. [PMID: 31634138 DOI: 10.1109/tnsre.2019.2948055] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Previous studies have focused on the detection of community structures of brain networks constructed with resting-state functional magnetic resonance imaging (fMRI) data. Pearson correlation is often used to describe the connections between nodes in the construction of functional brain networks, which typically ignores the inherent timing and validity of fMRI time series. To solve this problem, this study applied the Dynamic Time Warping (DTW) algorithm to determine the correlation between two brain regions by comparing the synchronization and asynchrony of the time series. In addition, to determine the best community structure for each subject, we further divided the brain network into different scales, and then detected the different communities in these brain networks by using Modularity, Variation of Information (VI) and Normalized Mutual Information (NMI) as structural monitoring variables. Finally, we confirmed each subject's best community structure based on these variables. The experiments showed that, with the method proposed in this paper, we not only accurately discovered important components of seven basic functional subnetworks, but also found that the putamen and Heschl's gyrus have a relationship with the inferior parietal network. Most importantly, this method can also determine each subject's functional brain network density, thus confirming the findings of studies testing real brain networks.
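For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming DTW distance between two one-dimensional time series. The absolute-difference local cost and the function name `dtw_distance` are assumptions made for illustration; the abstract does not state how the authors convert a DTW result into a connection strength between brain regions.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1] (stretch b)
                cost[i][j - 1],      # repeat a[i-1] (stretch a)
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]
```

Unlike a pointwise comparison, the warping path lets one series lag or stretch relative to the other, so `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance zero.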
|
161
|
Chen J, Wu Z, Zhang J, Li F. Mutual information-based dropout: Learning deep relevant feature representation architectures. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.04.090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
|
162
|
Tian W, Ren Y, Dong Y, Wang S, Bu L. Fault monitoring based on mutual information feature engineering modeling in chemical process. Chin J Chem Eng 2019. [DOI: 10.1016/j.cjche.2018.11.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
|
163
|
Analysis of Ship Detection Performance with Full-, Compact- and Dual-Polarimetric SAR. REMOTE SENSING 2019. [DOI: 10.3390/rs11182160] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Polarimetric synthetic aperture radar (SAR) is currently drawing more attention due to its advantage in Earth observations, especially in ship detection. In order to establish a reliable feature selection method for marine vessel monitoring purposes, forty features are extracted via polarimetric decomposition in the full-polarimetric (FP), compact-polarimetric (CP), and dual-polarimetric (DP) modes. These features were comprehensively quantified and evaluated using the Euclidean distance and mutual information, and the result indicated that the features in CP SAR are better than those of FP or DP SAR in general. The CP SAR features are thus further studied, and a new feature, named phase factor, in CP SAR mode is presented that can distinguish ships and the sea surface by the constant 0 without complex calculation. Furthermore, the phase factor is independent of the sea surface roughness, and hence it performs stably for ship detection even in high sea states. Experiments demonstrated that the ship detection performance of the phase factor detector is better than that of roundness, delta, HESA and CFAR detectors in low, medium and high sea states.
|
164
|
Hosseini ES, Moattar MH. Evolutionary feature subsets selection based on interaction information for high dimensional imbalanced data classification. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105581] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 10/26/2022]
|
165
|
Tsakiridis NL, Theocharis JB, Panagos P, Zalidis GC. An evolutionary fuzzy rule-based system applied to the prediction of soil organic carbon from soil spectral libraries. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105504] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 10/26/2022]
|
166
|
Improved Subspace Detection Based on Minimum Noise Fraction and Mutual Information for Hyperspectral Image Classification. ACTA ACUST UNITED AC 2019. [DOI: 10.1007/978-981-13-7564-4_53] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 04/12/2023]
|
167
|
Feature selection for intrusion detection using new multi-objective estimation of distribution algorithms. APPL INTELL 2019. [DOI: 10.1007/s10489-019-01503-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
|
168
|
Classification of high dimensional biomedical data based on feature selection using redundant removal. PLoS One 2019; 14:e0214406. [PMID: 30964868 PMCID: PMC6456288 DOI: 10.1371/journal.pone.0214406] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/10/2018] [Accepted: 03/12/2019] [Indexed: 11/26/2022] Open
Abstract
High dimensional biomedical data contain tens of thousands of features. Accurate and effective identification of the core features in these data can assist in diagnosing related diseases. However, biomedical data often contain a large number of irrelevant or redundant features, which seriously affect subsequent classification accuracy and machine learning efficiency. To solve this problem, this paper proposes a novel filter feature selection algorithm based on redundant removal (FSBRR) to classify high dimensional biomedical data. First, two redundancy criteria are determined by vertical relevance (the relationship between a feature and the class attribute) and horizontal relevance (the relationship between one feature and another). Second, to quantify the redundancy criteria, an approximate redundancy feature framework based on mutual information (MI) is defined to remove redundant and irrelevant features. To evaluate the effectiveness of the proposed algorithm, controlled trials against typical feature selection algorithms are conducted using three different classifiers, and the experimental results indicate that the FSBRR algorithm can effectively reduce the feature dimension and improve the classification accuracy. In addition, an experiment on a small-sample dataset is designed and conducted in the discussion and analysis section to clarify the specific implementation process of the FSBRR algorithm.
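Both relevance notions in this abstract rest on mutual information. The sketch below shows a plain empirical MI estimate for discrete variables, which could score either a feature against the class (vertical relevance) or a feature against another feature (horizontal relevance). The function name `mutual_information` is hypothetical, and the sketch is not the FSBRR algorithm or its approximate redundancy framework.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in bits for discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with counts.
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


# Illustrative toy data (made up for this sketch, not from the paper).
feature = [0, 0, 1, 1]
label = [0, 0, 1, 1]
other_feature = [0, 1, 0, 1]
vertical = mutual_information(feature, label)            # feature vs. class
horizontal = mutual_information(feature, other_feature)  # feature vs. feature
```

A high vertical score with a low horizontal score against already selected features is the usual signal, in MI-based filters, that a feature is relevant and not redundant.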
|
169
|
|
170
|
Abstract
In the digital era, the data generated by various applications are growing drastically both row-wise and column-wise; this creates a bottleneck for analytics and increases the burden on machine learning algorithms used for pattern recognition. This curse of dimensionality can be handled through reduction techniques. Dimensionality Reduction (DR) can be performed in two ways, namely Feature Selection (FS) and Feature Extraction (FE). This paper presents a survey of feature selection methods. From this extensive survey we conclude that most FS methods use static data. However, since the emergence of IoT and web-based applications, data are generated dynamically and grow at a fast rate, so they are likely to be noisy, which also hinders the performance of the algorithms. As the size of the data set increases, the scalability of FS methods is jeopardized, and the existing DR algorithms do not address the issues raised by dynamic data. Using FS methods not only reduces the burden of the data but also helps avoid overfitting of the model.
|
171
|
Gu X, Guo J, Wei H, He Y. Spatial-domain steganalytic feature selection based on three-way interaction information and KS test. Soft comput 2019. [DOI: 10.1007/s00500-019-03910-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
|
172
|
|
173
|
|
174
|
Xie F, Li F, Lei C, Yang J, Zhang Y. Unsupervised band selection based on artificial bee colony algorithm for hyperspectral image classification. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2018.11.014] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 10/27/2022]
|
175
|
Computational Intelligence on Short-Term Load Forecasting: A Methodological Overview. ENERGIES 2019. [DOI: 10.3390/en12030393] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/17/2022]
Abstract
Electricity demand forecasting has been a real challenge for power system scheduling at different levels of the energy sector. Various computational intelligence techniques and methodologies have been employed in the electricity market for short-term load forecasting, although scant evidence is available about the feasibility of these methods considering the type of data and other potential factors. This work introduces several scientific and technical rationales behind short-term load forecasting methodologies, based on the work of previous researchers in the energy field. The fundamental benefits and drawbacks of these methods are discussed to show the efficiency of each approach in various circumstances. Finally, a hybrid strategy is proposed.
|
176
|
Beshir WF, Tohge T, Watanabe M, Hertog MLATM, Hoefgen R, Fernie AR, Nicolaï BM. Non-aqueous fractionation revealed changing subcellular metabolite distribution during apple fruit development. HORTICULTURE RESEARCH 2019; 6:98. [PMID: 31666959 PMCID: PMC6804870 DOI: 10.1038/s41438-019-0178-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/02/2019] [Revised: 06/26/2019] [Accepted: 07/01/2019] [Indexed: 05/07/2023]
Abstract
In developing apple fruit, metabolic compartmentation is poorly understood due to the lack of experimental data. Distinguishing subcellular compartments in fruit using non-aqueous fractionation has been technically difficult due to the excess amount of sugars present in the different subcellular compartments limiting the resolution of the technique. The work described in this study represents the first attempt to apply non-aqueous fractionation to developing apple fruit, covering the major events occurring during fruit development (cell division, cell expansion, and maturation). Here we describe the non-aqueous fractionation method to study the subcellular compartmentation of metabolites during apple fruit development considering three main cellular compartments (cytosol, plastids, and vacuole). Evidence is presented that most of the sugars and organic acids were predominantly located in the vacuole, whereas some of the amino acids were distributed between the cytosol and the vacuole. The results showed a shift in the plastid marker from the lightest fractions in the early growth stage to the dense fractions in the later fruit growth stages. This implies that the accumulation of starch content with progressing fruit development substantially influenced the distribution of plastidial fragments within the non-aqueous density gradient applied. Results from this study provide substantial baseline information on assessing the subcellular compartmentation of metabolites in apple fruit in general and during fruit growth in particular.
Affiliation(s)
- Wasiye F. Beshir
  - Division of Mechatronics, Biostatistics and Sensors (MeBioS), Department of Biosystems (BIOSYST), KU Leuven, Leuven, Belgium
- Takayuki Tohge
  - Max Planck Institute of Molecular Plant Physiology (MPI-MP), Potsdam-Golm, Germany
- Mutsumi Watanabe
  - Max Planck Institute of Molecular Plant Physiology (MPI-MP), Potsdam-Golm, Germany
- Maarten L. A. T. M. Hertog
  - Division of Mechatronics, Biostatistics and Sensors (MeBioS), Department of Biosystems (BIOSYST), KU Leuven, Leuven, Belgium
- Rainer Hoefgen
  - Max Planck Institute of Molecular Plant Physiology (MPI-MP), Potsdam-Golm, Germany
- Alisdair R. Fernie
  - Max Planck Institute of Molecular Plant Physiology (MPI-MP), Potsdam-Golm, Germany
- Bart M. Nicolaï
  - Division of Mechatronics, Biostatistics and Sensors (MeBioS), Department of Biosystems (BIOSYST), KU Leuven, Leuven, Belgium
  - Flanders Centre of Postharvest Technology (VCBT), Leuven, Belgium
|
177
|
|
178
|
Facial Expression Recognition Based on Discrete Separable Shearlet Transform and Feature Selection. ALGORITHMS 2018. [DOI: 10.3390/a12010011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
In this paper, a novel approach to facial expression recognition based on the discrete separable shearlet transform (DSST) and normalized mutual information feature selection is proposed. The approach can be divided into five steps. First, all test and training images are preprocessed. Second, DSST is applied to the preprocessed facial expression images, and all the transformation coefficients are obtained as the original feature set. Third, an improved normalized mutual information feature selection is proposed to find the optimal feature subset of the original feature set, so that the key classification information of the original data is retained. Fourth, the feature space obtained after extraction and selection is reduced by employing linear discriminant analysis. Finally, a support vector machine is used to recognize the expressions. In this study, experimental verification was carried out on four open facial expression databases. The results show that this method can not only improve the recognition rate of facial expressions, but also significantly reduce the computational complexity and improve the system efficiency.
|
179
|
Supposed Maximum Mutual Information for Improving Generalization and Interpretation of Multi-Layered Neural Networks. JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH 2018. [DOI: 10.2478/jaiscr-2018-0029] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022] Open
Abstract
The present paper aims to propose a new type of information-theoretic method to maximize mutual information between inputs and outputs. The importance of mutual information in neural networks is well known, but the actual implementation of mutual information maximization has been quite difficult to undertake. In addition, mutual information has not been used extensively in neural networks, meaning that its applicability is very limited. To overcome the shortcomings of mutual information maximization, we present it here in a very simplified manner by supposing that mutual information is already maximized before learning, or at least at the beginning of learning. The method was applied to three data sets (crab data set, wholesale data set, and human resources data set) and examined in terms of generalization performance and connection weights. The results showed that by disentangling connection weights, maximizing mutual information made it possible to explicitly interpret the relations between inputs and outputs.
|
180
|
Jia X, Han Q, Lu Z. Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps. BMC Bioinformatics 2018; 19:512. [PMID: 30558536 PMCID: PMC6296107 DOI: 10.1186/s12859-018-2495-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/29/2018] [Accepted: 11/16/2018] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Clustering and visualizing samples and genes are important methods for analyzing gene expression data sets under different samples. However, it is difficult to integrate clustering and visualization techniques when the similarities of samples and genes are defined by the Pearson correlation coefficient (PCC) measure.
RESULTS: Here, for rare samples of gene expression data sets, we use the MG-PCC algorithm (mini-groups defined by PCC) to divide them into mini-groups, and use t-SNE-SSP maps to display these mini-groups. The idea of the MG-PCC algorithm is that nearest neighbors should be in the same mini-group. A t-SNE-SSP map is selected from a series of t-SNE (t-distributed Stochastic Neighbor Embedding) maps of standardized samples that have different perplexity parameters. Moreover, PCC clusters of mass genes are displayed by a t-SNE-SGI map, which is selected from a series of t-SNE maps of standardized genes that have different initialization dimensions. The t-SNE-SSP and t-SNE-SGI maps are selected by A-value, which is modeled from the areas of clustering projections; the selected map is the t-SNE map with the smallest A-value.
CONCLUSIONS: From the analysis of cancer gene expression data sets, we demonstrate that the MG-PCC algorithm is able to put tumor and normal samples into their respective mini-groups, and that t-SNE-SSP (or t-SNE-SGI) maps are able to display the relationships between mini-groups (or PCC clusters) clearly. Furthermore, t-SNE-SS(m) (or t-SNE-SG(n)) maps are able to construct independent tree diagrams of the nearest sample (or gene) neighbors, where each tree diagram corresponds to a mini-group of samples (or genes).
Affiliation(s)
- Xingang Jia
  - School of Mathematics, Southeast University, Nanjing, 210096, People's Republic of China
- Qiuhong Han
  - Department of Mathematics, Nanjing Forestry University, Nanjing, 210037, People's Republic of China
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, People's Republic of China
|
181
|
Nie F, Yang S, Zhang R, Li X. A General Framework for Auto-Weighted Feature Selection via Global Redundancy Minimization. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2018; 28:2428-2438. [PMID: 30571626 DOI: 10.1109/tip.2018.2886761] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 06/09/2023]
Abstract
Most existing feature selection methods rank all the features by a certain criterion and select the top-ranking features for the subsequent classification or clustering tasks. Because feature redundancy is neglected, the selected features are frequently correlated with each other, which can compromise performance. To address this issue, we propose a novel auto-weighted feature selection framework via global redundancy minimization (AGRM) in this paper. Unlike other feature selection methods, the proposed method can select representative and non-redundant features, since the redundancy among the features can be largely reduced from a global perspective. In addition, AGRM is extended to a compact (C-AGRM) framework, which is more concise and efficient. Moreover, both of the proposed frameworks are auto-weighted, i.e., parameter-free, so that they are practical in real applications. In general, the proposed frameworks serve as a post-processing system, which can be applied to existing supervised and unsupervised feature selection methods to refine the original feature scores in favor of non-redundant features. Finally, extensive experiments on nine benchmark datasets are conducted to demonstrate the effectiveness and superiority of the proposed frameworks.
|
182
|
l2,1-norm minimization based negative label relaxation linear regression for feature selection. Pattern Recognit Lett 2018. [DOI: 10.1016/j.patrec.2018.10.016] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/19/2022]
|
183
|
Mariello A, Battiti R. Feature Selection Based on the Neighborhood Entropy. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:6313-6322. [PMID: 29994549 DOI: 10.1109/tnnls.2018.2830700] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
In feature selection, a measure that captures nonlinear relationships between features and class is the mutual information (MI), which is based on how information in the features reduces the uncertainty in the output. In this paper, we propose a new measure that is related to MI, called neighborhood entropy, and a novel filter method based on its minimization in a greedy procedure. Our algorithm integrates sequential forward selection with approximated nearest-neighbors techniques and locality-sensitive hashing. Experiments show that the classification accuracy is usually higher than that of other state-of-the-art algorithms, with the best results obtained with problems that are highly unbalanced and nonlinearly separable. The order by which the features are selected is also better, leading to a higher accuracy for fewer features. The experimental results indicate that our technique can be employed effectively in offline scenarios when one can dedicate more CPU time to achieve superior results and more robustness to noise and to class imbalance.
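For reference, the mutual information that the abstract above builds on has a standard textbook definition. The paper's neighborhood entropy is a related but different quantity and is not reproduced here. For a discrete feature X and class label Y (symbols chosen here for illustration only), the reduction in uncertainty about Y after observing X can be written as:

```latex
I(X;Y) \;=\; H(Y) - H(Y \mid X)
       \;=\; \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)},
\qquad
H(Y) \;=\; -\sum_{y} p(y)\,\log p(y).
```

I(X;Y) is zero when X and Y are independent and equals H(Y) when X determines Y completely.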
|
184
|
Gao W, Hu L, Zhang P, Wang F. Feature selection by integrating two groups of feature evaluation criteria. EXPERT SYSTEMS WITH APPLICATIONS 2018; 110:11-19. [DOI: 10.1016/j.eswa.2018.05.029] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 01/04/2025]
|
185
|
Blanco JL, Porto-Pazos AB, Pazos A, Fernandez-Lozano C. Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection. Sci Rep 2018; 8:15688. [PMID: 30356060 PMCID: PMC6200741 DOI: 10.1038/s41598-018-33911-z] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 03/29/2018] [Accepted: 10/06/2018] [Indexed: 12/22/2022] Open
Abstract
Screening and in silico modeling are critical activities for the reduction of experimental costs. They also speed up research notably and strengthen the theoretical framework, thus allowing researchers to numerically quantify the importance of a particular subset of information. For example, in fields such as cancer and other highly prevalent diseases, having a reliable prediction method is crucial. The objective of this paper is to classify peptide sequences according to their anti-angiogenic activity to understand the underlying principles via machine learning. First, the peptide sequences were converted into three types of numerical molecular descriptors based on the amino acid composition. We performed different experiments with the descriptors and merged them to obtain baseline results for the performance of the models, particularly of each molecular descriptor subset. A feature selection process was applied to reduce the dimensionality of the problem and remove noisy features – which are highly present in biological problems. After a robust machine learning experimental design under equal conditions (nested resampling, cross-validation, hyperparameter tuning and different runs), we statistically and significantly outperformed the best previously published anti-angiogenic model with a generalized linear model via coordinate descent (glmnet), achieving a mean AUC value greater than 0.96 and an accuracy of 0.86 with 200 molecular descriptors, mixed from the three groups. A final analysis with the top-40 discriminative anti-angiogenic activity peptides is presented along with a discussion of the feature selection process and the individual importance of each molecular descriptor. According to our findings, anti-angiogenic activity peptides are strongly associated with amino acid sequences SP, LSL, PF, DIT, PC, GH, RQ, QD, TC, SC, AS, CLD, ST, MF, GRE, IQ, CQ and HG.
Affiliation(s)
- Jose Liñares Blanco
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain
| | - Ana B Porto-Pazos
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
| | - Alejandro Pazos
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
| | - Carlos Fernandez-Lozano
- Department of Computer Science, Faculty of Computer Science, University of A Coruña, A Coruña, 15071, Spain; Instituto de Investigación Biomédica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña, A Coruña, Spain
| |
|
186
|
|
187
|
Liang Y, Wang X, Zhang SH, Hu SM, Liu S. PhotoRecomposer: Interactive Photo Recomposition by Cropping. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:2728-2742. [PMID: 29990001 DOI: 10.1109/tvcg.2017.2764895] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
We present a visual analysis method for interactively recomposing a large number of photos based on example photos with high-quality composition. The recomposition method is formulated as a matching problem between photos. The key to this formulation is a new metric for accurately measuring the composition distance between photos. We have also developed an earth-mover-distance-based online metric learning algorithm to support the interactive adjustment of the composition distance based on user preferences. To better convey the compositions of a large number of example photos, we have developed a multi-level, example photo layout method to balance multiple factors such as compactness, aspect ratio, composition distance, stability, and overlaps. By introducing an EulerSmooth-based straightening method, the composition of each photo is clearly displayed. The effectiveness and usefulness of the method have been demonstrated by the experimental results, user study, and case studies.
|
188
|
Low Redundancy Feature Selection of Short Term Solar Irradiance Prediction Using Conditional Mutual Information and Gauss Process Regression. SUSTAINABILITY 2018. [DOI: 10.3390/su10082889] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
Solar irradiation is influenced by many meteorological features, which results in a complex structure, so its prediction has low efficiency and accuracy. Existing prediction methods focus on analyzing the correlation between features and irradiation to reduce model complexity, but they do not analyze redundancy within the feature subset. In order to reduce the information redundancy in the feature set and improve prediction accuracy, a novel feature selection method for short-term irradiation prediction based on Conditional Mutual Information (CMI) and Gaussian Process Regression (GPR) is proposed. Firstly, the CMI values of different features are calculated to evaluate correlation and redundant information between features in the feature subsets. Secondly, GPR with a stable prediction performance and adaptively determined hyperparameters is used as the predictor. The optimal feature subset and the GPR covariance function can be selected using Sequential Forward Selection (SFS). Finally, an optimal predictor is determined by the minimum prediction error, and the prediction of solar irradiation is carried out by the determined predictor. The experimental results show that CMI-GPRAEK has the highest prediction accuracy with a low-dimensional optimal feature set: its MAPE is 4.33% lower than that of the predictor without feature selection, although both have an optimal kernel function. CMI-GPRAEK also yields a less complicated predictor with less redundancy between features, since the optimal feature set has only 14 dimensions.
|
189
|
Multi-Step Ahead Wind Power Generation Prediction Based on Hybrid Machine Learning Techniques. ENERGIES 2018. [DOI: 10.3390/en11081975] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
Abstract
Accurate generation prediction at multiple time-steps is of paramount importance for reliable and economical operation of wind farms. This study proposed a novel algorithmic solution using various forms of machine learning techniques in a hybrid manner, including phase space reconstruction (PSR), input variable selection (IVS), K-means clustering and adaptive neuro-fuzzy inference system (ANFIS). The PSR technique transforms the historical time series into a set of phase-space variables, which are combined with the numerical weather prediction (NWP) data to prepare candidate inputs. A filtering approach based on the minimal redundancy maximal relevance (mRMR) criterion is used to automatically select the optimal input variables for the multi-step ahead prediction. Then, the input instances are divided into a set of subsets using K-means clustering to train the ANFIS. The ANFIS parameters are further optimized using the particle swarm optimization (PSO) algorithm to improve the prediction performance. The proposed solution is extensively evaluated through case studies of two realistic wind farms, and the numerical results clearly confirm its effectiveness and improved prediction accuracy compared to benchmark solutions.
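The mRMR filtering step mentioned in the abstract above can be sketched as a greedy loop. This is a minimal illustration, assuming discrete-valued inputs, a count-based (plug-in) mutual information estimate, and the difference form of the criterion (relevance minus mean redundancy). The function names `mutual_information` and `mrmr_select` and the toy data are hypothetical, and the paper's own estimator and mRMR variant may differ.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, target, k):
    """Greedily pick up to k feature names.

    Each step adds the candidate with the largest value of
    (relevance to the target) - (mean redundancy with features already chosen).
    `features` maps a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed for illustration): the target encodes both `a` and `b`,
# while `a_copy` duplicates `a` and therefore adds no new information.
target = [0, 1, 2, 3, 0, 1, 2, 3]
toy = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
}
selected = mrmr_select(toy, target, k=2)
print(selected)
```

On this toy input all three features are equally relevant to the target, so the redundancy term decides the second pick: `b` is preferred over the duplicate `a_copy`.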
|
190
|
Candra H, Yuwono M, Chai R, Nguyen HT, Su S. EEG emotion recognition using reduced channel wavelet entropy and average wavelet coefficient features with normal Mutual Information method. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2017:463-466. [PMID: 29059910 DOI: 10.1109/embc.2017.8036862] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]
Abstract
Recognizing emotion from EEG signals is a complicated task that requires complex features and a substantial number of EEG channels. Simple algorithms that analyse the features and reduce the number of EEG channels would offer an indispensable advantage. Therefore, this study explores a combination of wavelet entropy and average wavelet coefficient (WEAVE) as a potential EEG-emotion feature to classify valence and arousal emotions, with the advantage of being able to identify the occurrence of a pattern while at the same time identifying the shape of that pattern in the EEG emotion signal. The complexity of the feature was reduced using the Normalized Mutual Information (NMI) method to obtain a reduced number of channels. Classification with the WEAVE feature achieved 76.8% accuracy for valence and 74.3% for arousal emotion. The analysis with NMI shows that the WEAVE feature has linear characteristics and offers possibilities to reduce the EEG channels to a certain number. Further analysis also reveals that detection of valence emotion with reduced EEG channels uses a different combination of EEG channels compared to arousal emotion.
|
191
|
Munoz R, Olivares R, Taramasco C, Villarroel R, Soto R, Barcelos TS, Merino E, Alonso-Sánchez MF. Using Black Hole Algorithm to Improve EEG-Based Emotion Recognition. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2018; 2018:3050214. [PMID: 29991942 PMCID: PMC6016227 DOI: 10.1155/2018/3050214] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/19/2018] [Revised: 04/18/2018] [Accepted: 05/08/2018] [Indexed: 12/22/2022]
Abstract
Emotions are a critical aspect of human behavior. One widely used technique for research in emotion measurement is based on the use of EEG signals. In general terms, the first step of signal processing is the elimination of noise, which can be done manually or automatically. The next step is determining the feature vector using, for example, entropy calculation and its variations to generate a classification model. It is possible to use this approach to classify theoretical models such as the Circumplex model. This model proposes that emotions are distributed in a two-dimensional circular space. However, methods to determine the feature vector are highly susceptible to noise that may exist in the signal. In this article, a new method to adjust the classifier is proposed using metaheuristics based on the black hole algorithm. The method is aimed at obtaining results similar to those obtained with manual noise elimination methods. In order to evaluate the proposed method, the MAHNOB HCI Tagging Database was used. Results show that, using the black hole algorithm to optimize the feature vector of the Support Vector Machine, we obtained an accuracy of 92.56% over 30 executions.
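The black hole metaheuristic named in the abstract above can be sketched generically as follows. This is a minimal illustration of the commonly described scheme (stars drift toward the best solution, and stars that cross an event horizon are re-sampled), applied to a toy continuous cost function. It assumes a non-negative cost to be minimized, and the name `black_hole_minimize`, its parameters, and the sphere test function are hypothetical; the paper's actual objective (tuning an SVM feature vector on EEG data) is not reproduced.

```python
import math
import random


def black_hole_minimize(cost, dim, bounds, n_stars=20, n_iter=200, seed=0):
    """Minimize a non-negative `cost` over a box using a black-hole-style search."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_star():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    stars = [random_star() for _ in range(n_stars)]
    costs = [cost(s) for s in stars]
    bh = min(range(n_stars), key=costs.__getitem__)  # index of the current black hole

    for _ in range(n_iter):
        # Pull every star a random fraction of the way toward the black hole.
        for i in range(n_stars):
            if i == bh:
                continue
            r = rng.random()
            stars[i] = [x + r * (b - x) for x, b in zip(stars[i], stars[bh])]
            costs[i] = cost(stars[i])
            if costs[i] < costs[bh]:
                bh = i  # a star that beats the black hole takes its place

        # Event horizon radius: black hole cost relative to the total cost.
        total = sum(costs)
        radius = costs[bh] / total if total > 0 else 0.0

        # Stars inside the horizon are absorbed and replaced by fresh random ones.
        for i in range(n_stars):
            if i != bh and math.dist(stars[i], stars[bh]) < radius:
                stars[i] = random_star()
                costs[i] = cost(stars[i])
                if costs[i] < costs[bh]:
                    bh = i

    return stars[bh], costs[bh]


def sphere(point):
    """Toy cost with its minimum of 0 at the origin."""
    return sum(x * x for x in point)


best, best_cost = black_hole_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, best_cost)
```

Because the black hole is only ever replaced by a strictly better star, the returned cost can never be worse than the best of the initial random population. Re-sampling absorbed stars is what keeps the search from collapsing onto a single point.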
Affiliation(s)
- Roberto Munoz
- Escuela de Ingeniería Civil Informática, Universidad de Valparaíso, Valparaíso, Chile
- Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
| | - Rodrigo Olivares
- Escuela de Ingeniería Civil Informática, Universidad de Valparaíso, Valparaíso, Chile
- Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
| | - Carla Taramasco
- Escuela de Ingeniería Civil Informática, Universidad de Valparaíso, Valparaíso, Chile
| | | | - Ricardo Soto
- Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
| | - Thiago S. Barcelos
- Instituto Federal de Educação, Ciência e Tecnologia de São Paulo, Brazil
| | - Erick Merino
- Escuela de Ingeniería Civil Informática, Universidad de Valparaíso, Valparaíso, Chile
| | | |
|
192
|
Chakraborty S, Das S. Simultaneous variable weighting and determining the number of clusters—A weighted Gaussian means algorithm. Stat Probab Lett 2018. [DOI: 10.1016/j.spl.2018.01.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
|
193
|
De Meulder B, Lefaudeux D, Bansal AT, Mazein A, Chaiboonchoe A, Ahmed H, Balaur I, Saqi M, Pellet J, Ballereau S, Lemonnier N, Sun K, Pandis I, Yang X, Batuwitage M, Kretsos K, van Eyll J, Bedding A, Davison T, Dodson P, Larminie C, Postle A, Corfield J, Djukanovic R, Chung KF, Adcock IM, Guo YK, Sterk PJ, Manta A, Rowe A, Baribaud F, Auffray C. A computational framework for complex disease stratification from multiple large-scale datasets. BMC SYSTEMS BIOLOGY 2018; 12:60. [PMID: 29843806 PMCID: PMC5975674 DOI: 10.1186/s12918-018-0556-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/20/2017] [Accepted: 02/21/2018] [Indexed: 01/05/2023]
Abstract
BACKGROUND Multilevel data integration is becoming a major area of research in systems biology. Within this area, multi-'omics datasets on complex diseases are becoming more readily available and there is a need to set standards and good practices for integrated analysis of biological, clinical and environmental data. We present a framework to plan and generate single and multi-'omics signatures of disease states. METHODS The framework is divided into four major steps: dataset subsetting, feature filtering, 'omics-based clustering and biomarker identification. RESULTS We illustrate the usefulness of this framework by identifying potential patient clusters based on integrated multi-'omics signatures in a publicly available ovarian cystadenocarcinoma dataset. The analysis generated a higher number of stable and clinically relevant clusters than previously reported, and enabled the generation of predictive models of patient outcomes. CONCLUSIONS This framework will help health researchers plan and perform multi-'omics big data analyses to generate hypotheses and make sense of their rich, diverse and ever growing datasets, to enable implementation of translational P4 medicine.
Affiliation(s)
- Bertrand De Meulder
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France.
| | - Diane Lefaudeux
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Aruna T Bansal
- Acclarogen Ltd, St John's Innovation Centre, Cambridge, CB4 0WS, UK
| | - Alexander Mazein
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Amphun Chaiboonchoe
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Hassan Ahmed
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Irina Balaur
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Mansoor Saqi
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Johann Pellet
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Stéphane Ballereau
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Nathanaël Lemonnier
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France
| | - Kai Sun
- Data Science Institute, Imperial College, London, SW7 2AZ, UK
| | - Ioannis Pandis
- Data Science Institute, Imperial College, London, SW7 2AZ, UK; Janssen Research and Development Ltd, High Wycombe, HP12 4DP, UK
| | - Xian Yang
- Data Science Institute, Imperial College, London, SW7 2AZ, UK
| | | | | | | | | | - Timothy Davison
- Janssen Research and Development Ltd, High Wycombe, HP12 4DP, UK
| | - Paul Dodson
- AstraZeneca Ltd, Alderley Park, Macclesfield, SK10 4TG, UK
| | | | - Anthony Postle
- Faculty of Medicine, University of Southampton, Southampton, SO17 1BJ, UK
| | - Julie Corfield
- AstraZeneca R & D, 43150, Mölndal, Sweden; Arateva R & D Ltd, Nottingham, NG1 1GF, UK
| | - Ratko Djukanovic
- Faculty of Medicine, University of Southampton, Southampton, SO17 1BJ, UK
| | - Kian Fan Chung
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
| | - Ian M Adcock
- National Heart and Lung Institute, Imperial College London, London, SW3 6LY, UK
| | - Yi-Ke Guo
- Data Science Institute, Imperial College, London, SW7 2AZ, UK
| | - Peter J Sterk
- Department of Respiratory Medicine, Academic Medical Centre, University of Amsterdam, Amsterdam, AZ1105, The Netherlands
| | - Alexander Manta
- Research Informatics, Roche Diagnostics GmbH, 82008, Unterhaching, Germany
| | - Anthony Rowe
- Janssen Research and Development Ltd, High Wycombe, HP12 4DP, UK
| | | | - Charles Auffray
- European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, EISBM, 50 Avenue Tony Garnier, 69007, Lyon, France.
| | | |
|
194
|
Armanfard N, Reilly JP, Komeili M. Logistic Localized Modeling of the Sample Space for Feature Selection and Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:1396-1413. [PMID: 28333643 DOI: 10.1109/tnnls.2017.2676101] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
Conventional feature selection algorithms assign a single common feature set to all regions of the sample space. In contrast, this paper proposes a novel algorithm for localized feature selection for which each region of the sample space is characterized by its individual distinct feature subset that may vary in size and membership. This approach can therefore select an optimal feature subset that adapts to local variations of the sample space, and hence offer the potential for improved performance. Feature subsets are computed by choosing an optimal coordinate space so that, within a localized region, within-class distances and between-class distances are, respectively, minimized and maximized. Distances are measured using a logistic function metric within the corresponding region. This enables the optimization process to focus on a localized region within the sample space. A local classification approach is utilized for measuring the similarity of a new input data point to each class. The proposed logistic localized feature selection (lLFS) algorithm is invariant to the underlying probability distribution of the data; hence, it is appropriate when the data are distributed on a nonlinear or disjoint manifold. lLFS is efficiently formulated as a joint convex/increasing quasi-convex optimization problem with a unique global optimum point. The method is most applicable when the number of available training samples is small. The performance of the proposed localized method is successfully demonstrated on a large variety of data sets. We demonstrate that the number of features selected by the lLFS method saturates at the number of available discriminative features. In addition, we have shown that the Vapnik-Chervonenkis dimension of the localized classifier is finite. Both these factors suggest that the lLFS method is insensitive to the overfitting issue, relative to other methods.
|
195
|
Zhang R, Lv Q, Tao J, Gao F. Data Driven Modeling Using an Optimal Principle Component Analysis Based Neural Network and Its Application to a Nonlinear Coke Furnace. Ind Eng Chem Res 2018. [DOI: 10.1021/acs.iecr.8b00071] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Ridong Zhang
- The Belt and Road Information Research Institute, Automation College, Hangzhou Dianzi University, Hangzhou, 310018, P.R. China
| | - Qiang Lv
- The Belt and Road Information Research Institute, Automation College, Hangzhou Dianzi University, Hangzhou, 310018, P.R. China
| | - Jili Tao
- Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, P.R. China
| | - Furong Gao
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, P.R. China
| |
|
196
|
Jiang S, Chin KS, Qu G, Tsui KL. An integrated machine learning framework for hospital readmission prediction. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.01.027] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 10/18/2022]
|
197
|
Oh J, Cho D, Park J, Na SH, Kim J, Heo J, Shin CS, Kim JJ, Park JY, Lee B. Prediction and early detection of delirium in the intensive care unit by using heart rate variability and machine learning. Physiol Meas 2018; 39:035004. [PMID: 29376502 DOI: 10.1088/1361-6579/aaab07] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022]
Abstract
OBJECTIVE Delirium is an important syndrome found in patients in the intensive care unit (ICU); however, it is usually under-recognized during treatment. This study was performed to investigate whether delirious patients can be successfully distinguished from non-delirious patients by using heart rate variability (HRV) and machine learning. APPROACH Electrocardiography data of 140 patients were acquired during daily ICU care, and HRV data were analyzed. Delirium, including its type, severity, and etiologies, was evaluated daily by trained psychiatrists. HRV data and various machine learning algorithms including linear support vector machine (SVM), SVM with radial basis function (RBF) kernels, linear extreme learning machine (ELM), ELM with RBF kernels, linear discriminant analysis, and quadratic discriminant analysis were utilized to distinguish delirium patients from non-delirium patients. MAIN RESULTS HRV data of 4797 ECGs were included, and 39 patients had delirium at least once during their ICU stay. The maximum classification accuracy was acquired using SVM with RBF kernels. Our prediction method based on HRV with machine learning was comparable to previous delirium prediction models using massive amounts of clinical information. SIGNIFICANCE Our results show that autonomic alterations could be a significant feature of patients with delirium in the ICU, suggesting the potential for the automatic prediction and early detection of delirium based on HRV with machine learning.
Affiliation(s)
- Jooyoung Oh
- Department of Biomedical Science and Engineering (BMSE), Institute of Integrated Technology (IIT), Gwangju Institute of Science and Technology (GIST), Gwangju, Republic of Korea. These authors contributed equally to this work
| | | | | | | | | | | | | | | | | | | |
|
198
|
Thilaga M, Ramasamy V, Nadarajan R, Nandagopal D. Shortest path based network analysis to characterize cognitive load states of human brain using EEG based functional brain networks. J Integr Neurosci 2018. [DOI: 10.3233/jin-170049] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022] Open
Affiliation(s)
- M. Thilaga
- Computational Neuroscience Laboratory, Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamil Nadu, India
| | - Vijayalakshmi Ramasamy
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, USA
- Cognitive Neuroengineering Laboratory, School of Information Technology and Mathematical Sciences, Division of IT, Engineering and the Environments, University of South Australia, Adelaide, South Australia
| | - R. Nadarajan
- Computational Neuroscience Laboratory, Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, Tamil Nadu, India
| | - D. Nandagopal
- Cognitive Neuroengineering Laboratory, School of Information Technology and Mathematical Sciences, Division of IT, Engineering and the Environments, University of South Australia, Adelaide, South Australia
| |
|
199
|
Reduced gene subset selection based on discrimination power boosting for molecular classification. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2017.11.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
|
200
|
Ma M, Deng T, Wang N, Chen Y. Semi-supervised rough fuzzy Laplacian Eigenmaps for dimensionality reduction. INT J MACH LEARN CYB 2018. [DOI: 10.1007/s13042-018-0784-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
|