1
Loughrey CF, Maguire S, Dłotko P, Bai L, Orr N, Jurek-Loughrey A. A novel method for subgroup discovery in precision medicine based on topological data analysis. BMC Med Inform Decis Mak 2025; 25:139. [PMID: 40102808 PMCID: PMC11921513 DOI: 10.1186/s12911-025-02852-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 01/03/2025] [Indexed: 03/20/2025] Open
Abstract
BACKGROUND: The Mapper algorithm is a topological data mining tool that can help us obtain a higher-level understanding of disease by visualising the structure of patient data as a similarity graph. It has been successfully applied for exploratory analysis of cancer data in the past, delivering several significant subgroup discoveries. Using the Mapper algorithm in practice requires setting multiple parameters. The graph then needs to be analysed manually according to the research question at hand. It has been highlighted in the literature that Mapper's parameters have a significant impact on the shape of the output graph and that there is no established way to select their optimal values. Hence, when using the Mapper algorithm, different parameter values and consequently different output graphs need to be studied. This prevents routine application of the Mapper algorithm in real-world settings.
METHODS: We propose a new algorithm for subgroup discovery within the Mapper graph. We refer to the task as hotspot detection, as it is designed to identify homogeneous and geometrically compact subsets of patients which are distinct with respect to their clinical or molecular profiles (e.g. survival). Furthermore, we propose to include the existence of a hotspot as a criterion while searching the parameter space, addressing one of the key limitations of the Mapper algorithm (i.e. parameter selection).
RESULTS: Two experiments were performed to demonstrate the efficacy of the algorithm: an artificial hotspot in the Two Circles dataset and a real-world case study of subgroup discovery in oestrogen receptor-positive (ER+) breast cancer. Our hotspot detection algorithm successfully identified graphs containing homogeneous communities of nodes within the Two Circles dataset. When applied to gene expression data of ER+ breast cancer patients, appropriate parameters were identified to generate a Mapper graph revealing a hotspot of ER+ patients with poor prognosis and characteristic patterns of gene expression. This was subsequently confirmed in an independent breast cancer dataset.
CONCLUSIONS: Our proposed method can be effectively applied for subgroup discovery with pathology data. It allows us to find optimal parameters of the Mapper algorithm, bridging the gap between its potential and translational research.
Affiliation(s)
- Ciara F Loughrey
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK
- Sarah Maguire
  - Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, Belfast, UK
- Paweł Dłotko
  - Dioscuri Centre in Topological Data Analysis, Mathematical Institute, Polish Academy of Sciences, Warsaw, Poland
- Lu Bai
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK
- Nick Orr
  - Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, Belfast, UK
- Anna Jurek-Loughrey
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK
2
Ghazi SM, Mahmoudi M. Statistical and data visualization techniques to study the role of one-electron in the energy of neutral and charged clusters of Na₃₉. Sci Rep 2025; 15:1739. [PMID: 39799241 PMCID: PMC11724881 DOI: 10.1038/s41598-025-86141-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2024] [Accepted: 01/08/2025] [Indexed: 01/15/2025] Open
Abstract
In this work, we explored the role of a single electron in the energy of neutral and charged Na₃₉ clusters, using data visualization and statistical techniques to gain new insight. Initially, we studied the effects of one electron, time, and temperature on energy using multiple linear regression analysis with dummy variables; the results demonstrated that all three predictors significantly affected the energy. Time had a positive impact (direct ratio effect) on the energy of Na₃₉⁻ and Na₃₉, and a negative impact (inverse ratio effect) on the energy of Na₃₉⁺, while temperature had a positive effect on the energy of all three sodium clusters. Then, to study the thermodynamic properties of each cluster, we employed the fuzzy clustering technique. The results verified that each sodium cluster is divided into three groups based on the different temperatures used to investigate its thermodynamic properties. Finally, time series analysis was applied to investigate the behavior of the energy in each sodium cluster at each temperature. We used the statistical software R version 4.3.3 to perform all statistical computations.
Affiliation(s)
- Seyed Mohammad Ghazi
  - Department of Physics, Faculty of Science, Fasa University, Fasa, 74616-86131, Iran
- Mohammadreza Mahmoudi
  - Department of Statistics, Faculty of Science, Fasa University, Fasa, 74616-86131, Iran
3
Shan Y, Li S, Li F, Cui Y, Chen M. Dual-level clustering ensemble algorithm with three consensus strategies. Sci Rep 2023; 13:22617. [PMID: 38114636 PMCID: PMC10730624 DOI: 10.1038/s41598-023-49947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 12/13/2023] [Indexed: 12/21/2023] Open
Abstract
Clustering ensemble (CE), known for its robust consensus capability, has attracted significant attention in recent years and has achieved numerous notable breakthroughs. Nevertheless, three key issues persist: (1) most CE selection strategies rely on preset parameters or empirical knowledge, and so lack adaptive selectivity; (2) the construction of the co-association matrix is excessively one-sided; (3) the CE method lacks a broader perspective from which to reconcile conflicts among different consensus results. To address these problems, a dual-level clustering ensemble algorithm with three consensus strategies is proposed. First, a backward clustering ensemble selection framework is devised, whose built-in selection strategy can adaptively eliminate redundant members. Then, at the base clustering consensus level, two modified relation matrices are reconstructed by taking into account the interplay between actual spatial location information and co-occurrence frequency, yielding two consensus methods with different modes. Additionally, at the CE consensus level, which takes a broader perspective, an adjustable Dempster-Shafer evidence theory is developed as the third consensus method of the present algorithm to dynamically fuse multiple ensemble results. Experimental results demonstrate that, compared with seven other state-of-the-art and typical CE algorithms, the proposed algorithm exhibits exceptional consensus ability and robustness.
Affiliation(s)
- Yunxiao Shan
  - School of Science, Harbin University of Science and Technology, Harbin, 150080, China
- Shu Li
  - School of Science, Harbin University of Science and Technology, Harbin, 150080, China
  - Key Laboratory of Engineering Dielectric and Applications (Ministry of Education), School of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, 150080, China
- Fuxiang Li
  - School of Science, Harbin University of Science and Technology, Harbin, 150080, China
- Yuxin Cui
  - School of Science, Harbin University of Science and Technology, Harbin, 150080, China
- Minghua Chen
  - Key Laboratory of Engineering Dielectric and Applications (Ministry of Education), School of Electrical and Electronic Engineering, Harbin University of Science and Technology, Harbin, 150080, China
4
Wu Q, Sun Y, Lv L, Yan X. An Optimally Selective Ensemble Classifier Based on Multimodal Perturbation and Its Application. Arabian Journal for Science and Engineering 2023. [DOI: 10.1007/s13369-022-07573-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
5
Panskyi T, Korzeniewska E. Statistical and clustering validation analysis of primary students' learning outcomes and self-awareness of information and technical online security problems at a post-pandemic time. Education and Information Technologies 2022; 28:6423-6451. [PMID: 36415781 PMCID: PMC9670056 DOI: 10.1007/s10639-022-11436-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 10/27/2022] [Indexed: 05/25/2023]
Abstract
The authors investigate the impact of the pandemic period and the resulting limitations on online security education in Polish primary schools. The first part of the study investigates the impact of the COVID-19 pandemic on students' learning outcomes in information and Internet security. The study was performed via a student-oriented survey of 20 questions. The statistical analysis confirms a significant difference before and after the pandemic in only a few questions. Nevertheless, this justifies the statement that the pandemic had a positive impact on post-pandemic Internet-related security education. The second part of the study focuses on students' perception and self-awareness of cyberspace problems. For this purpose, the authors used novel majority-based decision fusion clustering validation methods. The results illustrate a positive tendency in the students' self-awareness of and self-confidence about online security problems and e-threats before, during and after the challenging pandemic period. Moreover, the presented validation methods show appealing performance in educational data analysis, and therefore the authors recommend these methods as a preprocessing step that helps to explore the intrinsic data structures or students' behaviors, and as a postprocessing step to predict learning outcomes in different educational environments.
Affiliation(s)
- Taras Panskyi
  - Institute of Applied Computer Science, Lodz University of Technology, Stefanowskiego 18/22, 90-537 Lodz, Poland
  - Institute of Electrical Engineering Systems, Lodz University of Technology, Stefanowskiego 18/22, 90-924 Lodz, Poland
- Ewa Korzeniewska
  - Institute of Applied Computer Science, Lodz University of Technology, Stefanowskiego 18/22, 90-537 Lodz, Poland
  - Institute of Electrical Engineering Systems, Lodz University of Technology, Stefanowskiego 18/22, 90-924 Lodz, Poland
6
Parallel gravitational clustering based on grid partitioning for large-scale data. Appl Intell 2022. [DOI: 10.1007/s10489-022-03661-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
7
Qian L. Research on complex attribute big data classification based on iterative fuzzy clustering algorithm. Web Intelligence 2021. [DOI: 10.3233/web-210463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
To overcome the low classification accuracy of traditional methods, this paper proposes a new classification method for complex attribute big data based on an iterative fuzzy clustering algorithm. First, principal component analysis and kernel local Fisher discriminant analysis are used to reduce the dimensionality of the complex attribute big data. Then, the Bloom Filter data structure is introduced to eliminate redundancy in the complex attribute big data after dimensionality reduction. Next, the redundant complex attribute big data is classified in parallel by the iterative fuzzy clustering algorithm, completing the classification. Finally, the simulation results show that the accuracy, the normalized mutual information index and the Richter’s index of the proposed method are close to 1, the classification accuracy is high, and the RDV value is low, which indicates that the proposed method has high classification effectiveness and fast convergence speed.
Affiliation(s)
- Li Qian
  - School of Digital Information Technology, Zhejiang Technical Institute of Economics, Hangzhou 310018, China
8
A multi-level consensus function clustering ensemble. Soft Comput 2021. [DOI: 10.1007/s00500-021-06092-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
9
Kang SJ, Lim Y. Ensemble mapper. Stat (Int Stat Inst) 2021. [DOI: 10.1002/sta4.405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Sung Jin Kang
  - Department of Statistics, Chung-Ang University, Seoul, Korea
- Yaeji Lim
  - Department of Statistics, Chung-Ang University, Seoul, Korea
10
Multi-objective whale optimization algorithm and multi-objective grey wolf optimizer for solving next release problem with developing fairness and uncertainty quality indicators. Appl Intell 2021. [DOI: 10.1007/s10489-020-02018-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/31/2023]
11
12
13
Dabighi K, Nazari A, Saryazdi S. A step edge detector based on bilinear transformation. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-191229] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/31/2023]
Abstract
Nowadays, the Canny edge detector is considered to be one of the best edge detection approaches for images with step edges. Various overgeneralized versions of these edge detectors have been offered up to now, e.g. the Saryazdi edge detector. This paper proposes a new discrete version of edge detection, obtained from the Shen-Castan and Saryazdi filters by using the bilinear transformation. Different experiments are conducted to decide suitable parameters for the proposed edge detector and to examine its validity. To evaluate the strength of the proposed model, the results are compared with the Canny, Sobel, Prewitt, LOG and Saryazdi methods. Calculation of the mean square error (MSE) and peak signal-to-noise ratio (PSNR) shows that the PSNR of the proposed method is always equal to or greater than that of the compared methods. Moreover, by calculating Baddeley’s error metric (BEM) on ten test images from the Berkeley Segmentation DataSet (BSDS), we show that the proposed method outperforms the other methods. Therefore, visual and quantitative comparison shows the efficiency and strength of the proposed method.
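The MSE and PSNR figures mentioned in this abstract follow standard definitions, which can be sketched as below. This is a minimal illustration and not the authors' code: the function names, the 8-bit peak value of 255 and the tiny 2x2 images are assumptions made here for the example.

```python
import math


def mse(reference, candidate):
    """Mean square error between two equally sized grayscale images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, cand_row in zip(reference, candidate):
        for r, c in zip(ref_row, cand_row):
            total += (r - c) ** 2
            count += 1
    return total / count


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite when the images are identical."""
    error = mse(reference, candidate)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Hypothetical 2x2 images: one pixel differs by 10 grey levels.
reference = [[0, 0], [0, 0]]
candidate = [[0, 0], [0, 10]]
print(mse(reference, candidate))   # 25.0
print(psnr(reference, candidate))  # PSNR in dB for the pair above
```

A higher PSNR corresponds to a lower MSE, so comparing detectors by PSNR and by MSE ranks them in the same order for a fixed peak value.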
Affiliation(s)
- Korosh Dabighi
  - Department of Mathematics, Kerman Branch, Islamic Azad University, Kerman, Iran
- Akbar Nazari
  - Department of Pure Mathematics, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran
- Saeid Saryazdi
  - Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
14
15
16
Kejia S, Parvin H, Qasem SN, Tuan BA, Pho KH. A classification model based on SVM and fuzzy rough set for network intrusion detection. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-191621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Intrusion Detection Systems (IDS) are designed to provide security for computer networks. Different classification models, such as the Support Vector Machine (SVM), have been successfully applied to network data. Meanwhile, extending or improving current models with prototype selection performed alongside their training phase is crucial because of serious inefficiencies during training (i.e. learning overhead). This paper introduces an improved model for prototype selection. Applying the proposed prototype selection along with the SVM classification model increases the attack discovery rate. In this article, we use fuzzy rough set theory (FRST) for prototype selection to enhance SVM in intrusion detection. Testing and evaluation of the proposed IDS have been performed mainly on the NSL-KDD dataset, a refined version of KDD-CUP99. Experiments indicate that the proposed IDS outperforms basic and simple IDSs as well as modern IDSs in terms of precision, recall, and accuracy rate.
Affiliation(s)
- Shen Kejia
  - The Second Affiliated Hospital of the Second Military Medical University, Shanghai City, China
- Hamid Parvin
  - Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
  - Faculty of Information Technology, Duy Tan University, Da Nang, Vietnam
  - Department of Computer Science, Nourabad Mamasani Branch, Islamic Azad University, Mamasani, Iran
- Sultan Noman Qasem
  - Computer Science Department, College of Computer and Information Sciences, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
  - Computer Science Department, Faculty of Applied Science, Taiz University, Taiz, Yemen
- Bui Anh Tuan
  - Department of Mathematics Education, Teachers College, Can Tho University, Can Tho City, Vietnam
- Kim-Hung Pho
  - Fractional Calculus, Optimization and Algebra Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
17
Mahmoudi MR, Baleanu D, Mansor Z, Tuan BA, Pho KH. Fuzzy clustering method to compare the spread rate of Covid-19 in the high risks countries. Chaos, Solitons, and Fractals 2020; 140:110230. [PMID: 32863611 PMCID: PMC7442906 DOI: 10.1016/j.chaos.2020.110230] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/04/2020] [Revised: 08/09/2020] [Accepted: 08/21/2020] [Indexed: 05/23/2023]
Abstract
The number of confirmed cases of the new coronavirus (Covid-19) increases daily in different countries. To determine policies and plans, it is critical to study the relations between the distributions of the spread of this virus in other countries. In this work, the distributions of the spread of Covid-19 in the United States of America, Spain, Italy, Germany, the United Kingdom, France, and Iran were compared and clustered using a fuzzy clustering technique. First, the time series of Covid-19 datasets in the selected countries were considered. Then, the relation between the spread of Covid-19 and population size was studied using Pearson correlation. The effect of population size was eliminated by rescaling the Covid-19 datasets based on the population size of the USA. Finally, the rescaled Covid-19 datasets of the countries were clustered using fuzzy clustering. The results of the Pearson correlation indicated positive and significant correlations between total confirmed cases, total dead cases and the population size of the countries. The clustering results indicated that the distribution of spreading in Spain and Italy was approximately similar and differed from that of the other countries.
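The two preprocessing steps named in this abstract, Pearson correlation and rescaling by population size, can be sketched as follows. The abstract does not state the exact rescaling formula, so the proportional scaling used here is an assumption, as are the function names and all numbers, which are invented for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rescale_to_reference(series, population, reference_population):
    """Scale a case-count series as if the country had the reference population.

    Assumption: simple proportional scaling; the paper may use a different rule.
    """
    factor = reference_population / population
    return [value * factor for value in series]


# Invented figures, not real Covid-19 data.
populations = [10.0, 20.0, 40.0]
total_cases = [100.0, 210.0, 390.0]
r = pearson(populations, total_cases)

rescaled = rescale_to_reference([5.0, 10.0, 20.0], population=50.0,
                                reference_population=100.0)
print(r)
print(rescaled)  # [10.0, 20.0, 40.0]
```

After such rescaling, series from countries of different sizes are on a comparable scale before they are passed to a clustering step.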
Affiliation(s)
- Mohammad Reza Mahmoudi
  - Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
  - Department of Statistics, Faculty of Science, Fasa University, Fasa, Fars, Iran
- Dumitru Baleanu
  - Department of Mathematics, Faculty of Art and Sciences, Cankaya University Balgat 06530, Ankara, Turkey
  - Institute of Space Sciences, Magurele-Bucharest, Romania
- Zulkefli Mansor
  - Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Bui Anh Tuan
  - Department of Mathematics Education, Teachers College, Can Tho University, Vietnam
- Kim-Hung Pho
  - Fractional Calculus, Optimization and Algebra Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
18
Prediction of the Solubility of CO2 in Imidazolium Ionic Liquids Based on Selective Ensemble Modeling Method. Processes (Basel) 2020. [DOI: 10.3390/pr8111369] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022] Open
Abstract
Solubility data are among the essential basic data for CO2 capture by ionic liquids. A selective ensemble modeling method, proposed to overcome the shortcomings of current methods, was developed and applied to the prediction of the solubility of CO2 in imidazolium ionic liquids. First, multiple different sub-models were established based on the diversities of data, structural, and parameter design philosophy. Second, the fuzzy C-means algorithm was used to cluster the sub-models, and the collinearity detection method was adopted to eliminate the sub-models with high collinearity. Finally, the information entropy method integrated the sub-models into the selective ensemble model. The validation of the CO2 solubility predictions against experimental data showed that the proposed ensemble model performed better than its previous alternative, because more effective information was extracted from different angles and the diversity and accuracy among the sub-models were fully integrated. This work provides not only an effective modeling method for the prediction of the solubility of CO2 in ionic liquids, but also an effective method for the discrimination of ionic liquids for CO2 capture.
19
Shi J. Identification of Circulating Fluidized Bed Boiler Bed Temperature Based on Hyper-Plane-Shaped Fuzzy C-Regression Model. International Journal of Computational Intelligence and Applications 2020. [DOI: 10.1142/s1469026820500297] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Bed temperature in the dense-phase zone is the key parameter of a circulating fluidized bed (CFB) boiler for stable combustion and economic operation. It is difficult to establish an accurate bed temperature model because of the complexity of the circulating fluidized bed combustion system. The T-S fuzzy model has been widely applied in system identification because it can approximate complex nonlinear systems with high accuracy. Fuzzy c-regression model (FCRM) clustering based on hyper-plane-shaped distance has advantages in describing the T-S fuzzy model, and a Gaussian function has been adopted as the antecedent membership function of the T-S fuzzy model. However, the Gaussian fuzzy membership function is more suitable for clustering algorithms that use point-to-point distance, such as fuzzy c-means (FCM). In this paper, a hyper-plane-shaped FCRM clustering algorithm for T-S fuzzy model identification is proposed. The antecedent membership function of the proposed identification algorithm is defined by a hyper-plane-shaped membership function, and an improved fuzzy partition method is applied. To illustrate the efficiency of the proposed identification algorithm, it is applied to four nonlinear systems, where it shows higher identification accuracy and a simplified identification process. Finally, the algorithm is used in a circulating fluidized bed boiler bed temperature identification process and obtains a better identification result.
Affiliation(s)
- Jianzhong Shi
  - School of Energy and Power Engineering, Nanjing Institute of Technology, Hong Jing Da Dao 1 Hao, Nanjing, Jiangsu 211167, P. R. China
20
Wang Z, Parvin H, Qasem SN, Tuan BA, Pho KH. Cluster ensemble selection using balanced normalized mutual information. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-191531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/29/2023]
Abstract
A cluster ensemble selection framework removes bad partitions from the final ensemble; this is the main idea of cluster ensemble selection. Still, a removed partition may contain some reliable clusters, so it may be reasonable to apply the selection phase at the cluster level. To do this, a cluster evaluation metric is needed. Several such metrics have been introduced recently, each with its limitations. The weak points of each method are addressed in the paper. Subsequently, a new metric for cluster assessment is introduced, named the Balanced Normalized Mutual Information (BNMI) criterion. It balances the deficiency of the traditional NMI-based criteria. Additionally, an innovative cluster ensemble approach is proposed. To create the consensus partition from the elected clusters, a set of different aggregation functions (also called consensus functions) is utilized: those based on the co-association matrix (CAM), those based on hypergraph partitioning algorithms, and those based on an intermediate space. The experimental study indicates that the proposed cluster ensemble approach outperforms state-of-the-art cluster ensemble methods.
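For orientation, the traditional normalized mutual information (NMI) between two partitions, which this abstract takes as its starting point, can be sketched as below. This is the plain NMI with geometric-mean normalization, not the BNMI criterion proposed in the paper, whose definition is not given here. The function names and the handling of single-cluster partitions are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labellings of the same objects."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        mi += (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
    return mi


def nmi(a, b):
    """NMI normalized by the geometric mean of the two entropies.

    Convention assumed here: two single-cluster labellings score 1.0,
    and a single-cluster labelling against any other scores 0.0.
    """
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, independent)
```

NMI ignores label names, so two partitions that group the objects identically score close to 1 even when their cluster identifiers differ.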
Affiliation(s)
- Zecong Wang
  - School of Computer Science and Cyberspace Security, Hainan University, China
- Hamid Parvin
  - Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
  - Faculty of Information Technology, Duy Tan University, Da Nang, Vietnam
  - Department of Computer Science, Nourabad Mamasani Branch, Islamic Azad University, Mamasani, Iran
- Sultan Noman Qasem
  - Computer Science Department, College of Computer and Information Sciences, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
  - Computer Science Department, Faculty of Applied Science, Taiz University, Taiz, Yemen
- Bui Anh Tuan
  - Department of Mathematics Education, Teachers College, Can Tho University, Can Tho City, Vietnam
- Kim-Hung Pho
  - Fractional Calculus, Optimization and Algebra Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
21
Analysis of University Students’ Behavior Based on a Fusion K-Means Clustering Algorithm. Applied Sciences-Basel 2020. [DOI: 10.3390/app10186566] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
With the development of big data technology, creating the ‘Digital Campus’ is a hot issue. Traditional data mining algorithms are not suitable for ever-increasing amounts of data. The clustering algorithm is becoming more and more important in the field of data mining, but the traditional clustering algorithm does not take clustering efficiency and clustering effect into consideration. In this paper, an algorithm based on K-Means and clustering by fast search and find of density peaks (K-CFSFDP) is proposed, which improves on the distance and density of data points. This method is used to cluster students from four universities. The experiment shows that the K-CFSFDP algorithm has better clustering results and running efficiency than the traditional K-Means clustering algorithm, and that it performs well on large-scale campus data. Additionally, the results of the cluster analysis show that students of different categories in the four universities differed in living habits and learning performance, so a university can learn about the behavior of different categories of students and provide corresponding personalized services, which has certain practical significance.
22
Li G, Mahmoudi MR, Qasem SN, Tuan BA, Pho KH. Cluster ensemble of valid small clusters. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-191530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Affiliation(s)
- Guang Li
  - Institute of Data Science, City University of Macau, Macau
- Mohammad Reza Mahmoudi
  - Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
  - Department of Statistics, Faculty of Science, Fasa University, Fasa, Iran
- Sultan Noman Qasem
  - Department of Computer Science, College of Computer and Information Sciences, Al Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
  - Department of Computer Science, Faculty of Applied Science, Taiz University, Taiz, Yemen
- Bui Anh Tuan
  - Department of Mathematics Education, Teachers College, Can Tho University, Can Tho City, Vietnam
- Kim-Hung Pho
  - Fractional Calculus, Optimization and Algebra Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
23
24
Le T, Vo MT, Kieu T, Hwang E, Rho S, Baik SW. Multiple Electric Energy Consumption Forecasting Using a Cluster-Based Strategy for Transfer Learning in Smart Building. Sensors 2020; 20:s20092668. [PMID: 32392858 PMCID: PMC7362249 DOI: 10.3390/s20092668] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/11/2020] [Revised: 04/15/2020] [Accepted: 05/03/2020] [Indexed: 11/18/2022]
Abstract
Electric energy consumption forecasting is an interesting, challenging, and important issue in energy management and equipment efficiency improvement. Existing approaches are predictive models that can predict for a specific profile, i.e., a time series of a whole building or an individual household in a smart building. In practice, there are many profiles in each smart building, which makes this approach time-consuming and expensive in system resources. Therefore, this study develops a robust framework for the Multiple Electric Energy Consumption forecasting (MEC) of a smart building using Transfer Learning and Long Short-Term Memory (TLL), the so-called MEC-TLL framework. In this framework, we first employ a k-means clustering algorithm to cluster the daily load demand of many profiles in the training set. In this phase, we also perform Silhouette analysis to specify the optimal number of clusters for the experimental datasets. Next, this study develops the MEC training algorithm, which utilizes a cluster-based strategy for transfer learning of the Long Short-Term Memory models to reduce the computational time. Finally, extensive experiments are conducted to compare the computational time and different performance metrics for multiple electric energy consumption forecasting on two smart buildings in South Korea. The experimental results indicate that our proposed approach incurs economical overheads while achieving superior performance. Therefore, the proposed approach can be applied effectively for intelligent energy management in smart buildings.
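The Silhouette analysis mentioned in this abstract scores a clustering by how close each point is to its own cluster compared with the nearest other cluster; the number of clusters with the highest mean score is then preferred. The sketch below is an illustration under stated assumptions, not the MEC-TLL code: the points are invented two-value "load profiles", and the candidate labellings are written by hand where a real pipeline would take them from k-means runs with different k.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented data: three visually separate groups of two points each.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 9.1), (9.1, 8.9)]

# Hand-written stand-ins for k-means output at k = 2 and k = 3.
candidates = {
    2: [0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
best_k = max(candidates, key=lambda k: silhouette_score(points, candidates[k]))
print(best_k)
```

On this toy data the three-cluster labelling separates the groups more cleanly than the two-cluster one, so it receives the higher mean silhouette.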
Affiliation(s)
- Tuong Le
  - Informetrics Research Group, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
  - Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
- Minh Thanh Vo
  - Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
- Tung Kieu
  - University of Science, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Eenjun Hwang
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Seungmin Rho
  - Department of Software, Sejong University, Seoul 05006, Korea
- Sung Wook Baik
  - Department of Software, Sejong University, Seoul 05006, Korea
25
Bahrani P, Minaei-Bidgoli B, Parvin H, Mirzarezaee M, Keshavarz A, Alinejad-Rokny H. User and item profile expansion for dealing with cold start problem. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-191225] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023]
Affiliation(s)
- Payam Bahrani
  - Department of Computer Engineering, Science and Research branch, Islamic Azad University, Tehran, IR
- Behrouz Minaei-Bidgoli
  - Department of Computer Engineering, Iran University of Science and Technology, Tehran, IR
- Hamid Parvin
  - Department of Computer Engineering, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, IR
  - Young Researchers and Elite Club, Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, IR
- Mitra Mirzarezaee
  - Department of Computer Engineering, Science and Research branch, Islamic Azad University, Tehran, IR
- Ahmad Keshavarz
  - Department of Electrical Engineering, Persian Gulf University, Bushehr, IR
- Hamid Alinejad-Rokny
  - The Graduate School of Biomedical Engineering, UNSW Australia, Sydney, AU
  - School of Computer Science and Engineering, UNSW Australia, Sydney, AU
26
27
28