1
Kotan M, Faruk Seymen Ö, Çallı L, Kasım S, Çarklı Yavuz B, Över Özçelik T. A novel methodological approach to SaaS churn prediction using whale optimization algorithm. PLoS One 2025; 20:e0319998. [PMID: 40359310 PMCID: PMC12074543 DOI: 10.1371/journal.pone.0319998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2024] [Accepted: 02/11/2025] [Indexed: 05/15/2025]
Abstract
Customer churn is a critical concern in the Software as a Service (SaaS) sector, potentially impacting long-term growth within the cloud computing industry. The scarcity of research on customer churn models in SaaS, particularly regarding diverse feature selection methods and predictive algorithms, highlights a significant gap. Addressing this would enhance academic discourse and provide essential insights for managerial decision-making. This study introduces a novel approach to SaaS churn prediction using the Whale Optimization Algorithm (WOA) for feature selection. Results show that WOA-reduced datasets improve processing efficiency and outperform full-variable datasets in predictive performance. The study evaluates a range of prediction techniques on three distinct datasets derived from over 1,000 users of a multinational SaaS company: the WOA-reduced dataset, the full-variable dataset, and the chi-squared-derived dataset. These three datasets were examined with the techniques most used in the literature (k-nearest neighbor, Decision Trees, Naïve Bayes, Random Forests, and Neural Networks), and Area Under Curve, Accuracy, Precision, Recall, and F1 Score were used as measures of classification success. The results demonstrate that the WOA-reduced dataset outperformed the full-variable and chi-squared-derived datasets on these performance metrics.
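For readers unfamiliar with the metrics named in the abstract above, the following is a minimal sketch of how Accuracy, Precision, Recall, and F1 Score are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical placeholders chosen for illustration; they are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "churned" treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (assumed, not from the study).
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Area Under Curve is omitted here because it depends on ranked prediction scores rather than on a single confusion matrix.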
Affiliation(s)
- Muhammed Kotan
  - Department of Information Systems Engineering, Sakarya University, Sakarya, Turkey
- Ömer Faruk Seymen
  - Department of Quantitative Methods, Sakarya University, Sakarya, Turkey
- Levent Çallı
  - Department of Information Systems Engineering, Sakarya University, Sakarya, Turkey
- Sena Kasım
  - Department of Information Systems Engineering, Sakarya University, Sakarya, Turkey
- Burcu Çarklı Yavuz
  - Department of Information Systems Engineering, Sakarya University, Sakarya, Turkey
2
He C, Ding CHQ. A novel classification algorithm for customer churn prediction based on hybrid Ensemble-Fusion model. Sci Rep 2024; 14:20179. [PMID: 39215049 PMCID: PMC11364882 DOI: 10.1038/s41598-024-71168-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Customer churn is becoming increasingly important: it is one of the most important metrics for evaluating the health of a business, and it is difficult to measure success without measuring customer churn. However, it has become a challenge for the industry to predict when customers are churning or preparing to churn and to take the necessary action at the critical time before they do. At the same time, how to keep the place of deep research on the 17 machine learning algorithms in 9 major classes of machine learning classics production is the first problem we are facing. Through in-depth research on customer churn, we propose the Ensemble-Fusion model based on machine learning and introduce an intelligent system to help reduce actual customer churn in production. Popular predictive models, such as the Support vector machine, Random Forest, K-Nearest-Neighbor, Gradient boosting, Logistic regression, Bayesian, Decision tree, and Neural network algorithms, are applied for comparison on accuracy, AUC, and F1-score. Compared with 17 algorithms in 9 categories of classic machine learning, the Ensemble-Fusion model reaches a prediction accuracy of 95.35%, an AUC of 91%, and an F1-Score of 96.96%. The experimental results show that the prediction accuracy of the Ensemble-Fusion model outperforms that of the other benchmark algorithms.
Affiliation(s)
- Chenggang He
  - School of Public Safety and Emergency Management, Anhui University of Science and Technology, No.15 Fengxia Road, Hefei, 230041, Anhui, China
  - School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230039, Anhui, China
- Chris H Q Ding
  - School Department of Computer Science and Engineering, University of Texas at Arlington, 701 S. Nedderman Drive, Arlington, TX, 76019, USA
  - School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230039, Anhui, China
3
Sikri A, Jameel R, Idrees SM, Kaur H. Enhancing customer retention in telecom industry with machine learning driven churn prediction. Sci Rep 2024; 14:13097. [PMID: 38849493 PMCID: PMC11161656 DOI: 10.1038/s41598-024-63750-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 05/31/2024] [Indexed: 06/09/2024]
Abstract
Customer churn remains a critical concern for businesses, highlighting the significance of retaining existing customers over acquiring new ones. Effective prediction of potential churners aids in devising robust retention policies and efficient customer management strategies. This study examines machine learning algorithms for churn prediction, addressing the inherent challenge posed by diverse and imbalanced customer churn data distributions. This paper introduces a novel approach, the Ratio-based data balancing technique, which addresses data skewness as a pre-processing step, ensuring improved accuracy in predictive modelling. This study fills gaps in existing literature by highlighting the effectiveness of ensemble algorithms and the critical role of data balancing techniques in optimizing churn prediction models. While our research contributes a novel approach, there remain avenues for further exploration. This work evaluates several machine learning algorithms (Perceptron, Multi-Layer Perceptron, Naive Bayes, Logistic Regression, K-Nearest Neighbour, Decision Tree, alongside Ensemble techniques such as Gradient Boosting and Extreme Gradient Boosting (XGBoost)) on balanced datasets achieved through our proposed Ratio-based data balancing technique and the commonly used Data Resampling. Results reveal that our proposed Ratio-based data balancing technique notably outperforms traditional Over-Sampling and Under-Sampling methods in churn prediction accuracy. Additionally, ensemble algorithms such as Gradient Boosting and XGBoost showed better results than single methods. Our study assessed Accuracy, Precision, Recall, and F-Score, finding that these ensemble methods are better for predicting customer churn. Specifically, a 75:25 ratio with the XGBoost method gave the most promising results for our analysis, which are presented in this work.
Affiliation(s)
- Alisha Sikri
  - Noida Institute of Engineering and Technology, Greater Noida, 201306, Uttar Pradesh, India
- Roshan Jameel
  - Westford University College, Sharjah, United Arab Emirates
- Sheikh Mohammad Idrees
  - Department of Computer Science (IDI), Norwegian University of Science and Technology, Trondheim, Norway
- Harleen Kaur
  - Department of Computer Science, Jamia Hamdard, New Delhi, India
4
AlShourbaji I, Helian N, Sun Y, Hussien AG, Abualigah L, Elnaim B. An efficient churn prediction model using gradient boosting machine and metaheuristic optimization. Sci Rep 2023; 13:14441. [PMID: 37660198 PMCID: PMC10475067 DOI: 10.1038/s41598-023-41093-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 08/22/2023] [Indexed: 09/04/2023]
Abstract
Customer churn remains a critical challenge in telecommunications, necessitating effective churn prediction (CP) methodologies. This paper introduces the Enhanced Gradient Boosting Model (EGBM), which uses a Support Vector Machine with a Radial Basis Function kernel (SVMRBF) as a base learner and exponential loss function to enhance the learning process of the GBM. The novel base learner significantly improves the initial classification performance of the traditional GBM and achieves enhanced performance in CP-EGBM after multiple boosting stages by utilizing state-of-the-art decision tree learners. Further, a modified version of Particle Swarm Optimization (PSO) using the consumption operator of the Artificial Ecosystem Optimization (AEO) method to prevent premature convergence of the PSO in the local optima is developed to tune the hyper-parameters of the CP-EGBM effectively. Seven open-source CP datasets are used to evaluate the performance of the developed CP-EGBM model using several quantitative evaluation metrics. The results showed that the CP-EGBM is significantly better than GBM and SVM models. Results are statistically validated using the Friedman ranking test. The proposed CP-EGBM is also compared with recently reported models in the literature. Comparative analysis with state-of-the-art models showcases CP-EGBM's promising improvements, making it a robust and effective solution for churn prediction in the telecommunications industry.
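The abstract above reports that results were statistically validated with the Friedman ranking test. As a rough illustration of what that test computes, the sketch below ranks models within each dataset and derives the classic Friedman chi-square statistic from the average ranks. The function names and the score table are hypothetical placeholders, not figures from the paper; tied scores share an average rank, but the tie correction to the statistic itself is omitted for brevity.

```python
def average_ranks(scores):
    """Average rank of each model across datasets (rank 1 = best score).

    scores: list of rows, one per dataset; each row holds one score per
    model, where higher is better. Tied scores share the average rank.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            # Extend j over the run of models tied with the one at position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for t in range(i, j + 1):
                totals[order[t]] += shared
            i = j + 1
    return [t / n for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction) for n datasets, k models."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Made-up accuracy table: rows are datasets, columns are three hypothetical models.
example_scores = [
    [0.95, 0.90, 0.85],
    [0.93, 0.91, 0.80],
    [0.97, 0.92, 0.88],
    [0.94, 0.89, 0.86],
]
print(average_ranks(example_scores))
print(friedman_statistic(example_scores))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; a library routine is preferable in practice because it handles ties and p-values.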
Affiliation(s)
- Ibrahim AlShourbaji
  - Department of Computer Science, University of Hertfordshire, Hatfield, UK
  - Department of Computer and Network Engineering, Jazan University, 82822-6649, Jazan, Saudi Arabia
- Na Helian
  - Department of Computer Science, University of Hertfordshire, Hatfield, UK
- Yi Sun
  - Department of Computer Science, University of Hertfordshire, Hatfield, UK
- Abdelazim G Hussien
  - Department of Computer and Information Science, Linköping University, Linköping, Sweden
  - Faculty of Science, Fayoum University, Faiyum, Egypt
- Laith Abualigah
  - Computer Science Department, Prince Hussein Bin Abdullah Faculty for Information Technology, Al Al-Bayt University, Mafraq, 25113, Jordan
  - Department of Electrical and Computer Engineering, Lebanese American University, 13-5053, Byblos, Lebanon
  - Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, 19328, Jordan
  - MEU Research Unit, Middle East University, Amman, 11831, Jordan
  - Applied Science Research Center, Applied Science Private University, Amman, 11931, Jordan
  - School of Computer Sciences, Universiti Sains Malaysia, 11800, Pulau Pinang, Malaysia
  - School of Engineering and Technology, Sunway University Malaysia, 27500, Petaling Jaya, Malaysia
- Bushra Elnaim
  - Department of Computer Science, College of Science and Humanities in Al-Sulail, Prince Sattam Bin Abdulaziz University, 11671, Riyadh, Saudi Arabia
5
Suh Y. Machine learning based customer churn prediction in home appliance rental business. Journal of Big Data 2023; 10:41. [PMID: 37033202 PMCID: PMC10074358 DOI: 10.1186/s40537-023-00721-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 03/21/2023] [Indexed: 06/19/2023]
Abstract
Customer churn is a major issue for large enterprises. In the rental business sector in particular, companies are looking for ways to retain their customers because they are their main source of revenue. The main contribution of our work is to analyze the customer behavior information of an actual water purifier rental company, where customer churn occurs very frequently, and to develop and verify a churn prediction model. A machine learning algorithm was applied to a large-capacity operating dataset of a rental care service at an electronics company in Korea to learn meaningful features. To measure the performance of the model, the F-measure and area under curve (AUC) were adopted, whereby an F1 value of 93% and an AUC of 88% were achieved. A dataset containing approximately 84,000 customers was used for training and testing. Another contribution was to evaluate the inference performance of the predictive model using the contract status of about 250,000 customer records currently in operation, confirming a hit rate of about 80%. Finally, this study identified and calculated the influence of key variables on individual customer churn to enable business staff (rental care customer management staff) to carry out customer-tailored marketing that addresses the cause of the churn.
Affiliation(s)
- Youngjung Suh
  - LG Electronics Inc, Yeongdeungpo-Gu, Seoul, 07336 South Korea
6
Tang Q, Xia G, Zhang X. A hybrid classification model for churn prediction based on customer clustering. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-190677] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Affiliation(s)
- Qi Tang
  - College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi, China
- Guoen Xia
  - College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi, China
  - School of Business Administration, Guangxi University of Finance and Economics, Nanning, Guangxi, China
- Xianquan Zhang
  - College of Computer Science and Information Engineering, Guangxi Normal University, Guilin, Guangxi, China
7
Rabby G, Azad S, Mahmud M, Zamli KZ, Rahman MM. TeKET: a Tree-Based Unsupervised Keyphrase Extraction Technique. Cognit Comput 2020. [DOI: 10.1007/s12559-019-09706-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 11/27/2022]
Abstract
Automatic keyphrase extraction techniques aim to extract quality keyphrases for higher level summarization of a document. The majority of existing techniques are domain-specific: they require application domain knowledge, employ higher order statistical methods, are computationally expensive, and require large training data, which is rare for many applications. To overcome these issues, this paper proposes a new unsupervised keyphrase extraction technique. The proposed technique, named TeKET or Tree-based Keyphrase Extraction Technique, is a domain-independent technique that employs limited statistical knowledge and requires no training data. This technique also introduces a new variant of a binary tree, called the KeyPhrase Extraction (KePhEx) tree, to extract final keyphrases from candidate keyphrases. In addition, a measure called the Cohesiveness Index, or CI, is derived, which denotes a given node’s degree of cohesiveness with respect to the root. The CI is used in flexibly extracting final keyphrases from the KePhEx tree and is co-utilized in the ranking process. The effectiveness of the proposed technique and its domain and language independence are experimentally evaluated using available benchmark corpora, namely SemEval-2010 (a scientific articles dataset), Theses100 (a thesis dataset), and a German Research Article dataset, respectively. The acquired results are compared with other relevant unsupervised techniques, both statistical and graph-based. The obtained results demonstrate the improved performance of the proposed technique over the compared techniques in terms of precision, recall, and F1 scores.