Application of Machine Learning in the Telecommunications Industry: Partial Churn Prediction by using a Hybrid Feature Selection Approach
Subject Areas : Machine learning
Fatemeh Mozaffari 1, Iman Raeesi Vanani 2, Payam Mahmoudian 3, Babak Sohrabi 4 *
1 - Department of Information Technology Management, College of Management, University of Tehran, Tehran, Iran
2 - Department of Industrial Management, Faculty of Management and Accounting, Allameh Tabataba’i University, Tehran, Iran
3 - Department of Information Technology Management, College of Management, University of Tehran, Tehran, Iran
4 - Department of Information Technology Management, College of Management, University of Tehran, Tehran, Iran
Keywords: Partial Churn, Churn Prediction, Machine Learning, Feature Selection, Telecommunications Industry, The Wisdom of the Crowd
Abstract :
The telecommunications industry is one of the most competitive industries in the world. Because of the high cost of customer acquisition and the adverse effects of customer churn on a company's performance, customer retention has become an inseparable part of strategic decision-making and one of the main objectives of customer relationship management. Although customer churn prediction models have been widely studied in various domains, several challenges remain in designing and implementing an effective model. This paper addresses the customer churn prediction problem with a practical approach. The experimental analysis was conducted on customer data gathered from available sources at a telecom company in Iran. First, partial churn was defined in a new way that exploits the status of customers based on criteria that can be measured easily in the telecommunications industry. The definition also relies on data mining techniques that measure the degree of similarity between a given customer and active customers or churners. Moreover, a hybrid feature selection approach was proposed in which various feature selection methods were applied along with the wisdom of the crowd. It was found that the wisdom of the crowd can serve as a useful feature selection method. Finally, a predictive model was developed using advanced machine learning algorithms such as bagging, boosting, stacking, and deep learning. Partial customer churn was predicted with more than 88% accuracy by the Gradient Boosting Machine algorithm under 5-fold cross-validation. Comparative results indicate that the proposed model performs efficiently compared with the models applied in previous studies.
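As a rough illustration of how a hybrid feature selection step might combine several selection methods with crowd input, the following minimal Python sketch aggregates the choices of each source by simple voting. The abstract does not describe the actual aggregation rule, so the voting scheme, the `vote_features` helper, the `min_votes` threshold, the source names, and the feature names are all hypothetical assumptions made for illustration only.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` selection sources.

    `selections` maps a source name (an algorithmic selector or a crowd
    survey) to the list of feature names that source selected.
    NOTE: simple voting is an assumption, not the paper's stated rule.
    """
    votes = Counter()
    for chosen in selections.values():
        # A set ensures each source votes at most once per feature.
        votes.update(set(chosen))
    # Order by vote count (descending), then by name for a stable result.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_votes]


# Hypothetical outputs of three algorithmic selectors and one crowd survey.
selections = {
    "filter": ["recharge_amount", "data_usage", "tenure"],
    "wrapper": ["recharge_amount", "call_minutes", "data_usage"],
    "embedded": ["recharge_amount", "tenure", "complaints"],
    "crowd": ["recharge_amount", "data_usage", "complaints"],
}

selected = vote_features(selections, min_votes=3)
print(selected)
```

Raising `min_votes` keeps only the features on which most sources agree, while lowering it admits features that a single method or the crowd alone considers relevant.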