Customer Churn Prediction Can Save Businesses – Machine Learning to the Rescue

Progressive companies learned early on that customer acquisition is a costly and lengthy proposition, and that maintaining a strong, loyal customer base is crucial to any business.  This explains why well-managed companies go to great lengths to preserve and grow their loyal customer base.  Beyond the loss of revenue, excessive customer churn can have a dramatic negative impact on a company's brand and public perception.  The saying that 'anyone can burn a barn, but very few can build one' rings true in this context: it is far harder to gain a good reputation than to lose one, and the damage caused by excessive churn can be lasting and hard to reverse.  The negatives do not end there.  Acquiring new customers can be extremely costly for most businesses, especially in competitive environments.  It should come as no surprise, then, that minimizing customer churn is of great importance to any organization.

Although it is unreasonable to expect that customer churn can be completely eliminated, early identification and intervention have proven very effective in converting unsatisfied customers into happy ones.  While every industry is different, the success rate for saving a customer through early intervention can be as high as 40%.  Very simple measures, such as addressing minor customer concerns or offering new promotions and concessions, can make a real difference when it comes to customer retention.

The decision by an unsatisfied customer to switch vendors is rarely impulsive.  In most cases, customers arrive at their decision at the tail end of a deliberation process, and during this phase there are often marked changes in their behavior.  The ability to identify these changes can help predict when and why an unhappy customer will make a change.  Tools that sift through historical data and discover trends can provide valuable clues about how customers will react.  Simply put, companies that use predictive models to anticipate customer actions can leverage this knowledge to try to save 'at risk' customers.  It should come as no surprise that churn prediction is one of the most frequently cited applications of machine learning.

We recently had the opportunity to apply modern machine learning techniques to reduce the customer churn rate of an organization, with remarkable results.  The remainder of this post is a brief summary of that project.


A major high-end national country/golf club with locations in several major metropolitan areas has experienced a significant rise in customer churn in recent months, and we had the fortune of assisting them in reversing this troubling trend.  Like many other astute organizations, they turned to data science and machine learning, seeking assistance in developing and deploying models that can correctly identify members on the verge of canceling their membership.  Past experience has shown them that early intervention can save roughly 30% of these accounts.  Their business model is based on annual membership premiums as well as one-time member initiation fees.  The club offers tiered pricing, with average annual membership fees of roughly $150k.  The company has revenues in excess of $300M, and cancellations have reached 450 accounts per year.

Project Summary

Our marching orders were clear: leverage historical data containing the details of each account and build a predictive model that can identify accounts at risk of being lost.  The predictions would be provided to select sales representatives, who would attempt to contact the customer and offer the incentives needed to reverse their intentions.  The task at hand is a classic machine learning classification problem, and there is a plethora of models that can tackle it.  We experimented with several machine learning models, but none produced better results than XGBoost.  The following is a depiction of the cross-validation curve of this model.


Our final tuned model has shown remarkable performance in predicting 'at risk' members.  We opted to use Recall ((True Positives)/(True Positives + False Negatives)) as our scoring metric and achieved nearly 80% on it.  Additionally, the balanced accuracy of the model (the average of the recall obtained on each class) has been in excess of 83%.
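Both metrics follow directly from confusion-matrix counts.  A small illustration in plain Python, with balanced accuracy computed as the average of per-class recall (its standard definition); the counts below are hypothetical, chosen only to land near the figures quoted above:

```python
def recall(tp, fn):
    """Recall: fraction of truly at-risk accounts the model catches."""
    return tp / (tp + fn)

def balanced_accuracy(tp, fn, tn, fp):
    """Average of recall on each class; unlike plain accuracy, this
    is not inflated by the majority (non-churning) class."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

# Hypothetical counts on a hold-out set of 1000 accounts,
# 100 of which actually churned.
tp, fn, tn, fp = 80, 20, 780, 120

print(recall(tp, fn))                     # 0.8
print(balanced_accuracy(tp, fn, tn, fp))  # ~0.833
```

The asymmetry is deliberate: on churn data, a model that predicts 'no churn' for everyone scores 90% plain accuracy here, but 0% recall and 50% balanced accuracy.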

Based on rough estimates, our model correctly flags at-risk accounts about 75% of the time, and early intervention has been shown to save 30% of those.  This translates to nearly $24M in annual revenue preserved and a nearly $3M reduction in operating costs (new customer acquisition costs).

Lessons Learned

  1. While well-tuned XGBoost models can produce remarkable outcomes, the large number of hyperparameters makes tuning a time-consuming proposition, to say the least.
  2. Data cleanup and hyperparameter optimization accounted for 90% of the project duration.
  3. Aggressive feature engineering can make a huge difference in the final outcome.
  4. It is absolutely appropriate to try complex and involved ideas, but always start with the simple ones and work your way up.
  5. Due to the large number of parameters, we shied away from random grid search for hyperparameter tuning and opted for Bayesian optimization methods.  High marks go to an open source tool called Ax.
  6. Training and inference were done on both AWS and Google Colab (using GPUs), with the former producing more favorable results.
  7. Not all open source tools are created equal.  The ones with sparse documentation and lofty promises should be left alone.
  8. Utilization of a meaningful learning rate decay regime can be very useful.
  9. During training, choose a baseline scenario that is well understood and repeatable.  Make gradual improvements to the baseline and avoid radical changes.
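The decay regime mentioned in item 8 can be sketched as a simple schedule function; the exponential form and constants below are illustrative, not the exact regime we used:

```python
def lr_schedule(round_num, base_lr=0.3, decay=0.98, floor=0.01):
    """Exponential learning-rate decay with a floor: take coarse
    steps in early boosting rounds, then refine in later ones."""
    return max(base_lr * decay ** round_num, floor)

# Early rounds take large steps; later rounds fine-tune.
print(lr_schedule(0))    # 0.3
print(lr_schedule(100))  # ~0.04
```

With xgboost, a per-round function like this can be supplied via the `xgboost.callback.LearningRateScheduler` callback; most deep learning frameworks offer equivalent schedulers.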