How Can a CDP Help Minimize Business Churn Rate?
Customer churn is the number of customers who stopped buying your company's services or products during a given period. It is the key metric for measuring and improving customer retention, and it represents a major challenge for any business.
Unfortunately, churn is often overlooked by businesses aiming to grow their customer base, which treat acquisition and retention as two siloed problems. Yet, as a well-known Harvard Business Review article states, "acquiring a new customer is anywhere from five to 25 times more expensive than retaining an existing one." Retention drives profitability, as existing, trusting customers are more easily convinced by your brand's products than potential new clients are. Despite that, we still see companies neglecting their retention strategies in favor of customer acquisition.
Creating synergy between acquisition and retention is the first step toward optimizing your churn rate.
This is why a customer data platform (CDP) gives a new dimension to churn rate optimization. Acquia CDP can improve the customer experience while putting churn rate monitoring at the center of business growth analysis.
A CDP Refreshes Our Approach to the Customer Journey
It is important to consider a high churn rate as a consequence of a non-optimized customer journey.
Acquia CDP centralizes all your data sources and redefines the way teams apply this data to create better business efficiency and an improved customer experience.
A CDP connects data sources in a single platform and unifies customer identities and insights across all channels. With all this data brought together in a single source of truth, businesses can activate personalization campaigns and deliver consistent customer experiences at scale.
Today, data drives all core business operations from IT to data engineers to marketers. Departments must work together from a centralized platform to tap into the full potential of their customer data.
Acquia partner SQLI works with all of these teams across a company to set up a virtuous circle that focuses on lowering churn rates across all marketing activities. Here’s what that process looks like.
- SQLI ingests your data sources into the Acquia CDP.
- SQLI ensures identity reconciliation.
- SQLI creates addressable segments.
- SQLI builds a machine learning model to optimize churn rate in real time.
- SQLI sets up dashboards and reports for better cross-channel customer analysis.
- SQLI builds adapted real-time journey orchestrations depending on each customer's "churn risk."
SQLI Built an Integrated ML Methodology to Ensure Real-Time Churn Risk Monitoring
The method described and applied by SQLI is supported by Acquia CDP’s out-of-the-box machine learning algorithms, which speed up the development process considerably.
Model for churn prediction
The machine learning approach consists of building a model that computes an output from inputs. The model is built from historical data without being explicitly programmed. In the context of churn prediction, the inputs are the customer's characteristics, and the output is the probability that the customer will churn.
More precisely, the prediction is a forecast, since it involves time: knowing a customer's current and past characteristics, are they likely to churn in the more or less near future? Thus, the model input is data taken from a time window that starts in the past and ends at a specific time point α (the observation window), and the model output is evaluated using data taken from a time window that starts at that same time point (the response window).
When the model is trained and evaluated, the time point α is in the past. But when the model is used in production to predict whether a customer is likely to churn, α is the current time.
The data taken from the response window is only used to assert whether the customer is still interacting with the brand. If yes, the customer is labeled "no churn"; otherwise, they are labeled "churn." Since the data is labeled, the type of algorithm used to build the model is said to be supervised. The output of the model falls into two categories, "churn" and "no churn," so the algorithm performs a binary classification (categories are named classes in the ML context).
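As a minimal illustration, this windowing and labeling logic can be sketched in Python. The field names, window lengths, and dates below are hypothetical, not part of any real implementation:

```python
from datetime import datetime, timedelta

def label_customer(event_times, alpha, obs_days=180, resp_days=90):
    """Return (observed events, label) for one customer around time point alpha."""
    obs_start = alpha - timedelta(days=obs_days)
    resp_end = alpha + timedelta(days=resp_days)
    # Events inside the observation window feed the model input.
    observed = [t for t in event_times if obs_start <= t < alpha]
    # Any interaction inside the response window means "no churn".
    active_after = any(alpha <= t < resp_end for t in event_times)
    label = "no churn" if active_after else "churn"
    return observed, label

alpha = datetime(2023, 1, 1)
events = [datetime(2022, 11, 10), datetime(2022, 12, 5)]  # nothing after alpha
_, label = label_customer(events, alpha)
print(label)  # -> churn
```

In production, the same function would be applied with α set to the current time, and the label would be what the model is asked to predict rather than something observed.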
Customer data can be split into two main categories:
- One data category concerns customer characteristics that aren't dependent on the targeted business. For example, this category includes demographic features such as birth date (age is deduced from it), education level, location, income…
- The other data category concerns customer interactions with the brand. Which products were bought? What channel was used? The customer journey is of importance and should be well recorded. Interactions with customer support are also of importance, especially when dealing with churn prediction (e.g., queries sent, number of interactions, history of customer satisfaction scores).
Both categories matter. The first, taken alone, won't bring a great benefit for churn, since its data is rather static (it doesn't change much over time). However, it does bring value, since it helps the model learn which customer types are related to which behaviors.
Data relevance depends on the business goal.
The model is built automatically, and its performance isn't known beforehand. To improve it, results must be analyzed, which leads to improvements in the input data and/or the model. Those improvements produce new results, which must be analyzed again if they aren't good enough, and so on: building a model is an iterative process. The Cross-Industry Standard Process for Data Mining (CRISP-DM) is an iterative process dedicated to this type of task and is well suited to machine learning. CRISP-DM defines six steps, as shown in the figure below.
1. Business understanding
This is a crucial phase. Understanding the business will help understand the data and prevent its misuse. Furthermore, during this step, the business target will be defined with precision so that the metrics used to evaluate the model will be aligned with the business target. Different aspects must be addressed:
- How far in advance of the actual churn should the prediction arrive so that the business can effectively act on it?
- What should be the length of the observation and response windows? The length is business-dependent. For a business whose customers usually interact very often with the brand, the windows can be short. Conversely, longer windows should be chosen for a business whose customer interactions happen seldom.
- Note that a short observation window may not be enough to make reliable predictions.
- Prediction is never 100% accurate. Some customers will be predicted "churn" even though they weren't about to churn (false positives), and some customers will be predicted "no churn" but will eventually churn (false negatives). The business impact of false negatives and false positives should be understood so that the model can be adjusted to favor one over the other (or neither). That impact is also business-dependent.
- What level of error is acceptable for the business? As mentioned previously, a model is never 100% accurate. Refining a model is costly, especially after several cycles of improvement, so why pursue further improvement if the model's accuracy is already acceptable and a return on investment can already be obtained?
During this step, business goals are understood and a notion of data relevance is obtained.
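One way the false-positive/false-negative trade-off can be made concrete is by choosing the model's decision threshold to minimize total business cost. The sketch below assumes invented per-error costs and model scores, purely for illustration:

```python
# Hypothetical costs: a false positive wastes a retention offer,
# a false negative loses a customer's lifetime value.
COST_FP = 10.0
COST_FN = 100.0

def total_cost(scores_labels, threshold):
    """Sum the cost of errors made at a given decision threshold."""
    cost = 0.0
    for score, churned in scores_labels:
        predicted_churn = score >= threshold
        if predicted_churn and not churned:
            cost += COST_FP  # false positive
        elif not predicted_churn and churned:
            cost += COST_FN  # false negative
    return cost

# (model score, did the customer actually churn?) -- made-up data
data = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
best = min((t / 10 for t in range(1, 10)), key=lambda t: total_cost(data, t))
print(best)  # -> 0.3
```

Because a missed churner is assumed ten times costlier than a wasted offer, the cheapest threshold is a low one that tolerates some false alarms.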
2. Data understanding
The objective of data understanding is to know what can be expected and achieved from the data. Data quality is checked along several dimensions, such as completeness, value distributions, and data governance compliance.
Important questions should be addressed. Is the existing data enough to achieve the business target in terms of variety, size, and time span? In the context of customer churn, is the data representative of all the interactions the customer has with the brand? If some interactions are missing, the model may be biased, since a full view of the customer isn't reached. For example, a customer could be predicted as "churn" just because they switched from a recorded channel to an unrecorded one. Or, if a group of users tends to use a missing channel, the model won't be able to tell much about their probability of churning. If key data or channels are missing, an investigation should be made into how to fill the gap.
Interaction with the business at this stage will be required to answer questions that will be raised while investigating data.
One important purpose of a CDP is to provide a clean, unified 360° view of the customer that includes every channel. When achieved, this tremendously helps accomplish this step and speeds it up.
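Basic data-understanding checks are easy to sketch. The snippet below runs two of the checks mentioned above, completeness and value distribution, over a toy list of customer records whose field names are hypothetical:

```python
# Toy customer records -- field names and values are illustrative only.
records = [
    {"customer_id": 1, "channel": "web",   "age": 34},
    {"customer_id": 2, "channel": "store", "age": None},
    {"customer_id": 3, "channel": None,    "age": 51},
]

def completeness(records, field):
    """Share of records where `field` is present and non-null."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def value_distribution(records, field):
    """Counts per value, e.g. to spot missing or over-represented channels."""
    counts = {}
    for r in records:
        value = r.get(field)
        counts[value] = counts.get(value, 0) + 1
    return counts

print(completeness(records, "age"))           # 2 of 3 filled
print(value_distribution(records, "channel"))
```

A `None` bucket in the channel distribution is exactly the kind of gap that would prompt a conversation with the business about unrecorded interactions.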
3. Data preparation
Ideally, a clean 360° view of the customer is already available through a CDP before the churn project starts. If not, the first step would be to build it, focusing on the data needed for the project.
Once this view is available, the data still needs to be prepared and gathered in a format that suits the machine learning algorithm.
Useful data can also be produced from the existing data. This process is called feature engineering. If chosen appropriately, this extra data will improve the model's performance. The type of data generated depends on the business objective. In the context of churn prediction, customer segmentation could produce information that helps the model output better predictions. Customer segmentation is itself the result of a machine learning algorithm called clustering. Clustering is unsupervised, since it doesn't need any labels to form groups; the groups are identified only once the algorithm has grouped customers into clusters. Another way to obtain relevant features is to run sentiment analysis over customer communications, which is also done with machine learning.
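Feature engineering doesn't always require another model: a classic starting point is deriving recency/frequency/monetary (RFM) aggregates from raw purchase events, which can then feed both the churn model and a segmentation. A sketch, with hypothetical field names and dates:

```python
from datetime import datetime

def rfm_features(purchases, now):
    """Derive RFM features from a customer's (timestamp, amount) purchases."""
    recency_days = (now - max(t for t, _ in purchases)).days
    frequency = len(purchases)
    monetary = sum(amount for _, amount in purchases)
    return {"recency_days": recency_days,
            "frequency": frequency,
            "monetary": monetary}

now = datetime(2023, 1, 1)
purchases = [(datetime(2022, 12, 12), 40.0), (datetime(2022, 10, 1), 25.0)]
print(rfm_features(purchases, now))
# -> {'recency_days': 20, 'frequency': 2, 'monetary': 65.0}
```

A rising `recency_days` is often one of the most telling engineered signals for churn.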
Having too many features brings impediments when building the model (curse of dimensionality, model performance issues, and so on). It is good practice to identify and select the features that are most informative with respect to the business target and to remove redundant features. Several statistical methods are at hand for this selection (e.g., correlation between input variables to identify redundant ones).
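The correlation approach mentioned above can be sketched with plain Pearson correlation: any pair of features whose absolute correlation exceeds a threshold (0.95 here, an illustrative choice) is flagged and one of the pair is dropped. Feature names and values are made up:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

features = {
    "orders_30d": [1, 2, 3, 4, 5],
    "orders_31d": [1, 2, 3, 4, 6],   # nearly duplicates orders_30d
    "tickets":    [5, 1, 4, 2, 3],
}

redundant = set()
names = list(features)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        if abs(pearson(features[a], features[b])) > 0.95:
            redundant.add(b)  # keep the first feature, drop the duplicate
print(sorted(redundant))  # -> ['orders_31d']
```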
4. Modeling
As explained previously, the algorithm used for churn prediction is a binary classification. Many algorithms can be used: logistic regression, decision trees, random forests, gradient-boosted trees (XGBoost), support vector machines (SVM), neural networks, and more. Some are more basic (e.g., logistic regression) and some more complex (e.g., neural networks). Generally speaking, the more complex an algorithm, the more business complexity it can grasp, but the more computational power it requires, which increases cost and/or computation time. So there is no need to start with the most complex one.
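To show what the "basic" end of that spectrum looks like, here is a minimal logistic regression trained by gradient descent on a toy, made-up churn dataset (two features, four customers). Real projects would use a library implementation, but the mechanics are the same:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Probability that this customer churns."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)

# toy features: [days_since_last_order, support_tickets] (scaled); 1 = churned
X = [[0.1, 0.0], [0.2, 0.1], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 1, 1]
w, b = train(X, y)
print(predict(w, b, [0.85, 0.8]) > 0.5)  # -> True
```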
5. Evaluation
The model is run over a set of customers to predict churn, and the prediction is then compared with what actually happened. In the case of binary classification, four outcomes are possible.
- True positive. The model predicts “churn” and the customer effectively churned.
- True negative. The model predicts “no churn” and the customer effectively didn’t churn.
- False positive. The model predicts “churn”, but the customer didn’t churn.
- False negative. The model predicts “no churn”, but the customer did churn.
The number of occurrences of each case is counted. A perfect model would have no false positives and no false negatives. Other metrics can be built on top of those counts (precision, recall, F1 score, and so on).
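These derived metrics follow directly from the four counts. The example counts below (out of 100 hypothetical customers) are invented for illustration:

```python
def metrics(tp, fp, fn, tn):
    """Standard metrics built on the four confusion-matrix counts."""
    precision = tp / (tp + fp)              # of predicted churners, how many churned
    recall = tp / (tp + fn)                 # of actual churners, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# e.g. 20 churners caught, 10 false alarms, 5 missed churners, 65 correct "no churn"
p, r, f1, acc = metrics(tp=20, fp=10, fn=5, tn=65)
print(round(p, 3), round(r, 3), round(f1, 3), acc)
```

Note that with imbalanced classes (churners are usually a minority), accuracy alone is misleading, which is why precision and recall matter here.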
To improve the results, model tuning is performed. The model is built automatically by the algorithm; however, the algorithm has inner parameters (called hyperparameters) that can be adjusted. The act of adjusting them is called tuning.
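In its simplest form, tuning is a grid search: score every combination of hyperparameter values and keep the best. In the sketch below, `score` is a stand-in for a real cross-validated evaluation (in practice it would retrain the model per combination), and the grid values are arbitrary:

```python
from itertools import product

# Hypothetical hyperparameter grid for a tree-based model.
grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1]}

def score(params):
    """Placeholder for cross-validated model performance (higher is better).
    Here it simply pretends moderate depth and learning rate win."""
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

best = max(
    (dict(zip(grid, values)) for values in product(*grid.values())),
    key=score,
)
print(best)  # -> {'max_depth': 4, 'learning_rate': 0.1}
```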
The result obtained should be compared to what was defined as a target during the "Business understanding" step.
If the result falls below expectations, an error analysis is conducted. A random customer sample is extracted, each prediction is analyzed, and the errors (when they occur) are categorized. At the end of this exercise, the error category with the highest frequency is addressed first. This is a Pareto-like process: we focus on the issues that will bring the most benefit and address the largest sources of error.
The result of the analyses will help to understand what should be done next to improve the model. Depending on the issue, the correction could consist of different actions:
- The algorithm used was too simple and couldn't grasp the underlying complexity of the business. It should be replaced with a more complex one.
- Important discriminative data is missing. It should be added to the input dataset. This could be a blocking point if there is no evident way to get this data.
- New engineered features are required.
- The observation and response windows need to be adjusted.
The model is also tested against data that wasn't used to build it. If performance decreases significantly, overfitting is the cause: the algorithm was too good at learning the training data and picked up noisy signals that aren't relevant to the target. Specific adjustments are available in the data scientist's toolbox to avoid this issue.
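The overfitting symptom is easy to demonstrate with an extreme case: a "model" that simply memorizes its training examples scores perfectly on them but no better than a default guess on unseen customers. The data here is invented:

```python
# (feature tuple) -> label; made-up training and held-out examples.
train_set = {(1, 2): "churn", (3, 4): "no churn", (5, 6): "churn"}
test_set = {(7, 8): "churn", (9, 10): "no churn"}

def memorizing_model(x):
    """Returns the memorized label, or a default guess for unseen inputs."""
    return train_set.get(x, "no churn")

train_acc = sum(memorizing_model(x) == y for x, y in train_set.items()) / len(train_set)
test_acc = sum(memorizing_model(x) == y for x, y in test_set.items()) / len(test_set)
print(train_acc, test_acc)  # -> 1.0 0.5
```

A large gap between training and held-out performance, as here, is the signal that the model has learned noise rather than generalizable patterns.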
Once the business target is met, a final evaluation is done using data that was never used in any preceding step. This provides an unbiased estimate of what to expect as the model goes to production. The model is then ready for the next step.
6. Deployment
This step consists of deploying the model into production so that it can be used by the end user.
Note that the model is never used directly by the user. Its results can be displayed in a dashboard, or the model can be exposed as an API to another application that uses its results to send notifications by email, SMS, or whatever means suits the business's needs. How the model is used is out of scope for this section, which deals with machine learning.
With a modern platform such as Acquia CDP, deploying a model is quick and easy once the DevOps pipeline is set up.
Once a model is in production, the work isn't finished. Since the model was built on past data, it may predict poorly if new customer patterns appear afterward. This phenomenon is called data drift. The model should be re-evaluated regularly and retrained to include the new data.
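A basic drift check compares a feature's live distribution against the distribution seen at training time. The sketch below flags drift when the live mean moves several training standard deviations away; the threshold and data are illustrative, not standard values:

```python
from statistics import mean, stdev

def drifted(train_values, live_values, z_threshold=3.0):
    """Flag drift when the live mean is far from the training mean."""
    mu, sigma = mean(train_values), stdev(train_values)
    shift = abs(mean(live_values) - mu) / (sigma or 1.0)
    return shift > z_threshold

# e.g. monthly order counts per customer, before and after a behavior change
train_orders = [2, 3, 2, 4, 3, 2, 3, 4]
live_orders = [9, 10, 11, 9, 10]
print(drifted(train_orders, live_orders))  # -> True
```

A flagged feature doesn't automatically invalidate the model, but it is the trigger to re-evaluate it on recent labeled data and retrain if performance has degraded.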
Conclusion and Perspectives
To be successful, a churn detection project needs to be sustained by:
- A good understanding of the business and of the available data.
- Data completeness and accuracy about customers, products, and how customers interact with your brand.
- An understanding of machine learning models and parameters, in order to adapt them to business needs.
Acquia CDP provides a unified 360° view of the customer, including all channels and recording the entire customer journey. Coupled with SQLI's methodology and expertise, which rationalize and speed up machine learning applications, brands can achieve more substantial marketing success, reduce customer churn, and retain more customers over the long term.
This article focused on churn prediction, which is key for business improvement. However, machine learning can be applied to many more aspects of marketing, such as:
- Likelihood to engage
- Likelihood to convert
- Likelihood to respond to a discount offer
- Likelihood to make a repeat purchase
- Likelihood to make a return
- Likelihood to churn
- Product cluster
- Behavioral cluster
- Seasonal cluster
- Next best product
- Next best action
- Send time optimization
1. Amy Gallo, "The Value of Keeping the Right Customers," Harvard Business Review, October 29, 2014.
2. SEMrush.
3. TARP Study for Coca-Cola, 1990.
The team behind this project
Arnaud Prades (Acquia)
Director of Machine Learning