Revisiting the hidden assumptions underlying churn prediction in the telecom sector to define an innovative approach
This article describes how we have elaborated an innovative approach to understand churn factors and generate a continuously improving churn learning model in the telecommunications industry. We first revisited the hidden assumptions underlying the classical churn prediction and churn scoring approaches, which are generally based on “classification” learning models, in order to understand the limits of these models.
Starting from the desired output of providing marketing, sales and customer experience teams with the elements required to design and monitor targeted commercial and customer care retention actions, we developed our approach based on auto-generated clusters, continuously learning from the past to generate commercial and care actions targeting micro-segments of existing at-risk customers.
Most telecommunication service providers have increasingly been resorting to advanced data analytic tools and developed sophisticated models to address the churn core business issue, either with their own internal teams or with the support of external resources. Turning analysis into concrete actions and tangible results still remains a technical and business challenge, with limited satisfactory results to date as indicated by many industry interviews that we have conducted.
The commonly used approach to analyze churn in the telecom industry relies on estimating the probability for each customer to churn, which is materialized in a scoring methodology that is then used as a guide for the elaboration of action plans. This scoring approach is typically based on automated learning from past churners, using a “classification” approach to distinguish churners and non-churners.
This article describes how we elaborated an innovative data analytics approach to understanding churn factors and predicting churn, based on what we learned from revisiting the hidden assumptions underlying classical churn prediction models and the inherent limitations of binary churn scores. We developed an approach based on auto-generated clusters, starting from the desired output of providing marketing, sales and customer experience teams with the elements required to design and monitor targeted commercial and customer care retention plans.
High-level introduction to machine learning classification:
“Classification” refers to a family of machine learning techniques that automatically derive an “estimator” from past churners and non-churners. The “estimator” is a mathematical formulation designed to classify any given subscription as being either similar to the past churners or to the past non-churners, with a given margin of error. The degree of similarity is captured as a score ranging from 0 (no similarity with past churners/non-churners) to 1 (very high similarity with past churners/non-churners).
Whatever the sophistication of the underlying “classification” learning models used (data mining, machine learning or artificial intelligence), the outcome of such an approach remains a set of scores rating each customer between 0 and 1. Our experience suggests that these churn scoring approaches, which are widely used among telecom operators, rapidly reach their limits, as they deliver only a moderate impact on business results.
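To make the notion of an estimator concrete, the following is a minimal sketch of one common form, a logistic scoring function. It is an illustration only: the feature names, weights and bias are hypothetical values chosen for the example, not parameters learned from real data, and many other classification techniques produce a comparable 0-to-1 score.

```python
import math


def churn_score(features, weights, bias):
    """Return a similarity-to-past-churners score strictly between 0 and 1."""
    # Weighted sum of the subscription's features (a linear estimator).
    z = bias + sum(weights[name] * value for name, value in features.items())
    # The logistic function squashes the sum into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, as if learned from past churners and non-churners.
WEIGHTS = {"oob_charge_eur": 0.08, "months_to_commitment_end": -0.15}
BIAS = -1.0

# Hypothetical subscription: 25 EUR of out-of-bundle charges,
# 2 months left before the end of the commitment period.
subscription = {"oob_charge_eur": 25.0, "months_to_commitment_end": 2.0}
score = churn_score(subscription, WEIGHTS, BIAS)
print(f"churn score: {score:.2f}")  # about 0.67
```

Whatever the model behind it, the output is the same kind of object: a single number per subscription, with no indication of which churn profile the customer belongs to.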
In this article, we will try to understand the limits of churn scoring approaches and to investigate an alternate approach to increase business impacts measured in terms of customer survival rates. We first want to challenge the assumptions that underpin the use of classification models for churn prediction in the telecom industry and to understand the implications of these assumptions, which are often hidden. Based on this analysis, we then introduce our revisited approach that we have developed at Lifetime Analytics to build our customer lifetime control center application and to focus on defining targeted commercial and customer care actions to significantly increase the survival rate of those customers that have a higher propensity to churn.
THE HIDDEN ASSUMPTIONS UNDERPINNING CHURN PREDICTION
“Classification” learning models are built on the automated finding of an estimator that differentiates past churners from past subscribers who did not churn during the observation period. We believe that there are several hidden assumptions that are intrinsic to these classification models, and that significantly limit the impact that using these models can have on optimizing commercial and customer care actions to reduce churn.
Absence of Continuity over time: the future is not always like the past
A first major hidden assumption of “classification” learning models is that the identified estimators will continue to apply in the future, just as they did in the past (i.e. the historical observation period). In the case of churn prediction, this hidden assumption is that future churners will cancel their subscriptions in the future for the same reasons as the past churners did.
We believe that this assumption must be challenged, particularly in the Mobile post-paid and multi-play battlefield where very frequent new promotions and offers shape the competitive market environment. For instance, as churn can be linked to the occurrence of exogenous events such as the launch of a new “bestselling” offer or promotion by a competitor, continuity of the prevailing market conditions at any given time cannot simply be assumed to prevail in the future.
Moreover, a customer’s perception of service quality is not linearly correlated to the operator’s quality of service (QoS), but can be affected by some incidents, usage variation, requests, bill shock etc., which become events impacting a customer’s perception of service quality.
Asymmetry between churners and non-churners: customers are always non-churners up to a point…
Learning models traditionally presuppose that the two classes of churners and non-churners are largely independent and represented in the same proportion over time. Unfortunately, reality often proves to be more fluctuating and churn dynamics are usually not linear.
Churners typically represent between ~7% and ~25% of subscriptions over a twelve-month period depending on the operators and the prevailing competitive intensity in any given market, which varies over time as a function of a market’s maturity and competitors’ commercial activities. Moreover, all churners are by construction non-churners up to the point when they effectively terminate their subscription and become churners. In addition, the occurrence of the churn event and its link to past events is constrained by the commitment of a customer, i.e. the remaining term of the customer’s contractual engagement.
Lack of contextual explanation provided by a single churn score: all churners are not alike
As “classification” learning models only aim at assigning a score that ranges from 0 to 1, this approach provides little explanation of the root cause of churn and does not consider different churn profiles, i.e. different subsets of customers sharing the same churn factors.
Accordingly, classification models stop short of providing detailed insights about churn reasons and assisting in the definition of specific marketing or customer care actions. To increase the performance of retention actions, acting on concrete and detailed customer information is essential, i.e. better targeting customers with campaigns that address their actual concerns.
Correlation does not mean explanation, but it provides insightful pointers
When provided with binary churn scores, marketing, sales and customer experience teams are strongly tempted to use statistical correlations intuitively to jump to root cause conclusions, i.e. to interpret parameters of the estimator to construct a general explanation of what the estimator means, mixing and matching causes and effects. Resorting to a single estimator for the root-cause analysis of churn entails two severe drawbacks:
It can result in aggregating several customer churn profiles that are vastly different in nature: It seems hazardous to consider the same explanations for customers that experienced a perceived loss of QoS, that are promotion junkies or that just subscribed to an offer that is off market.
The mere correlation of an event with the subsequent churn of a customer does not necessarily provide an explanation: for instance, while many existing customers connect to their operator’s portal some time before churning, this fact (at least not on its own) does not explain why these customers effectively churned.
While a learning model cannot provide all answers on its own, each churn profile with its specific subset of churn factors provides marketers with the insights to perform the root cause analysis of churn with a much greater degree of precision based on these specific churn factors.
Scoring models struggle between individualization and generalization: all customers do not react the same way
Applying an estimator at the individual customer level presupposes that all customers belonging to a group will react homogeneously to commercial and care actions, the so-called “sheep effect.” High out-of-bundle charges are typically recognized as an important cause of churn, but not all customers react in the same way to an exceptional billing event.
As capturing the specificity of all customers cannot be achieved, a churn prediction only makes sense for a group as a whole or for a subset of customers, but it cannot be given a meaning at the individual customer level.
The myth of the expected neutrality: operator actions often impact customer reactions
Classification learning models do not take into consideration the effect that a future action may have on a customer (e.g. a retention or care action) as they only address past observations. Churn analysis must consider how customers will react to events, including commercial and care actions, e.g. “waking up” a customer with an action, exercising commercial pressure on customers, how the marketing action is performed, the content of the promotion, etc.
REVISITED APPROACH BY LIFETIME ANALYTICS
In developing our Lifetime Analytics application, we sought to overcome the limitations described above, which are inherent to traditional churn classification models. Our extensive discussions with senior marketing, sales and customer experience teams all pointed to the inherent difficulty of designing action plans based on results obtained from traditional churn analysis and data models.
We have accordingly designed our Lifetime application around a revisited approach primarily targeted at identifying the commercial and customer care actions required to retain at risk customers.
The cornerstone of our modified approach lies in replacing churn scoring by the detection of clusters through a semi-supervised automated learning method. The Lifetime Analytics application thus detects clusters of subscriptions presenting a high concentration of past churners based on data and events, which makes it possible to identify clear churn factors supporting the design of targeted marketing, commercial and care action plans.
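The idea of a cluster with a high concentration of past churners can be illustrated with a deliberately simplified sketch. It is not the semi-supervised method used in the Lifetime Analytics application, which is not detailed here: it merely groups subscriptions by an exact combination of discretized factor values and keeps the groups whose share of past churners exceeds a threshold. The factor names, the sample data and both thresholds are assumptions made for the example.

```python
from collections import defaultdict


def detect_churn_clusters(subscriptions, factor_names, min_size=3, min_churn_rate=0.5):
    """Group subscriptions by factor combination and keep churn-heavy groups."""
    groups = defaultdict(list)
    for sub in subscriptions:
        key = tuple(sub[name] for name in factor_names)
        groups[key].append(sub["churned"])

    clusters = []
    for key, flags in groups.items():
        rate = sum(flags) / len(flags)  # share of past churners in the group
        if len(flags) >= min_size and rate >= min_churn_rate:
            clusters.append({
                "factors": dict(zip(factor_names, key)),
                "size": len(flags),
                "churn_rate": rate,
            })
    # Most churn-concentrated clusters first.
    return sorted(clusters, key=lambda c: c["churn_rate"], reverse=True)


# Hypothetical historical subscriptions with two discretized factors.
history = (
    [{"oob_charge": "high", "price_gap": "above_market", "churned": True}] * 3
    + [{"oob_charge": "high", "price_gap": "above_market", "churned": False}]
    + [{"oob_charge": "low", "price_gap": "at_market", "churned": False}] * 4
    + [{"oob_charge": "low", "price_gap": "at_market", "churned": True}]
)
clusters = detect_churn_clusters(history, ["oob_charge", "price_gap"])
for cluster in clusters:
    print(cluster)
```

In this toy data set only the combination of high out-of-bundle charges and an above-market price emerges as a cluster, and the combination itself reads directly as a set of churn factors.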
Learning about churn profiles to target commercial and care actions
Launching targeted commercial and care actions that will successfully retain customers requires data learning to detect churn profiles sufficiently ahead of potential churn events. Our cluster detection approach identifies specific churn profiles based on historical data, typically over the one to six months prior to the occurrence of a churn event. These churn profiles are then used to identify at-risk existing customers: the objective is to anticipate customer moves and to design specific commercial and/or care action plans targeting these customers to re-engage them and address their concerns before they take active steps to terminate their subscription or sign up with a competitor.
In contrast, immediate/real-time churn profile learning often does not leave enough time to run the sequence of targeted commercial and/or care actions required to re-engage at-risk customers with a good trade-off between the required promotional or retention investment and the effective re-engagement benefit. We thus believe that our cluster approach better allows operators to anticipate customer reactions and act on their mid-term churn with better balanced actions, achieving a better performance.
Continuous detection of churn clusters
As customer service experience and market conditions constantly change, we continuously produce clusters of customers presenting a similar churn profile. This production of clusters nourishes continuous churn profile learning based on various internal and external events, e.g. changes in competitors’ bestselling offers (prices and features) and customer feedback about services and marketing actions (customer experience).
We believe that the contextual explanation of the root cause of churn lies in identifying different churn profiles, i.e. different subsets of customers sharing the same churn factors. Accordingly, each churn profile with its specific subset of churn factors provides marketers with the insights to perform the root cause analysis of churn with a much greater degree of precision based on these specific churn factors.
Generating several contextualized clusters of customers each with their own specific churn profile is thus an essential feature that allows continuously designing and launching multiple smaller and narrowly targeted campaigns over micro segments of customers, rather than just a few large-scale and static “one size fits all” campaigns.
Evaluating clusters based on business metrics, rather than statistical metrics
The statistical metrics that are typically used to evaluate the performance of machine learning models often have limited relevance to the achieved business impact. Statistical machine learning metrics such as recall and precision effectively assess how well the model predicts past churners, but these metrics do not take into account crucial factors such as seasonality and customer reactions to retention actions and, more importantly, they do not assess the business impact that the model can achieve.
We rather measure the business impact of our cluster approach based on performance metrics that are more closely linked to the desired business impact.
We therefore assess the performance of our cluster profiles based on the uplift of the last-twelve-month (LTM) churn rate of customers belonging to a cluster over the LTM churn rate of the entire product family. This business-driven performance metric gives a clear insight into how much the churn of a cluster’s detected profile exceeds the churn trend of the product family as a whole, and it also has the benefit of limiting the month-over-month volatility of precision that results from seasonality.
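As a worked illustration of this metric, the sketch below computes the uplift as a ratio of the two LTM churn rates. Expressing the uplift as a ratio rather than a difference is an assumption made here, as is the use of the average subscription base as the denominator of the churn rate; the figures are invented for the example.

```python
def ltm_churn_rate(churners_ltm, average_base_ltm):
    """Churn rate over the last twelve months (assumed: churners / average base)."""
    if average_base_ltm <= 0:
        raise ValueError("average base must be positive")
    return churners_ltm / average_base_ltm


def churn_uplift(cluster_rate, family_rate):
    """How many times the cluster's LTM churn rate exceeds the product family's."""
    if family_rate <= 0:
        raise ValueError("family churn rate must be positive")
    return cluster_rate / family_rate


# Hypothetical figures: a 400-subscription cluster inside a 100,000-subscription family.
cluster_rate = ltm_churn_rate(churners_ltm=120, average_base_ltm=400)
family_rate = ltm_churn_rate(churners_ltm=15_000, average_base_ltm=100_000)
uplift = churn_uplift(cluster_rate, family_rate)
print(f"cluster {cluster_rate:.0%}, family {family_rate:.0%}, uplift x{uplift:.1f}")
```

Because both rates cover the same twelve-month window, seasonal effects weigh on the numerator and the denominator alike, which is the stabilizing property described above.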
Supporting the root-cause analysis
While the classical “classification” approach provides estimator scores as the output to be used to define action plans, our cluster approach provides marketing, sales and customer-experience teams with a set of “churn factors” that are at the root of a cluster’s detection. As a reminder, these “churn factors” are a specific combination of values for different internal (subscription related) and external (market related) data and events common to all the customers belonging to a given cluster. Each combination of a given set of churn factors thus defines a customer churn profile. For each churn profile, the specific combination of churn factors provides the insights to perform the root cause analysis of churn with a much greater degree of precision.
In order to further support the marketing, sales and customer experience teams in efficiently realizing this root cause analysis, the application provides explanatory links between the primary and correlated churn factors and the different components of the data model. For example, when an out-of-bundle (OOB) charge emerges as a churn factor for a given cluster, the application automatically proposes a deep dive into the sources of the identified OOB charges (e.g. voice overconsumption, mobile data usage, roaming, special numbers, etc.), as these data are also correlated with changes in usage patterns.
Correlation, however, does not mean causality; it is just the starting point. Human expertise from marketing, sales and customer experience teams makes it possible to connect the dots by interpreting correlations and linking components of the data model according to their relationships. For example: OOB charge is linked to Mobile data over-consumption to Mobile data usage and to Marketing product.
Accordingly, our cluster approach is designed to nourish the root-cause analysis by identifying the primary and correlated churn factors that result from the detection of the cluster, further making these churn factors explicit by automatically linking them to the relevant components and values of our data model. Hence, the emergence of a cluster contains by definition the basic analytical components that then serve to design the commercial or customer care action plan specifically targeting this cluster, or even more specifically targeting a subset of customers belonging to this cluster based on different filters that can be applied.
While automatically generating and launching action plans still appears overly ambitious at this stage, the churn factors that defined a cluster, together with the links between these churn factors and the performance of actions launched on similar clusters, make it possible to automatically generate best-action recommendations that support the marketing, sales and customer experience teams in designing optimal commercial and customer care action plans.
Enriching data with external sources (e.g. competition prices and handset characteristics)
While most data elements are sourced from the operator’s BSS and OSS, injecting external data sources can significantly enrich the potential churn factors and thus lead to the emergence of new clusters. By way of illustration, the relative price positioning between the price paid by each customer and the prices of comparable competitor bestselling offers is used as a parameter of the learning model to assess the existence of a better offer. This factor can also be correlated with mobile and fixed number portability data to take into consideration the destination of churners porting out. Similarly, the characteristics of customer devices (e.g., model, release, screen size, memory, battery characteristics, etc.) contribute to defining churner profiles and designing retention actions.
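One possible way to express the relative price positioning as a model parameter is sketched below. The formula (the gap relative to the competitor’s price) and the function name are assumptions made for illustration; the exact definition used in the application is not specified in this article.

```python
def relative_price_gap(price_paid, competitor_price):
    """Relative gap between the customer's price and a comparable competitor offer.

    Positive: the customer pays more than the competitor's bestselling offer.
    Negative: the customer already pays less.
    """
    if competitor_price <= 0:
        raise ValueError("competitor price must be positive")
    return (price_paid - competitor_price) / competitor_price


# Hypothetical monthly prices in EUR.
gap = relative_price_gap(price_paid=30.0, competitor_price=25.0)
print(f"customer pays {gap:+.0%} versus the comparable competitor offer")
```

A value like this can then be discretized (for instance into below, at or above market) so that it can combine with other churn factors when clusters are detected.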
Applying past churn profiles to detect at-risk existing customers
Learning from the past to act on existing customers is the transition at the core of our approach: the detection of at-risk existing customers is based on applying the same combination of churn factors as those that were found in the profiles of past churners.
Transposing the churn profiles resulting from our learning model to the current customer base thus makes it possible to define micro-segments of existing customers with the same profiles as those of the past churners, and these at-risk customers can then be targeted with specific commercial and care actions derived from the specific combination of churn factors for each churn profile. This continuous cluster detection approach enables operators to design a flow of regular and contextualized commercial and customer care actions through customer lifetime cycles, independently of commitment period anniversaries.
This approach makes it possible to better manage the impact of operator actions and to reduce the risk of “wake up” effects when the commitment period ends, i.e. when customers reaching the end of their commitment period are presented with a retention offer by their existing operator and often react by looking in the market for a better offer from competitors.
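The transposition step can be sketched as a simple matching of current customers against the factor combinations of past churn profiles. This is a minimal illustration under stated assumptions: profiles are represented as dictionaries of discretized factor values, matching is exact, and the profile identifiers, factor names and customer records are all hypothetical.

```python
def matches_profile(customer, profile_factors):
    """True when the customer shows every factor value of the churn profile."""
    return all(customer.get(name) == value for name, value in profile_factors.items())


def micro_segments(current_base, profiles):
    """Map each churn profile to the ids of current customers matching it."""
    segments = {profile_id: [] for profile_id in profiles}
    for customer in current_base:
        for profile_id, factors in profiles.items():
            if matches_profile(customer, factors):
                segments[profile_id].append(customer["id"])
    return segments


# Hypothetical churn profiles learned from past churners.
profiles = {
    "bill_shock": {"oob_charge": "high", "price_gap": "above_market"},
    "old_device": {"device_age": "over_36_months"},
}
# Hypothetical current customers (still active).
current_base = [
    {"id": "C1", "oob_charge": "high", "price_gap": "above_market", "device_age": "recent"},
    {"id": "C2", "oob_charge": "low", "price_gap": "at_market", "device_age": "over_36_months"},
    {"id": "C3", "oob_charge": "low", "price_gap": "at_market", "device_age": "recent"},
]
segments = micro_segments(current_base, profiles)
print(segments)
```

A customer may fall into several micro-segments at once, in which case the choice of action remains a marketing decision rather than something this sketch resolves.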
Moreover, this approach can better secure “jumpers” and “promotion junkies” by locking them into renewed commitments through feature-based offers (get a lot more for a bit more money).
Overall, the continuous monitoring of clusters and the detection of new ones are fundamental pillars of our approach for enabling operators to design targeted commercial and customer care actions throughout the customer lifetime. Moving away from the rigid customer journey milestones allows for a personalized approach predicated on customers’ actual concerns instead of adhering to operator-determined customer lifecycles.
Action design and monitoring
In our vision, the design and monitoring of retention actions is not isolated from churn analysis, but it is an integrated component of the end-to-end churn/retention approach and process. As a matter of fact, the design of our application started with the identification of the desired output to elaborate action plans, and we have elaborated our data model and cluster approach in order to deliver these outputs.
As already discussed, the root-cause analysis nourished by the cluster’s churn factors and their links with the data models serves as the foundation for the design of targeted action plans for a cluster or a micro-segment of a cluster. We believe that more targeted action plans can be more successful, as they are conducive to designing more specific responses to actual customer concerns and they allow operators to invest more in retention over a reduced number of at-risk customers.
In order to calibrate action plans, we have also developed an action business case simulator taking into account the proposed action’s impact on ARPU, AMPU, cash outs (commercial costs, equipment and installation, etc.) and payback period to ensure that the proposed action plan is based on a profitable business case.
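The payback logic of such a business case can be illustrated with a deliberately reduced sketch. It assumes a single up-front cash out per retained customer and a constant monthly margin per user (AMPU) after the action; a real simulator would also model ARPU changes, acceptance rates and churn after the action. The function name and figures are hypothetical.

```python
import math


def payback_period_months(cash_out, ampu_after_action):
    """Months of post-action margin needed to recover the up-front cash out.

    Returns None when the post-action margin is not positive, i.e. when the
    investment is never recovered under these simplified assumptions.
    """
    if ampu_after_action <= 0:
        return None
    return math.ceil(cash_out / ampu_after_action)


# Hypothetical action: 60 EUR of commercial and equipment costs per customer,
# 12 EUR of monthly margin per user once the retention offer is applied.
months = payback_period_months(cash_out=60.0, ampu_after_action=12.0)
print(f"payback after {months} months")  # 5 months
```

Comparing the resulting payback period with the length of the renewed commitment gives a first, rough indication of whether the action is worth launching.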
Moreover, monitoring the real impact of retention actions on churn and more specifically the customer survival rate is an essential component of our application to ensure that it delivers the desired business impact and that it capitalizes on continuous improvements. Beyond the monitoring of campaign execution (i.e. in terms of successful contacts, offer acceptance, etc.), the application therefore monitors cohorts resulting from each performed action to assess the actual retention impact of each action. More specifically, the application continuously tracks the actual survival rate improvement for customers that were contacted and that accepted the proposed measure vs. those that were not contacted or that were contacted but did not accept the proposed measure.
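The cohort comparison described above can be sketched as follows. The sketch assumes that the survival rate is the share of a cohort still subscribed at the end of a fixed observation window and that the improvement is the difference in percentage points between the two cohorts; the record layout and the figures are hypothetical.

```python
def survival_rate(cohort):
    """Share of a cohort still subscribed at the end of the observation window."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    return sum(1 for customer in cohort if customer["active"]) / len(cohort)


def survival_improvement(accepted, others):
    """Survival rate of customers who accepted the offer minus that of the rest."""
    return survival_rate(accepted) - survival_rate(others)


# Hypothetical cohorts for one retention action, observed after the same period.
accepted = [{"active": True}] * 9 + [{"active": False}] * 1
others = [{"active": True}] * 7 + [{"active": False}] * 3  # not contacted or declined

improvement = survival_improvement(accepted, others)
print(f"survival improvement: {improvement * 100:.0f} points")
```

Such a raw difference does not by itself isolate the effect of the action, since customers who accept an offer may differ from those who decline it, so it is best read alongside the cluster’s churn factors.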
While telecom operators devote considerable resources and brain power to develop sophisticated churn models aimed at improving retention, we believe that the classical churn prediction approach generally used by operators is often built on assumptions that can be challenged and deliver outputs having limited business impacts on customer retention. More specifically, classical approaches of churn prediction and churn index scoring based on “classification” learning models entail some meaningful analytical bias out of hidden assumptions and significant intrinsic limitations resulting from binary outputs, which considerably reduce the business impact of this classical approach to churn reduction.
Based on this finding, we have elaborated a different approach to analyze churn and churn prediction, starting from the desired output of providing marketing, sales and customer care teams with the elements required to design targeted commercial and customer care retention plans on a continuous basis. The cornerstone of our approach is therefore to enable a continuous flow of integrated activities – from churn analysis to marketing action monitoring leading to small, targeted, adapted and financially well-balanced marketing, commercial and customer care retention actions.
Our approach relies on the automatic detection and monitoring of clusters of at-risk existing customers presenting a high concentration of past churners. These clusters are auto generated by the application based on data and events, which means that clusters accordingly evolve over time as customer situations and market conditions constantly change.
The transition from learning from the past to acting on existing customers is at the core of our approach: the detection of at-risk existing customers is based on applying the same combination of factors as those that were found in the profiles of at-risk customers that recently churned. This cluster approach makes it possible to continuously identify customers requiring commercial or customer care actions and enables the design of targeted action plans and micro-campaigns, thus replacing the more traditional scoring approaches that are often less actionable.
The main challenge of this “continuous” evolving approach is that it cannot be operated easily with ad-hoc data and marketing projects due to the workload resulting from continuously detecting and monitoring multiple clusters and related marketing, commercial and customer care actions. The Lifetime Analytics application design is thus based on a productized end-to-end approach to empower marketing, sales and customer experience teams based on a flexible open data model that is nourished on a monthly basis with a set of pre-defined BSS and OSS data items.
By Frédéric BEAUVAIS and Julien CABOT
The authors are co-founders of Lifetime Analytics S.A.S.