
The Problem
“How can we identify accounts at risk of cancelling their subscription?” A large scientific publishing company asked me this question. They gave me 10,000 records of data across 3 tables and left me to figure it out. The deliverable: an 8-slide deck covering my analysis, recommendations, and methodology.
However…
I knew I would get lost if I tried to find the insights with my usual methods. The turnaround time was also short: 3 days, and I could only spare my evenings for this project.
I needed a new approach.
So I decided to add a new tool to my data toolkit.
Enter Market Basket Analysis
It was time to deploy machine learning and predictive analytics. I wanted to find the group of customers most likely to let their subscriptions lapse.
After working through encoding and data-wrangling challenges, I ran a Market Basket Analysis on the data using the mlxtend library in Python.
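In the actual analysis, mlxtend does this work: its `apriori` and `association_rules` functions (in `mlxtend.frequent_patterns`) run on a one-hot-encoded pandas DataFrame. To show what those rules measure without any dependencies, here is a minimal sketch in plain Python. Everything in it is hypothetical: the field names (`joined`, `size`, `nps`), the twelve toy accounts, and the `min_support` threshold. It also swaps Apriori's level-wise pruning for brute-force enumeration, which is only practical for a handful of attributes. Each rule "X → churned" is scored by support, confidence, and lift. A lift of 3.12 would mean a segment churns at 3.12 times the overall rate, which is likely what a "3.12x more likely to churn" figure expresses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data, not the real dataset: (joined, size, nps, churned).
FIELDS = ("joined", "size", "nps")
ROWS = [
    ("2012", "small", "missing", True),
    ("2012", "small", "missing", True),
    ("2012", "small", "missing", True),
    ("2012", "small", "missing", False),
    ("2018", "large", "high", False),
    ("2018", "large", "high", False),
    ("2012", "small", "high", False),
    ("2012", "large", "missing", False),
    ("2018", "small", "missing", False),
    ("2018", "large", "high", True),
    ("2018", "small", "high", False),
    ("2018", "large", "missing", False),
]
accounts = [dict(zip(FIELDS, row[:3]), churned=row[3]) for row in ROWS]


def to_basket(account):
    """One-hot encode an account: each attribute=value pair becomes an item."""
    items = {f"{key}={value}" for key, value in account.items() if key != "churned"}
    if account["churned"]:
        items.add("churned")
    return frozenset(items)


def churn_rules(baskets, min_support=0.2, max_len=3):
    """Score every rule X -> churned, where X has up to max_len attribute items.

    Brute-force enumeration (no Apriori pruning). Returns tuples of
    (antecedent, support, confidence, lift), sorted by lift, highest first.
    """
    n = len(baskets)
    churn_rate = sum("churned" in b for b in baskets) / n  # P(churned)
    if churn_rate == 0:
        return []
    antecedent_count, joint_count = Counter(), Counter()
    for basket in baskets:
        is_churned = "churned" in basket
        attrs = sorted(basket - {"churned"})
        for size in range(1, max_len + 1):
            for combo in combinations(attrs, size):
                antecedent_count[combo] += 1        # accounts matching X
                joint_count[combo] += is_churned    # accounts matching X that churned
    rules = []
    for combo, count in antecedent_count.items():
        support = joint_count[combo] / n            # P(X and churned)
        if support < min_support:
            continue
        confidence = joint_count[combo] / count     # P(churned | X)
        lift = confidence / churn_rate              # multiple of the base churn rate
        rules.append((combo, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for antecedent, support, confidence, lift in churn_rules([to_basket(a) for a in accounts])[:3]:
    print(f"{' & '.join(antecedent)} -> churned  "
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

On this toy data the three-attribute segment (joined in 2012, small, missing NPS) ranks first with a lift of 2.25: 3 of its 4 accounts churned, against 4 of 12 overall. Sorting by lift rather than confidence is the design choice that surfaces segments that are unusually risky relative to the base rate.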
(If you want more information on the business case, how a Market Basket Analysis works, and the challenges it involves, I wrote a data story about it: How I Used a Market Basket Analysis to Get a Job Offer.)
The analysis looked at over a dozen variables, including each account's country, account size, NPS (Net Promoter Score), and more.
It gave me a result.
The Result
I found that customers who joined in 2012 (the earliest year in the dataset), had small accounts, and had a missing or low NPS were 3.12x more likely to churn. These customers made up about 5.3% of the company's accounts, with an estimated revenue of $1.7 million.
I created a presentation that used a data story to build the case for my recommendations. By my estimate, retaining 50% of these at-risk subscribers would save the company over $800,000 (half of the segment's $1.7 million).
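The savings figure is simple arithmetic, but writing it down makes the key assumption visible: it treats the segment's revenue as spread evenly across its subscribers, so retaining half of them keeps roughly half the revenue. A minimal sketch using the $1.7 million figure from the analysis; the 25% and 75% retention rates are hypothetical scenarios, and `revenue_retained` is an illustrative helper, not part of the original work.

```python
# Estimated revenue of the high-risk segment, from the analysis (USD).
SEGMENT_REVENUE = 1_700_000


def revenue_retained(segment_revenue, retention_rate):
    """Revenue kept if `retention_rate` of the at-risk segment is retained.

    Assumes revenue is spread evenly across the segment's subscribers.
    """
    if not 0 <= retention_rate <= 1:
        raise ValueError("retention_rate must be between 0 and 1")
    return segment_revenue * retention_rate


# 0.50 matches the scenario in the text; 0.25 and 0.75 are hypothetical.
for rate in (0.25, 0.50, 0.75):
    print(f"retain {rate:.0%}: ${revenue_retained(SEGMENT_REVENUE, rate):,.0f}")
```

The middle line prints `retain 50%: $850,000`, which is where "over $800,000" comes from. If revenue is concentrated in a few larger accounts within the segment, which accounts are retained matters more than the headcount percentage.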