Part 3 of 7 – My First Predictive Analytics Project

Part 1 and part 2 of my first predictive analytics project can be found here. Read them? Cool. Let’s get into the origins of a market basket analysis, and learn a few terms…

Market Basket Analysis originates from the supermarket industry.

Some questions grocery stores may ask are:

  • If the customer buys milk and potatoes, what else are they buying?
  • What do customers who shop once a month and spend over £200 buy?
  • What are customers who buy once and never come back buying?

With razor-thin margins and a Just In Time supply chain, these can be crucial questions. The answers can be the difference between having too much stock and not enough. Neither is good.

If I start writing about statistics:

  • I’ll lose 3/4 of you because that’s just how writing works.
  • These posts will quadruple in length.
  • The risk of me saying something stupid goes up: I can drive the car. I can’t build the engine. There are now fancy libraries and scripts out there to do that.[1]


But here is one mathematical term that’s important to know: Lift.

A Market Basket Analysis is also called Association Rule Mining (or Association Rule Learning). It looks at a bunch of things, or “variables”, and sees how much they “associate” with each other.

A set of variables that associate together is called a “rule”.

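At the end, it will tell you the “lift”, which is how much more often the things in a rule show up together than you’d expect if they had nothing to do with each other. A lift above 1 means they turn up together more often than chance would suggest. A lift below 1 means less often.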
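To make lift concrete, here is a minimal sketch in Python. The baskets are made up for illustration, and the function names `support` and `lift` are hypothetical helpers, not taken from any particular library or from the real project.

```python
# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"milk", "potatoes", "bread"},
    {"milk", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "bread"},
    {"milk", "potatoes", "bread"},
    {"eggs"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def lift(antecedent, consequent, baskets):
    """How much more often both sides appear together than chance predicts.

    Assumes both sides appear in at least one basket (otherwise this divides by zero).
    """
    together = support(set(antecedent) | set(consequent), baskets)
    return together / (support(antecedent, baskets) * support(consequent, baskets))


print(lift({"milk"}, {"potatoes"}, baskets))
```

In this toy data, milk and potatoes each sit in 4 of the 6 baskets and share 3 of them, so the lift works out to roughly 1.125: they appear together a little more often than chance alone would suggest.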

It’s a form of machine learning because, I kid you not, your computer more or less brute-forces it: it works out the numbers for every candidate rule in the dataset. This can be thousands upon thousands of rules…
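The brute-force idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the baskets are invented, `brute_force_pair_rules` is a hypothetical helper, and it only scores rules between two single items, whereas a real analysis also looks at bigger combinations.

```python
from itertools import permutations

# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"milk", "potatoes", "bread"},
    {"milk", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "bread"},
    {"milk", "potatoes", "bread"},
    {"eggs"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def brute_force_pair_rules(baskets):
    """Score every 'if A then B' rule between two single items."""
    all_items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(all_items, 2):
        together = support({a, b}, baskets)
        if together == 0:
            continue  # the pair never shares a basket, so there is nothing to score
        lift = together / (support({a}, baskets) * support({b}, baskets))
        rules.append((a, b, together, lift))
    # Highest lift first
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


rules = brute_force_pair_rules(baskets)
for a, b, together, lift in rules:
    print(f"{a} -> {b}: support={together:.2f}, lift={lift:.2f}")
print(f"{len(rules)} rules from {len(set().union(*baskets))} distinct items")
```

With n distinct items there are n × (n − 1) ordered pairs to check before you even look at bigger combinations, which is part of why the rule count climbs so quickly. Real libraries usually avoid the full brute force. The Apriori algorithm, for example, skips combinations built from items that are already too rare.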

This, I believe, introduces the first problem you can encounter with a Market Basket Analysis: Choosing your data carefully.

The first time I ran my Market Basket Analysis it took two and a half hours to run on my MacBook Pro, only to give me a bunch of outputs I couldn’t use.

The next time I ran it? 15 minutes, and a clear series of actionable insights.

You can see my breakdown of how a Market Basket Analysis works in my write-up here.

I warn you, the entire post takes 29 minutes to read. Just read that little excerpt.

Next time I’ll discuss the big challenge with a Market Basket Analysis—ensuring your data is encoded properly.

Stay tuned.


This was first published on my LinkedIn.

You can read part 4 of this series here.

Footnotes

  • [1] I know the fundamentals of the theory. I know its limitations. But deriving it from first principles? Not happening in this life or the next.