Efficient Implementation of Apriori Algorithm using Numpy

By 苏剑林 | May 10, 2018

A classic example of association rules: Beer and Diapers

Three years ago I wrote "Efficient Implementation of the Apriori Algorithm using Pandas", which gave a Python implementation of the Apriori algorithm and was well received by some readers. My Python skills were limited back then, though, so in hindsight that implementation was not very elegant (its speed was acceptable), and it did not support variable-length input data. I promised to rewrite the algorithm to fix these problems, and I have finally done it~

I won't repeat the introduction to the Apriori algorithm here; I'll just post the code directly:

Usage:

Output:

[(('A3', 'F4', 'H4'), (0.8795180722891566, 0.07849462365591398)),
 (('C3', 'F4', 'H4'), (0.875, 0.07526881720430108)),
 (('B2', 'F4', 'H4'), (0.7945205479452054, 0.06236559139784946)),
 (('C2', 'E3', 'D2'), (0.7543859649122807, 0.09247311827956989)),
 (('D2', 'F3', 'H4', 'A2'), (0.7532467532467533, 0.06236559139784946))]

Each result reads as follows: the first $n-1$ items imply the $n$-th item, for example $A3 + F4 \to H4$ and $D2 + F3 + H4 \to A2$.
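To make this reading convention concrete, here is a small sketch that turns result entries of the shape shown above into rule strings. The helper names `split_rule` and `format_rule` are hypothetical and not part of the implementation in this post; the pair of numbers attached to each rule is passed through untouched, without assuming what each one means.

```python
def split_rule(itemset):
    """Split a result tuple into (antecedent, consequent).

    Follows the convention described above: the first n-1 items
    form the antecedent and the last item is the consequent.
    """
    return tuple(itemset[:-1]), itemset[-1]


def format_rule(itemset):
    """Render a result tuple as 'X + Y -> Z'."""
    antecedent, consequent = split_rule(itemset)
    return " + ".join(antecedent) + " -> " + consequent


# Two entries copied from the output listed above.
results = [
    (('A3', 'F4', 'H4'), (0.8795180722891566, 0.07849462365591398)),
    (('D2', 'F3', 'H4', 'A2'), (0.7532467532467533, 0.06236559139784946)),
]

for itemset, stats in results:
    print(format_rule(itemset), stats)
```

Slicing is used instead of starred unpacking so that the sketch also runs under Python 2.x.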

This implementation is more concise: it drops the Pandas dependency and uses only NumPy, and the variable names are clearer. Since the underlying logic is unchanged, efficiency should in theory be no worse and may even be better. It should work under both Python 2.x and 3.x; if you run into any problems, feel free to point them out.
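As a rough illustration of what NumPy-only support and confidence counting over variable-length records can look like, here is a minimal sketch. It is not the implementation from this post: the function names `encode`, `support` and `confidence`, the boolean-matrix encoding, and the toy transactions are all assumptions made for the example.

```python
import numpy as np


def encode(transactions):
    """Encode variable-length transactions as a boolean matrix.

    Rows are transactions, columns are items; returns the matrix
    and a dict mapping each item to its column index.
    """
    items = sorted(set(i for t in transactions for i in t))
    index = {item: j for j, item in enumerate(items)}
    matrix = np.zeros((len(transactions), len(items)), dtype=bool)
    for r, t in enumerate(transactions):
        for item in t:
            matrix[r, index[item]] = True
    return matrix, index


def support(matrix, index, itemset):
    """Fraction of transactions that contain every item in itemset."""
    cols = [index[i] for i in itemset]
    return matrix[:, cols].all(axis=1).mean()


def confidence(matrix, index, antecedent, consequent):
    """support(antecedent + consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    joint = support(matrix, index, list(antecedent) + [consequent])
    return joint / support(matrix, index, antecedent)


# Toy data for illustration only (not the dataset used in this post).
transactions = [
    ['beer', 'diapers'],
    ['beer', 'diapers', 'milk'],
    ['beer'],
    ['milk'],
]
matrix, index = encode(transactions)
print(support(matrix, index, ['beer', 'diapers']))      # 2 of 4 transactions
print(confidence(matrix, index, ['beer'], 'diapers'))   # 2 of the 3 beer transactions
```

A full Apriori pass would additionally generate candidate itemsets level by level and prune those below the minimum support; the sketch only shows the counting step for a given itemset.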