In this article by Boštjan Kaluža, author of the book Machine Learning in Java, we will discuss affinity analysis, which is at the heart of market basket analysis (MBA). It can discover co-occurrence relationships among activities performed by specific users or groups. In retail, affinity analysis can help you understand the purchasing behavior of customers. These insights can drive revenue through smart cross-selling and upselling strategies and can assist you in developing loyalty programs, sales promotions, and discount plans.

In this article, we will look into the following topics:

Market basket analysis
Association rule learning
Other applications in various domains

First, we will review the core association rule learning concepts and algorithms, such as support, lift, the Apriori algorithm, and the FP-growth algorithm. Next, we will use Weka to perform our first affinity analysis on a supermarket dataset and study how to interpret the resulting rules. We will conclude the article by analyzing how association rule learning can be applied in other domains, such as IT Operations Analytics, medicine, and others.


Market basket analysis

Since the introduction of electronic point of sale, retailers have been collecting an incredible amount of data. To leverage this data in order to produce business value, they first developed a way to consolidate and aggregate the data to understand the basics of the business. What are they selling? How many units are moving? What is the sales amount?

Recently, the focus shifted to the lowest level of granularity—the market basket transaction. At this level of detail, the retailers have direct visibility into the market basket of each customer who shopped at their store, understanding not only the quantity of the purchased items in that particular basket, but also how these items were bought in conjunction with each other. This can be used to drive decisions about how to differentiate store assortment and merchandise, as well as effectively combine offers of multiple products, within and across categories, to drive higher sales and profits. These decisions can be implemented across an entire retail chain, by channel, at the local store level, and even for the specific customer with the so-called personalized marketing, where a unique product offering is made for each customer.

MBA covers a wide variety of analyses:

Item affinity: This defines the likelihood of two (or more) items being purchased together
Identification of driver items: This enables the identification of the items that drive people to the store and always need to be in stock
Trip classification: This analyzes the content of the basket and classifies the shopping trip into a category: weekly grocery trip, special occasion, and so on
Store-to-store comparison: Understanding the number of baskets allows any metric to be divided by the total number of baskets, effectively creating a convenient and easy way to compare the stores with different characteristics (units sold per customer, revenue per transaction, number of items per basket, and so on)
Revenue optimization: This helps in determining the magic price points for this store, increasing the size and value of the market basket
Marketing: This helps in identifying more profitable advertising and promotions, targeting offers more precisely in order to improve ROI, generating better loyalty card promotions with longitudinal analysis, and attracting more traffic to the store
Operations optimization: This helps in matching the inventory to the requirement by customizing the store and assortment to trade area demographics, optimizing store layout

Predictive models help retailers to direct the right offer to the right customer segments/profiles, as well as gain understanding of what is valid for which customer, predict the probability score of customers responding to this offer, and understand the customer value gain from the offer acceptance.

Affinity analysis

Affinity analysis is used to determine the likelihood that a set of items will be bought together. In retail, there are natural product affinities. For example, people who buy hamburger patties typically also buy hamburger rolls, along with ketchup, mustard, tomatoes, and other items that make up the burger experience.

While some product affinities might seem trivial, others are not very obvious. A classic example is toothpaste and tuna. It seems that people who eat tuna are more prone to brush their teeth right after finishing their meal. So, why is it important for retailers to get a good grasp of product affinities? This information is critical for planning promotions appropriately, as reducing the price of some items may cause a spike in sales of related high-affinity items without the need to promote these related items further.

In the following section, we'll look into the algorithms for association rule learning: Apriori and FP-growth.

Association rule learning

Association rule learning has been a popular approach for discovering interesting relations between items in large databases. It is most commonly applied in retail for discovering regularities between products.

Association rule learning approaches find patterns as interesting strong rules in the database using different measures of interestingness. For example, the following rule would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat: {onions, potatoes} -> {burger}

Another classic story probably told in every machine learning class is the beer and diaper story. An analysis of supermarket shoppers' behavior showed that customers, presumably young men, who buy diapers tend also to buy beer. It immediately became a popular example of how an unexpected association rule might be found from everyday data; however, there are varying opinions as to how much of the story is true. Daniel Powers says (DSS News, 2002):

In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.

In addition to the preceding example from MBA, association rules are today employed in many application areas, including web usage mining, intrusion detection, continuous production, and bioinformatics. We'll take a closer look at these areas later in this article.

Basic concepts

Before we dive into algorithms, let's first review the basic concepts.

Database of transactions

First, there is no class value, as this is not required for learning association rules. Next, the dataset is presented as a transactional table, where each supermarket item corresponds to a binary attribute. Hence, the feature vector could be extremely large.

Consider the following example. Suppose we have four receipts as shown in the following image. Each receipt corresponds to a purchasing transaction:

market-basket-analysis-img-0

To write these receipts in the form of a transactional database, we first identify all the possible items that appear in the receipts. These items are onions, potatoes, burger, beer, and diapers. Each purchase, that is, transaction, is presented in a row, and there is 1 if an item was purchased within the transaction and 0 otherwise, as shown in the following table:

Transaction ID	Onions	Potatoes	Burger	Beer	Diapers
1	0	1	1	0	0
2	1	1	1	1	0
3	0	0	0	1	1
4	1	0	1	1	0

This example is really small. In practical applications, the dataset often contains thousands or millions of transactions, which allows learning algorithms to discover statistically significant patterns.

Itemset and rule

An itemset is simply a set of items, for example, {onions, potatoes, burger}. A rule consists of two itemsets, X and Y, written in the format X -> Y.

This indicates a pattern that when the X itemset is observed, Y is also observed. To select interesting rules, various measures of significance can be used.

Support

Support, for an itemset, is defined as the proportion of transactions that contain the itemset. The {potatoes, burger} itemset in the previous table occurs in 50% of transactions (2 out of 4 transactions), so its support is supp({potatoes, burger}) = 2/4 = 0.5.

Intuitively, it indicates the share of transactions that support the pattern.

Confidence

Confidence of a rule indicates its accuracy. It is defined as Conf(X -> Y) = supp(X U Y) / supp(X), where X U Y is the itemset that contains all the items of both X and Y.

For example, the {onions, burger} -> {beer} rule has the confidence 0.5/0.5 = 1.0 in the previous table, which means that 100% of the time when onions and burger are bought together, beer is bought as well.
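The two definitions can be checked against the example table with a few lines of plain Java. The following is only a minimal sketch: the class and method names (SupportConfidence, support, confidence) are illustrative assumptions and are not part of Weka.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SupportConfidence {

    // The four transactions from the example table, written as sets of items.
    static final List<Set<String>> TRANSACTIONS = List.of(
            Set.of("potatoes", "burger"),
            Set.of("onions", "potatoes", "burger", "beer"),
            Set.of("beer", "diapers"),
            Set.of("onions", "burger", "beer"));

    // Proportion of transactions that contain every item of the itemset.
    static double support(Set<String> itemset, List<Set<String>> transactions) {
        long hits = transactions.stream()
                .filter(t -> t.containsAll(itemset))
                .count();
        return (double) hits / transactions.size();
    }

    // Conf(X -> Y) = supp(X U Y) / supp(X).
    static double confidence(Set<String> x, Set<String> y, List<Set<String>> transactions) {
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return support(union, transactions) / support(x, transactions);
    }

    public static void main(String[] args) {
        System.out.println(support(Set.of("potatoes", "burger"), TRANSACTIONS));
        System.out.println(confidence(Set.of("onions", "burger"), Set.of("beer"), TRANSACTIONS));
    }
}
```

Representing each transaction as a set keeps the sketch close to the definitions; a real implementation would typically use a more compact encoding for large databases.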

Apriori algorithm

The Apriori algorithm is a classic algorithm used for frequent pattern mining and association rule learning over transactional databases. By identifying the frequent individual items in a database and extending them to larger itemsets, Apriori can determine the association rules, which highlight general trends about a database.

The Apriori algorithm constructs a set of candidate itemsets, for example, itemset1 = {Item A, Item B}, and calculates their support by counting the number of occurrences in the database. Apriori then uses a bottom-up approach, where frequent itemsets are extended one item at a time. It avoids testing most large sets as candidates by first looking at the smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. The algorithm terminates when no further successful extensions are found.

Although the Apriori algorithm is an important milestone in machine learning, it suffers from a number of inefficiencies and tradeoffs. In the following section, we'll look into the more recent FP-growth technique.
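The bottom-up search described above can be sketched in plain Java as follows. This is a simplified illustration of the frequent itemset step only, not Weka's implementation; the class and method names are assumptions, and the data is the four-transaction example from the previous section.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AprioriSketch {

    static final List<Set<String>> TRANSACTIONS = List.of(
            Set.of("potatoes", "burger"),
            Set.of("onions", "potatoes", "burger", "beer"),
            Set.of("beer", "diapers"),
            Set.of("onions", "burger", "beer"));

    // Returns every itemset whose support is at least minSupport.
    static Set<Set<String>> frequentItemsets(List<Set<String>> transactions, double minSupport) {
        Set<Set<String>> result = new HashSet<>();

        // Level 1: every single item is a candidate.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                candidates.add(new TreeSet<>(Set.of(item)));
            }
        }

        while (!candidates.isEmpty()) {
            // Keep only the candidates that are frequent at this level.
            Set<Set<String>> frequent = new HashSet<>();
            for (Set<String> c : candidates) {
                if (support(c, transactions) >= minSupport) {
                    frequent.add(c);
                }
            }
            result.addAll(frequent);

            // Join: merge pairs of frequent k-itemsets into (k+1)-itemsets.
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : frequent) {
                for (Set<String> b : frequent) {
                    Set<String> union = new TreeSet<>(a);
                    union.addAll(b);
                    // Prune: a set cannot be frequent unless all its subsets are.
                    if (union.size() == a.size() + 1 && allSubsetsFrequent(union, frequent)) {
                        next.add(union);
                    }
                }
            }
            candidates = next; // an empty set ends the loop
        }
        return result;
    }

    // Checks that removing any single item leaves a frequent itemset.
    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequent) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    static double support(Set<String> itemset, List<Set<String>> transactions) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    public static void main(String[] args) {
        frequentItemsets(TRANSACTIONS, 0.5).forEach(System.out::println);
    }
}
```

With a minimum support of 0.5, the only three-item set that survives is {beer, burger, onions}; candidates such as {onions, potatoes, burger} are pruned without a database scan because {onions, potatoes} is not frequent.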

FP-growth algorithm

FP-growth, where FP stands for frequent pattern, represents the transaction database as a prefix tree. First, the algorithm counts the occurrences of items in the dataset. In the second pass, it builds a prefix tree, an ordered tree data structure commonly used to store strings. An example of a prefix tree based on the previous example is shown in the following diagram:

market-basket-analysis-img-1

If many transactions share the most frequent items, the prefix tree provides high compression close to the tree root. Large itemsets are grown directly, instead of generating candidate itemsets and testing them against the entire database. Growth starts at the bottom of the tree, by finding all the itemsets that meet the minimum support. Once the recursive process has completed, all large itemsets with the minimum support (coverage) have been found and association rule creation begins, at which point the minimum confidence is applied.

The FP-growth algorithm has several advantages. First, it constructs an FP-tree, which encodes the original dataset in a substantially more compact representation. Second, it efficiently builds frequent itemsets, leveraging the FP-tree structure and a divide-and-conquer strategy.
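To make the two passes concrete, the following sketch builds a prefix tree for the four example transactions. It covers tree construction only, not the recursive mining step, and it is not Weka's FPGrowth code. The names (FpTreeSketch, Node, build, render), the alphabetical tie-break between equally frequent items, and the minimum count of 2 are all assumptions, so the resulting tree may be ordered differently from the one in the diagram.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class FpTreeSketch {

    // A tree node: an item, the number of transactions passing through it, and its children.
    static final class Node {
        final String item;
        int count;
        final Map<String, Node> children = new TreeMap<>();

        Node(String item) {
            this.item = item;
        }
    }

    static Node build(List<Set<String>> transactions, int minCount) {
        // First pass: count how often each item occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                counts.merge(item, 1, Integer::sum);
            }
        }

        // Most frequent items first; ties are broken alphabetically (an arbitrary choice).
        Comparator<String> byFrequency = (a, b) -> {
            int cmp = Integer.compare(counts.get(b), counts.get(a));
            return cmp != 0 ? cmp : a.compareTo(b);
        };

        // Second pass: insert each transaction as a path starting at the root.
        Node root = new Node(null);
        for (Set<String> t : transactions) {
            List<String> items = new ArrayList<>();
            for (String item : t) {
                if (counts.get(item) >= minCount) {
                    items.add(item); // infrequent items are dropped
                }
            }
            items.sort(byFrequency);
            Node node = root;
            for (String item : items) {
                node = node.children.computeIfAbsent(item, Node::new);
                node.count++; // shared prefixes only increase a counter
            }
        }
        return root;
    }

    // Renders the tree as nested "item:count(children)" text.
    static String render(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children.values()) {
            sb.append(child.item).append(':').append(child.count);
            if (!child.children.isEmpty()) {
                sb.append('(').append(render(child)).append(')');
            }
            sb.append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = List.of(
                Set.of("potatoes", "burger"),
                Set.of("onions", "potatoes", "burger", "beer"),
                Set.of("beer", "diapers"),
                Set.of("onions", "burger", "beer"));
        System.out.println(render(build(transactions, 2)));
    }
}
```

Three of the four transactions share the beer prefix, so they are stored along one branch whose counters record how many transactions pass through each node. This sharing is the compression near the root that the text describes.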

The supermarket dataset

The supermarket dataset, located in datasets/chap5/supermarket.arff, describes the shopping habits of supermarket customers. Most of the attributes stand for a particular item group, for example, dairy foods, beef, potatoes; or department, for example, department 79, department 81, and so on. The value is t if the customer bought an item from that group and missing otherwise. There is one instance per customer. The dataset contains no class attribute, as this is not required to learn association rules. A sample of data is shown in the following table:

market-basket-analysis-img-2

Discover patterns

To discover shopping patterns, we will use the two algorithms that we have looked into before, Apriori and FP-growth.

Apriori

We will use the Apriori algorithm as implemented in Weka. It iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.associations.Apriori;

First, we will load the supermarket dataset:

Instances data = new Instances(
new BufferedReader(
new FileReader("datasets/chap5/supermarket.arff")));

Next, we will initialize an Apriori instance and call the buildAssociations(Instances) function to start frequent pattern mining, as follows:

Apriori model = new Apriori();
model.buildAssociations(data);

Finally, we can output the discovered itemsets and rules, as shown in the following code:

System.out.println(model);

The output is as follows:

Apriori
=======

Minimum support: 0.15 (694 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:
Size of set of large itemsets L(1): 44
Size of set of large itemsets L(2): 380
Size of set of large itemsets L(3): 910
Size of set of large itemsets L(4): 633
Size of set of large itemsets L(5): 105
Size of set of large itemsets L(6): 1

Best rules found:

 1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723    <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35)
 2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696    <conf:(0.92)> lift:(1.27) lev:(0.03) [149] conv:(3.28)
 3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705    <conf:(0.92)> lift:(1.27) lev:(0.03) [150] conv:(3.27)
...

The algorithm outputs the ten best rules according to confidence. Let's look at the first rule and interpret the output, as follows:

biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723    <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35)

It says that when biscuits, frozen foods, and fruit are bought together and the total purchase price is high, it is also very likely that bread and cake are purchased as well. The {biscuits, frozen foods, fruit, total high} itemset appears in 788 transactions, and 723 of those transactions also contain bread and cake. The confidence of this rule is 723/788 = 0.92, meaning that the rule holds true in 92% of transactions where the {biscuits, frozen foods, fruit, total high} itemset is present.

The output also reports additional measures such as lift, leverage, and conviction, which estimate the accuracy against our initial assumptions. For example, the 3.35 conviction value indicates that the rule would be incorrect 3.35 times as often if the association was purely random chance. Lift measures how many times more often X and Y occur together than expected if they were statistically independent (lift = 1). The 1.27 lift of the first rule means that bread and cake are 1.27 times more likely to appear in transactions that contain the left-hand itemset than in transactions overall.
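The relationship between these measures is easier to see when they are computed from raw counts. The sketch below uses the common textbook definitions, applied to the rule {burger} -> {beer} from the small example table earlier in the article. The class and method names are illustrative assumptions, and a tool such as Weka may compute some of these values slightly differently (for example, with smoothing), so the numbers are not meant to reproduce its output.

```java
final class RuleMetrics {

    // All arguments are transaction counts:
    // n = total, x = containing X, y = containing Y, xy = containing both X and Y.

    static double confidence(int x, int xy) {
        return (double) xy / x;
    }

    // How much more often X and Y occur together than expected under independence.
    static double lift(int n, int x, int y, int xy) {
        return confidence(x, xy) / ((double) y / n);
    }

    // Difference between the observed and the expected joint support.
    static double leverage(int n, int x, int y, int xy) {
        return (double) xy / n - ((double) x / n) * ((double) y / n);
    }

    // Ratio of the expected to the observed frequency of X occurring without Y.
    static double conviction(int n, int x, int y, int xy) {
        return (1.0 - (double) y / n) / (1.0 - confidence(x, xy));
    }

    public static void main(String[] args) {
        // {burger} -> {beer}: burger in 3 of 4 transactions, beer in 3 of 4, both in 2 of 4.
        int n = 4, x = 3, y = 3, xy = 2;
        System.out.println("confidence = " + confidence(x, xy));
        System.out.println("lift       = " + lift(n, x, y, xy));
        System.out.println("leverage   = " + leverage(n, x, y, xy));
        System.out.println("conviction = " + conviction(n, x, y, xy));
    }
}
```

In this toy case the lift is below 1 (about 0.89) and the leverage is negative, which indicates that burger and beer appear together slightly less often than independence would predict. A rule with a confidence of exactly 1.0 makes the conviction formula divide by zero, which is usually reported as infinity.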

FP-growth

Now, let's try to get the same results with the more efficient FP-growth algorithm. FP-growth is also implemented in the weka.associations package:

import weka.associations.FPGrowth;

FP-growth is initialized and run in the same way as Apriori earlier:

FPGrowth fpgModel = new FPGrowth();
fpgModel.buildAssociations(data);
System.out.println(fpgModel);

The output reveals that FP-growth discovered 16 rules:

FPGrowth found 16 rules (displaying top 10)

 1. [fruit=t, frozen foods=t, biscuits=t, total=high]: 788 ==> [bread and cake=t]: 723   <conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.35) 
 2. [fruit=t, baking needs=t, biscuits=t, total=high]: 760 ==> [bread and cake=t]: 696   <conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.28) 
...

We can observe that FP-growth found the same set of rules as Apriori; however, the time required to process larger datasets can be significantly shorter.

Other applications in various areas

We looked into affinity analysis to demystify shopping behavior patterns in supermarkets. Although the roots of association rule learning are in analyzing point-of-sale transactions, it can be applied outside the retail industry to find relationships among other types of baskets. The notion of a basket can easily be extended to services and products, for example, to analyze items purchased using a credit card, such as rental cars and hotel rooms, and to analyze information on value-added services purchased by telecom customers (call waiting, call forwarding, DSL, speed call, and so on), which can help the operators determine the ways to improve their bundling of service packages.

Additionally, we will look into the following examples of potential cross-industry applications:

Medical diagnosis
Protein sequences
Census data
Customer relationship management
IT Operations Analytics

Medical diagnosis

Association rules can be applied in medical diagnosis to assist physicians while treating patients. The general problem of the induction of reliable diagnostic rules is hard as, theoretically, no induction process can guarantee the correctness of induced hypotheses by itself. Practically, diagnosis is not an easy process as it involves unreliable diagnostic tests and the presence of noise in training examples.

Nevertheless, association rules can be used to identify likely symptoms appearing together. A transaction, in this case, corresponds to a medical case, while symptoms correspond to items. When a patient is treated, a list of symptoms is recorded as one transaction.

Protein sequences

A lot of research has gone into understanding the composition and nature of proteins; yet many things remain to be understood satisfactorily. It is now generally believed that amino-acid sequences of proteins are not random.

With association rules, it is possible to identify associations between different amino acids that are present in a protein. A protein is a sequence made up of 20 types of amino acids. Each protein has a unique three-dimensional structure, which depends on the amino-acid sequence; a slight change in the sequence may change the functioning of the protein. To apply association rules, a protein corresponds to a transaction, while amino acids, their two-grams, and structure correspond to the items.

Such association rules are desirable for enhancing our understanding of protein composition and hold the potential to give clues regarding the global interactions amongst some particular sets of amino acids occurring in the proteins. Knowledge of these association rules or constraints is highly desirable for synthesis of artificial proteins.

Census data

Censuses make a huge variety of general statistical information about society available to both researchers and the general public. The information from population and economic censuses can be used in planning public services (education, health, transport, and funds) as well as in business (for setting up new factories, shopping malls, or banks and even marketing particular products).

To discover frequent patterns, each statistical area (for example, municipality, city, and neighborhood) corresponds to a transaction, and the collected indicators correspond to the items.

Customer relationship management

Association rules can reinforce the knowledge management process and allow the marketing personnel to know their customers well in order to provide better quality services. For example, association rules can be applied to detect a change of customer behavior at different time snapshots from customer profiles and sales data. The basic idea is to discover changes from two datasets and generate rules from each dataset to carry out rule matching.

IT Operations Analytics

Based on records of a large number of transactions, association rule learning is well-suited to be applied to the data that is routinely collected in day-to-day IT operations, enabling IT Operations Analytics tools to detect frequent patterns and identify critical changes. IT specialists need to see the big picture and understand, for example, how a problem on a database could impact an application server.

For a specific day, IT operations may take in a variety of alerts, presenting them in a transactional database. Using an association rule learning algorithm, IT Operations Analytics tools can correlate and detect the frequent patterns of alerts appearing together. This can lead to a better understanding about how a component impacts another.
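As a rough illustration of the idea, each day's alerts can be treated as one transaction and the co-occurring pairs counted, which corresponds to the two-item level of frequent pattern mining. The alert names below are hypothetical, as are the class and method names; a real tool would read alerts from its monitoring system and would mine larger patterns with an algorithm such as Apriori or FP-growth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AlertPairs {

    // Counts, for every pair of alerts, the number of days on which both were raised.
    static Map<String, Integer> pairCounts(List<Set<String>> days) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> day : days) {
            // Sort the alerts so that each pair always produces the same key.
            List<String> alerts = new ArrayList<>(new TreeSet<>(day));
            for (int i = 0; i < alerts.size(); i++) {
                for (int j = i + 1; j < alerts.size(); j++) {
                    counts.merge(alerts.get(i) + " + " + alerts.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical alert names; one set of alerts per day.
        List<Set<String>> days = List.of(
                Set.of("db-slow-query", "app-timeout"),
                Set.of("db-slow-query", "app-timeout", "disk-full"),
                Set.of("disk-full"));
        pairCounts(days).forEach((pair, n) -> System.out.println(pair + ": " + n));
    }
}
```

Pairs with a high count relative to the number of days are candidates for rules such as "a database alert tends to be accompanied by an application alert".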

With identified alert patterns, it is possible to apply predictive analytics. For example, a particular database server hosts a web application and suddenly an alert about the database is triggered. The frequent patterns identified by an association rule learning algorithm can indicate that the IT staff needs to take action before the web application is impacted.

Association rule learning can also discover alert events originating from the same IT event. For example, every time a new user is added, six changes in the Windows operating system are detected. Next, in Application Portfolio Management (APM), IT may face multiple alerts showing that the transactional time in a database is high. If all these issues originate from the same source (such as getting hundreds of alerts about changes that are all due to a Windows update), this frequent pattern mining can help to quickly cut through a number of alerts, allowing the IT operators to focus on truly critical changes.

Summary

In this article, you learned how to leverage association rule learning on transactional datasets to gain insight into frequent patterns. We performed an affinity analysis in Weka and learned that the hard work lies in the analysis of results: careful attention is required when interpreting rules, as association (that is, correlation) is not the same as causation.