### Developers Club geek daily blog

2 years, 11 months ago
The retailer's first task is to understand who exactly makes purchases in the store, to study customer behavior, to identify typical patterns, and to use this knowledge to influence the quantity and quality of purchases.

Two approaches make this possible:
• analyzing data from loyalty programs and other ways of studying customers' profiles and behavior;
• analyzing data on purchases and transactions.

The second approach boils down to one question: what goods did the customer put in the basket?

Consumer basket analysis (Market Basket Analysis) is a set of analytical techniques for understanding customer behavior and product choice, identifying associations and links between the goods in each receipt, and estimating the probability that goods are bought together.

The analysis uses data from the receipts of a retail chain:
• the list of goods in the receipt;
• the purchase date;
• the purchase time;
• the receipt total.

Typical baskets and customer actions are identified and tied to dates, days of the week, purchase times and receipt totals.

With this information the retailer can answer many questions, for example:
• What is bought on Friday evening?
• What are the typical baskets on Saturday morning?
• What is strong alcohol bought with?
• What goods end up in baskets worth 200 UAH or more?
• Which brands appear in receipts with Martini vermouths?

There is a wide choice of algorithms for basket analysis; the main ones are Apriori, Eclat and FP-growth. We will look at their capabilities and differences in more detail in later publications.

Each of these algorithms extracts association rules from the full set of receipts, that is, recurring links between goods in baskets.

In basket analysis, where the goal is to find relationships between goods, an association rule reads:
"If the customer buys product X, then with a certain probability they will also buy product Y".

The basic elements of an association rule:
X and Y are two products, categories or product groups.
X is the key product, the cause;
Y is the accompanying product, the consequence.

As an example of association rules, take 10 receipts containing four products (milk, bread, fish and eggs):
3. fish;
4. milk, bread, eggs;
5. bread, eggs, fish;
6. eggs, fish;
8. milk, fish;
9. milk, fish;
10. eggs, fish.

Using this data, consider the rule:
"If a customer buys milk and bread (X), they will also buy eggs (Y)".

The probability of that purchase is determined with the following calculations:

Support is calculated for a single product or a combination of products. It is the ratio of the number of receipts that contain the selected product or combination to the total number of receipts.

Milk and bread occur together in two receipts, and the combination of all three selected products (milk, bread and eggs) occurs in one receipt, so:
Support(X) = number of receipts with milk and bread / total number of receipts = 2/10 = 0.2

Support(X and Y) = number of receipts containing both X and Y / total number of receipts = 1/10 = 0.1

Support shows how much the selected rule affects all the purchases being analyzed. The higher the value, the more often the rule applies across the whole data set.

Confidence is the ratio of the number of receipts that contain both X and Y to the number of receipts that contain the cause X.

In other words, it is the probability that a customer who already has milk and bread in the basket will also buy eggs.

Confidence = Support(X and Y) / Support(X) = 0.1 / 0.2 = 0.5

In our case the probability is 50%: of the two receipts with milk and bread, one also contains eggs.
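The support and confidence calculations can be reproduced with a short Python sketch. Receipts 3 to 6 and 8 to 10 are taken from the example; receipts 1, 2 and 7 are not listed in the text, so their contents below are hypothetical and were chosen only so that the counts agree with the figures above. The function names `support` and `confidence` are illustrative, not part of any particular library.

```python
# Receipts 3-6 and 8-10 come from the example in the text. Receipts 1, 2 and 7
# are not listed there, so their contents are HYPOTHETICAL, picked only so
# that the counts agree with the article's figures (2/10 and 1/10).
receipts = {
    1: {"milk", "bread"},            # hypothetical
    2: {"milk"},                     # hypothetical
    3: {"fish"},
    4: {"milk", "bread", "eggs"},
    5: {"bread", "eggs", "fish"},
    6: {"eggs", "fish"},
    7: {"bread"},                    # hypothetical
    8: {"milk", "fish"},
    9: {"milk", "fish"},
    10: {"eggs", "fish"},
}


def support(itemset, baskets):
    """Share of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(x, y, baskets):
    """Support(X and Y) divided by Support(X)."""
    return support(set(x) | set(y), baskets) / support(x, baskets)


baskets = list(receipts.values())
x, y = {"milk", "bread"}, {"eggs"}

print(support(x, baskets))        # 0.2
print(support(x | y, baskets))    # 0.1
print(confidence(x, y, baskets))  # 0.5
```

Confidence is undefined when no receipt contains X; this sketch would raise `ZeroDivisionError` in that case, which a real implementation would need to handle.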

When analyzing pairs of goods we also use the correlation coefficient. It shows how similar the sales dynamics of two products are.

Three cases are possible:
• the coefficient is above 0: positive correlation, the sales charts of the two products are similar or identical;
• the coefficient is 0: no correlation, the sales charts differ considerably and show no linear relationship;
• the coefficient is below 0: negative correlation, the sales charts move in opposite directions, so sales of one product grow while sales of the other fall.
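The three cases can be illustrated with the Pearson correlation coefficient computed over two sales series. The article does not say which coefficient the service uses, so Pearson is an assumption here, and the daily sales figures are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily unit sales for one week (illustrative numbers only).
product_a = [10, 12, 15, 20, 30, 45, 40]
product_b = [5, 6, 8, 11, 14, 22, 21]     # rises and falls with product_a
product_c = [40, 38, 33, 30, 22, 10, 12]  # moves in the opposite direction

print(round(pearson(product_a, product_b), 2))  # close to +1
print(round(pearson(product_a, product_c), 2))  # close to -1
```

A value near +1 corresponds to the first case in the list, a value near -1 to the third.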

Basket analysis shows which goods are combined in typical baskets. Often, however, this information is too obvious, or the retailer already knows it from experience or from the popularity of the goods. What matters in basket analysis is finding the non-obvious links between goods.

For this purpose BI Datawiz.io lets you filter baskets by period, day of the week and time interval. You can identify typical baskets on weekdays and weekends, in different seasons and in the run-up to holidays, that is, structure the receipts deliberately by the chosen parameters.

Filtering by receipt total makes it possible to identify typical baskets in different price categories, for example, which goods appear in receipts from 30 to 50 UAH or from 500 to 1,000 UAH.
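Filtering of this kind can be sketched in a few lines of Python. The `Receipt` record and the `filter_receipts` function are hypothetical and only illustrate the idea; they do not describe how BI Datawiz.io stores or filters data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Receipt:
    # Hypothetical record layout: purchase timestamp, total in UAH, items.
    when: datetime
    total: float
    items: frozenset


def filter_receipts(receipts, weekdays=None, hour_from=0, hour_to=24,
                    min_total=0.0, max_total=float("inf")):
    """Keep receipts matching the weekday, hour and cost filters.

    weekdays uses datetime.weekday() numbering: Monday is 0, Sunday is 6.
    The hour range is half-open: hour_from <= hour < hour_to.
    """
    return [
        r for r in receipts
        if (weekdays is None or r.when.weekday() in weekdays)
        and hour_from <= r.when.hour < hour_to
        and min_total <= r.total <= max_total
    ]


# Illustrative data only.
receipts = [
    Receipt(datetime(2024, 6, 7, 19, 30), 240.0, frozenset({"vodka", "juice"})),
    Receipt(datetime(2024, 6, 8, 9, 15), 45.0, frozenset({"bread", "kefir"})),
    Receipt(datetime(2024, 6, 8, 10, 40), 620.0, frozenset({"champagne", "candy"})),
    Receipt(datetime(2024, 6, 10, 12, 5), 38.0, frozenset({"milk", "roll"})),
]

# Saturday morning (weekday 5, from 8 to 13), receipts from 30 to 50 UAH.
saturday_morning_cheap = filter_receipts(
    receipts, weekdays={5}, hour_from=8, hour_to=13, min_total=30, max_total=50
)
print([sorted(r.items) for r in saturday_morning_cheap])  # [['bread', 'kefir']]
```

The filtered receipts can then be passed to whatever routine identifies the typical baskets.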

Another aspect is basket analysis at different categorization levels.

Categorization is the distribution of goods into product groups. For example, "Milk and dairy products" is a top-level category. It is divided into subcategories: dairy products and fermented milk products. Fermented milk products are subdivided into yogurts, kefirs, fermented baked milk, etc. of different brands. The bottom level of the categorization is the specific product description.
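One way to picture such a hierarchy is a category path stored for every product, so that a basket can be "rolled up" to any level before pairs are counted. The sketch below is an assumption about how this could be modeled: the category names follow the example above, while the product names, the `CATEGORY_PATH` table and the numbering of levels from 0 are hypothetical.

```python
# Hypothetical catalogue: each product maps to its category path, top level
# first. Category names follow the example in the text; product names and
# the numbering of levels from 0 are assumptions.
CATEGORY_PATH = {
    "Yogurt A 2.5% 300 g": ("Milk and dairy products", "Fermented milk products", "Yogurts"),
    "Kefir B 1% 500 g": ("Milk and dairy products", "Fermented milk products", "Kefirs"),
    "Milk C 3.2% 1 l": ("Milk and dairy products", "Dairy products", "Milk"),
}


def roll_up(product, level):
    """Return the product's category at `level` (0 is the top level).

    If the path is shorter than requested, the product name itself is
    returned, i.e. the bottom level of the hierarchy.
    """
    path = CATEGORY_PATH[product]
    return path[level] if level < len(path) else product


def basket_at_level(basket, level):
    """Replace every product in a basket by its category at `level`."""
    return {roll_up(product, level) for product in basket}


basket = {"Yogurt A 2.5% 300 g", "Kefir B 1% 500 g", "Milk C 3.2% 1 l"}
print(basket_at_level(basket, 0))          # {'Milk and dairy products'}
print(sorted(basket_at_level(basket, 1)))  # ['Dairy products', 'Fermented milk products']
```

Rolling up before counting means the same receipts can be analyzed as category pairs at a high level and as product pairs at the bottom level.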

The service lets you analyze typical baskets at different categorization levels. For example, at the top level, low-alcohol drinks are bought with cigarettes, snacks, etc.

Brand-level analysis shows which brands occur in typical baskets. For instance, Obolon beer is bought with Luxury chips, and Lviv beer with "Lays".

Analysis by product name reveals specific features of consumer behavior. For example, 250 g of Molokiya kefir 1.2% and a roll with cottage cheese make up an office worker's lunch.

Analyzing combinations of specific products with brands or categories yields interesting results. Here customers' habits, advertising and, to some extent, even public opinion and traditions all play a role.

Combining several types of filtering in basket analysis lets the retailer structure the information it needs about purchases in the chain, target promotional activity correctly and, as a result, increase profit.

Whether a product ends up in a basket is influenced by its placement on the sales floor, by promotions, by the availability of goods on the shelves and by other factors.

This should be taken into account when applying the results of the analysis, in order to exclude misleading data. For example, goods placed near the checkouts are bought because they caught the customer's eye while standing in line, so it makes little sense to use pairs that include such goods in the analysis.

### Example of consumer basket analysis

Let's analyze alcohol sales in a supermarket chain using the BI Datawiz.io service.

The task is to identify typical baskets with alcohol, determine the other goods in these baskets, and find the best time to sell particular types and brands of alcohol.

We start the analysis of the alcohol group with the following settings:

Period: the current year.
Time interval: store opening hours, from 8 in the morning to 23 in the evening.
Basket cost: not limited.
Threshold: 1%, which excludes pairs of goods with support below one percent.
Categorization level for the consequence product: 0.
Days of the week: all.
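The effect of the threshold setting can be shown with a minimal sketch that counts how often each pair of goods occurs and drops pairs whose support is below the minimum. This is a plain pair count, not the Apriori, Eclat or FP-growth algorithm, and not necessarily what the service does internally; the function name `frequent_pairs` and the baskets are illustrative.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support=0.01):
    """Return {pair: support} for product pairs with support >= min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, so (a, b) and (b, a)
        # are counted together; set() ignores repeated lines in a receipt.
        counts.update(combinations(sorted(set(basket)), 2))
    total = len(baskets)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }


# Illustrative baskets only; a real run would use the chain's receipts.
baskets = [
    {"vodka", "cigarettes", "water"},
    {"vodka", "cigarettes"},
    {"beer", "chips"},
    {"vodka", "water"},
]
for pair, s in sorted(frequent_pairs(baskets, min_support=0.5).items()):
    print(pair, s)
# ('cigarettes', 'vodka') 0.5
# ('vodka', 'water') 0.5
```

With the 1% threshold from the settings above, `min_support` would be `0.01`; the higher value in the example is used only because the toy data set has four baskets.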

As we can see, there were 20458 receipts with alcohol in the current year.
The chart shows the distribution of receipts with alcohol by cost, in intervals of 5 UAH:
• horizontal axis: the receipt cost;
• vertical axis: the number of receipts.

The largest number of receipts, 1922, falls at a cost of about 75 UAH.
Most receipts with alcohol fall in the price range from 50 to 200 UAH.
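A distribution like this can be built by assigning each receipt total to a 5 UAH bin and counting the receipts per bin. The sketch below uses invented totals, and `cost_histogram` is a hypothetical helper name.

```python
from collections import Counter


def cost_histogram(totals, bin_width=5):
    """Count receipts per cost bin; each key is the lower edge of its bin."""
    return Counter(int(total // bin_width) * bin_width for total in totals)


# Illustrative receipt totals in UAH, not the chain's data.
totals = [74.5, 76.0, 77.9, 52.3, 198.0, 75.0, 79.99]
histogram = cost_histogram(totals)

for lower_edge in sorted(histogram):
    print(f"{lower_edge}-{lower_edge + 5} UAH: {histogram[lower_edge]}")

# The bin with the most receipts corresponds to the peak of the chart.
peak = max(histogram, key=histogram.get)
print("most common bin starts at", peak)  # most common bin starts at 75
```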

Time of purchases
At what hours is the most alcohol bought?
Let's compare the number of purchases in three time intervals:
• from 8 to 13;
• from 13 to 18;
• from 18 to 23.

In the morning, from 8 to 13, the chain recorded 6153 receipts with alcohol.
The price range, as the chart shows, is from 50 to 170 UAH.

The second interval, from 13 to 18, has 10847 receipts. The price range is from 45 to 255 UAH.

After working hours, from 18 to 23, there are only 7173 receipts. The price range is from 40 to 170 UAH.

Conclusions: one would expect strong alcohol to be bought mostly in the evening, after work, but that is not the case. The largest number of purchases falls in the lunch and afternoon hours, and the cost of typical baskets is also highest then.

The accompanying goods
The goods that accompany alcohol are fairly predictable. The most popular are cigarettes, beer, low-alcohol drinks and water.

Let's narrow the search by setting filters for Frankivska Gorilka vodka and Saturday morning.

The Categorization Level parameter for the consequence product lets you analyze pairs of goods at different category levels and find out which brands and categories are bought with a specific product. To do so, you only need to set the required categorization level.

In this case we use the third categorization level to determine which types of goods are bought with a specific brand of vodka.

The most common receipts are those up to 100 UAH.

On Saturday morning, the goods accompanying "Frankivska Gorilka" are pasta and grains, beer and canned fish.

We would call these baskets "the bachelor's breakfast".

Now let's look at typical baskets with vodka on Friday and Saturday evenings. The brands are more expensive, and they are paired with other types of alcohol, cigarettes, confectionery, sweet soft drinks and energy drinks.

The picture below shows combinations of three goods in such typical baskets.

Let's analyze receipts from 200 UAH that include champagne.

As the picture shows, typical baskets from 200 UAH with champagne include several types of alcohol, so these are most likely purchases for family celebrations and get-togethers.

One pair does puzzle us, though: champagne and sunflower seeds.

Purchases by holidays
The cyclical nature of holiday purchases is easiest to illustrate with vermouths.

Vermouth has traditionally been considered a ladies' drink and, as its sales chart shows, it is also a festive one.

At the third categorization level, vermouths are paired with expensive boxes of chocolates, confectionery, other alcohol and sweet soft drinks. Again, these are purchases made on the eve of holidays.

To find out which brands are bought with vermouths, we use other categorization levels.

At the 5th categorization level we get the results shown in the picture below.

As we can see, vermouths are bought with specific brands:
• confectionery: Roshen, Rafaello, AVK;
• vodka: Green day, Pervak, Helsinki;
• cigarettes: Parliament, Kent, Marlboro.

Such information helps the retailer considerably in targeting promotional activity correctly.

Another interesting finding: our customers buy practically any strong alcohol together with one of the varieties of Coca.

Coca and hard liquor show a high correlation.

The pairs of goods and the dependence between them are clearly visible on the sales charts.

Coca sales are shown in blue, alcohol sales in yellow, and sales of the pair of goods in red.

The sales charts of Coca and alcohol are similar. Promotions that pair any strong alcohol with Coca can be expected to succeed.

### How does the retailer use consumer basket analysis?

• To optimize product placement. For example, paired goods can be placed next to each other to raise the probability of purchase. Or, on the contrary, goods that are very likely to be bought together can be placed at a distance from each other, with a third needed product between them.
• To choose promotional goods. A good promotion idea: before holidays, offer a discount on expensive candies, for example Rafaello, with the purchase of vermouth.
• For targeted marketing: creating special offers on key and accompanying goods for different groups of customers.
• To understand what customers need at different times and on different days of the week, and to offer the right assortment.
• For campaigns that increase the number of items in the basket and the average receipt.

Consumer basket analysis is an important part of working with data in retail. BI Datawiz.io lets you analyze all the available information on purchases in a retail chain as fully as possible.