Data can provide insights into customer or audience behaviour that allow for more effective targeting. In this article, we look at how clustering algorithms can find patterns in customer behaviour, and how these patterns can help design segment-specific marketing actions to attract and retain customers.

In brief, the core idea is to start with raw data about customers or audience members, "mash" it with data from other sources, and then use a clustering algorithm to split customers into a small number of different segments, based on behaviour and data attributes. These clusters could then form the basis for designing propositions to better attract and retain customers.

As a frame of reference, we'll consider the hypothetical example of a gym that wants to anticipate and reduce the cancellation of memberships. The same ideas could be used for patrons of an arts organization, regular travellers for an airline or railway, or any other organization for which customer transaction data over a period of time is available.

Step 1. Start with customer behaviour data

The process starts with data on how people actually behave, the more detailed the better. For a gym membership, the following information for each customer might be available:

Step 2. Combine with other data sets to get new metrics

Other data sets might be available to provide additional insight, either on their own, or by combining them with attributes from the customer data from Step 1.
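As a minimal sketch of what combining data sets can look like, the Python snippet below attaches club attributes to each member record by matching on a club identifier. All field names and values here (member_id, club_id, has_pool, the post codes) are made up for illustration and do not come from a real system.

```python
# Hypothetical member and club records; every field name is illustrative.
members = [
    {"member_id": 1, "club_id": "A", "home_postcode": "AB1 2CD"},
    {"member_id": 2, "club_id": "B", "home_postcode": "EF3 4GH"},
    {"member_id": 3, "club_id": "A", "home_postcode": "IJ5 6KL"},
]
clubs = {
    "A": {"club_postcode": "AB1 9ZZ", "has_pool": True},
    "B": {"club_postcode": "EF3 9ZZ", "has_pool": False},
}


def enrich(members, clubs):
    """Attach club attributes to each member record (a left join on club_id)."""
    enriched = []
    for member in members:
        club = clubs.get(member["club_id"], {})  # unknown club: add nothing
        enriched.append({**member, **club})
    return enriched


combined = enrich(members, clubs)
```

Each combined record now carries both the member's post code and the club's post code, which is the kind of pairing needed to derive a distance metric later on.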

Step 3. Derive metrics

The metrics from Steps 1 and 2 may not be usable for clustering in their raw form, either because they are too detailed (e.g., attendance history), or because they need to be combined to provide a meaningful metric (e.g., a customer's distance from their club, based on the two post codes).

This step also helps to reduce the number of dimensions passed to the clustering algorithm, which makes processing faster and, for large data sets, can be what makes it feasible at all.
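The two examples above (summarising attendance history and computing distance to the club) could be sketched in Python roughly as follows. This assumes the post codes have already been converted to latitude and longitude by some geocoding step, which is not shown. The function names and the choice of summary metrics (visits per month, share of weekend visits) are illustrative assumptions.

```python
import math
from datetime import date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def attendance_metrics(visits, months):
    """Collapse a list of visit dates into two summary numbers."""
    if not visits:
        return {"visits_per_month": 0.0, "weekend_share": 0.0}
    weekend = sum(1 for d in visits if d.weekday() >= 5)  # 5 = Sat, 6 = Sun
    return {
        "visits_per_month": len(visits) / months,
        "weekend_share": weekend / len(visits),
    }


# Hypothetical attendance history covering two months.
visits = [date(2024, 1, 6), date(2024, 1, 8), date(2024, 1, 13), date(2024, 2, 5)]
metrics = attendance_metrics(visits, months=2)

# Hypothetical coordinates for a member's home and their club.
distance = haversine_km(51.50, -0.12, 51.52, -0.10)
```

A long attendance history is reduced to two numbers per member, and two post codes become a single distance, so the clustering step has fewer and more meaningful dimensions to work with.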

Step 4. Let clusters emerge

Once we have collected and "mashed" the data, and summarised it into higher-level metrics, we can use a clustering algorithm to group customers into a number of segments, based on how similar they are according to the data.

Clustering works by positioning each customer as a point in an imaginary n-dimensional space, with one dimension per metric (e.g., an X-Y plot for two metrics, although the number of dimensions can exceed two or three). Customers whose points lie close together are treated as similar.
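To make the idea of "closeness" concrete, here is a small Python sketch that treats each member as a point and measures the straight-line (Euclidean) distance between points. It also rescales each metric first, which is an assumption on our part rather than something the article prescribes: without rescaling, a metric with large numbers (such as fee) can dominate one with small numbers (such as visits). The sample members are invented.

```python
import math
from statistics import mean, pstdev


def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical members: (age, visits per month, monthly fee)
rows = [(22, 12, 30.0), (24, 10, 30.0), (51, 3, 55.0)]
points = standardise(rows)

d_similar = euclidean(points[0], points[1])    # two young, frequent visitors
d_different = euclidean(points[0], points[2])  # young frequent vs older infrequent
```

The first two members end up much closer to each other than to the third, which is the notion of similarity a clustering algorithm works with.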

There are a number of common algorithms to do this, such as k-means, which starts with a pre-defined number of target segments, then iteratively assigns each customer to the segment whose centre is closest and recalculates each centre as the average of the customers assigned to it, until the assignments stop changing. Many of these algorithms are available as packages in statistics software such as R, and as libraries for languages such as Python.
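For readers who want to see the mechanics, below is a bare-bones Python sketch of the basic k-means loop (often called Lloyd's algorithm). It is meant to illustrate the idea, not to replace a library implementation: real packages add better initialisation, multiple restarts, and performance optimisations. The sample data and the seed parameter are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Average position of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Basic k-means; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen members
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = centroid(members)
    return labels, centres


# Hypothetical members: (visits per month, monthly fee)
points = [(12, 30), (11, 32), (13, 29), (3, 55), (2, 58), (4, 54)]
labels, centres = k_means(points, k=2)
```

On this toy data the frequent, lower-fee members and the infrequent, higher-fee members end up in separate clusters. In practice the metrics would usually be rescaled first so that no single one dominates the distance calculation.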

The image on the right was generated in R from synthetic data about gym memberships, and shows 200 members assigned to 3 clusters using k-means. There are 3 attributes in the data set (age, monthly attendance, and monthly fee), but the plotting software is able to reduce these to the two dimensions that capture most of the variation, and uses them for 2-dimensional graphing purposes.

It is interesting to note that k-means, like other such algorithms, is unsupervised, so it does not require you to pre-define what a segment looks like. You just pass it the data and the desired number of segments, and let the algorithm do the work of assigning people to clusters.

The need to define the number of clusters in advance may seem a little strange, but it is easy to run the algorithm several times, using a different number of clusters each time, and compare the results. You want few enough clusters that you can actually act upon each of them, but enough that each cluster is clearly distinct from the others.
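One common way to compare runs is to record, for each number of clusters, how tightly members sit around their cluster centre (the within-cluster sum of squares) and look for the point where adding more clusters stops helping much. The Python sketch below illustrates this on invented data. To keep the result reproducible it uses a simple deterministic "farthest-first" starting rule, which is a choice made for this example and not part of standard k-means.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centres(points, k):
    """Farthest-first start: each new centre is the point farthest from the rest."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def k_means(points, k, iterations=50):
    """Compact k-means, included only so this example is self-contained."""
    centres = initial_centres(points, k)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


def within_cluster_ss(points, labels, centres):
    """Total squared distance from each point to its assigned centre."""
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))


# Hypothetical members: (visits per month, monthly fee), in three loose groups.
points = [(12, 30), (11, 32), (13, 29), (3, 55), (2, 58), (4, 54),
          (20, 80), (21, 78), (19, 82)]

scores = {}
for k in range(1, 6):
    labels, centres = k_means(points, k)
    scores[k] = within_cluster_ss(points, labels, centres)

for k, score in scores.items():
    print(k, round(score, 1))
```

On this toy data the score falls steeply up to three clusters and only slightly after that, suggesting three segments. With real data the drop is rarely this clean, so the numbers are best read alongside the practical question of how many segments the business can act on.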

There are other algorithms that estimate a suitable number of segments automatically (without the need to pre-define the number of clusters). However, it is still useful to go through the exercise of looking at the cluster definitions for different numbers of clusters, since the number of clusters is an important driver of the cost and effectiveness of marketing activities.

When feeding data to the algorithm, it is helpful to focus on behavioural attributes, since these may be more actionable. Thus, you would leave out any variables that suggest a pre-defined cluster assignment (such as membership tiers).

Step 5. Understand the clusters

After the algorithm has grouped people into clusters, you can use a number of statistical summary measures and graphs to review what each cluster looks like. This is not just the number of people in the cluster, but which attributes from the data define them.
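As an illustration of such a review, the Python sketch below summarises each cluster by its size and the average of each attribute. The records, field names, and cluster labels are invented, and the labels stand in for the output of whichever clustering step was used.

```python
from collections import defaultdict
from statistics import mean


def profile_clusters(records, labels, fields):
    """Return {cluster: {"size": n, field: average, ...}} for the given fields."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    return {
        label: {"size": len(rows), **{f: mean(r[f] for r in rows) for f in fields}}
        for label, rows in sorted(groups.items())
    }


# Hypothetical member metrics and cluster assignments.
records = [
    {"age": 22, "visits_per_month": 12, "weekend_share": 0.1},
    {"age": 26, "visits_per_month": 10, "weekend_share": 0.2},
    {"age": 48, "visits_per_month": 3, "weekend_share": 0.9},
    {"age": 52, "visits_per_month": 1, "weekend_share": 1.0},
]
labels = [0, 0, 1, 1]  # stand-in for the output of a clustering step

profiles = profile_clusters(records, labels, ["age", "visits_per_month", "weekend_share"])
for cluster, summary in profiles.items():
    print(cluster, summary)
```

Reading the averages side by side makes it easier to see what distinguishes one cluster from another, for example younger weekday regulars versus older weekend-only visitors.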

For example, one cluster might be young women who visit three days a week, weekdays only, between 10 and 2. Another might be mixed gender, ages 35-55, who visit primarily in the evenings. Another might be people who only come on weekends, and infrequently.

It is sometimes helpful to define personas for cluster profiles, and give them descriptive names (for example, student, mom, commuter, young single professional).

It is then possible to think creatively and articulate behaviours for the clusters, i.e., why they behave the way they do, including why they join/visit/leave clubs.

Step 6. Define actions and desired outcomes

To derive value from the analysis, it needs to lead to action. Numerous activities based on cluster analysis are likely to lead to business impact:

Example applications

There are numerous possible applications for this type of analysis, including:

This is just an overview of cluster analysis and its applications. The best way to see the real potential is to take a real data set, and use the six steps above to find new patterns, and design actions to take advantage of the resulting opportunities.
