Predictive Modeling vs. Clustering

Data mining activities aim to improve an enterprise's understanding of its customers in order to impact the bottom line, whether in customer service, sales, marketing or operations, by generating revenue or by controlling costs and risks.  Predictive modeling and clustering are both data mining techniques, but their methods and applications are quite different even when the goals are the same.  The algorithms also differ depending on whether the analysis is supervised (predictive modeling) or unsupervised (clustering).

Predictive modeling is a supervised approach concerned with the classification or estimation of an attribute.  This means that historic data is used to train and test a model, and the model is then used to score new records for the purpose of prediction.  An example would be to use historic demographic, transactional, attitudinal and behavioral data within a financial institution to address customer retention.  The field of interest is a flag (1 / 0 or true / false) indicating whether a customer is a current customer or not.  Predictive models are built on the combined data sources, and customers are then scored for their likelihood to leave, or churn.  Typical algorithms used for predictive modeling come from the field of statistics (linear regression, logistic regression, Cox regression, discriminant analysis) or from data mining (decision trees, neural networks).
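
The train, test and score sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the two inputs (tenure and complaints, both scaled to 0..1) and the rule that generates churn are invented for the example, and the flag is coded as 1 = churned. The logistic regression is fitted with plain gradient descent rather than a statistics package.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(model, x):
    """Return the predicted probability that the flag is 1 (churned)."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Synthetic "historic" data. The churn rule below is a made-up assumption.
rng = random.Random(0)

def make_customer():
    tenure = rng.random()       # hypothetical input, scaled 0..1
    complaints = rng.random()   # hypothetical input, scaled 0..1
    p_churn = sigmoid(4 * complaints - 4 * tenure)
    churned = 1 if rng.random() < p_churn else 0
    return [tenure, complaints], churned

data = [make_customer() for _ in range(400)]
train, test = data[:300], data[300:]   # hold out records for testing

model = train_logistic([x for x, _ in train], [y for _, y in train])

# Test the model on records it has not seen.
accuracy = sum((score(model, x) >= 0.5) == bool(y) for x, y in test) / len(test)

# Score a new record: short tenure, many complaints.
new_customer = [0.1, 0.9]
churn_probability = score(model, new_customer)
print(f"holdout accuracy: {accuracy:.2f}, churn score: {churn_probability:.2f}")
```

In practice the scores would be sorted so that retention efforts go first to the customers most likely to leave.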

Clustering, or segmentation, is an unsupervised approach that divides the records in a data source into sub-groups, where records within a sub-group are similar to each other and are likely to behave in a similar fashion. One application would be to cluster the customers of a financial institution using demographic, transactional, attitudinal and behavioral data, then design programs to address retention, cross-selling and growth initiatives for each sub-group. Clustering differs from predictive modeling in that it is usually an initial step rather than the end goal: the clusters discovered are often used as inputs to predictive models, in the form of an indicator of cluster membership. First, though, the clusters uncovered must be explored for meaning and relevance. If the initial analysis results in 5 clusters, and 2 have little usefulness, the analysis can be re-run to force 3 or 7 clusters in the result to better understand the segments. When a sub-group can be acted upon from a business perspective, we have a good sense of its usefulness. Kohonen, K-Means and Two-Step are examples of clustering algorithms.
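
As a rough sketch of the steps described above, the following Python implements a basic K-Means, re-runs it while forcing different cluster counts, and turns cluster membership into indicator fields that a predictive model could take as inputs. The customer data is synthetic, and the two fields (scaled age and scaled balance) are assumptions made for the example.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, k, iterations=50, seed=0):
    """Basic K-Means: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Synthetic customers: [scaled age, scaled balance], drawn around three
# made-up group centers.
rng = random.Random(1)
group_centers = [(0.2, 0.2), (0.5, 0.8), (0.8, 0.3)]
customers = [[rng.gauss(cx, 0.05), rng.gauss(cy, 0.05)]
             for cx, cy in group_centers for _ in range(40)]

# Re-run the analysis forcing different cluster counts and compare sizes.
for k in (3, 5, 7):
    _, k_labels = k_means(customers, k)
    sizes = [k_labels.count(c) for c in range(k)]
    print(f"k={k}: cluster sizes {sizes}")

# Settle on 3 clusters and build membership indicators for a later model.
centroids, labels = k_means(customers, 3)
indicators = [[1 if label == c else 0 for c in range(3)] for label in labels]
```

The cluster sizes are only a starting point; each cluster still has to be profiled and judged for business relevance before it is kept.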

In summary, fully understanding the business issue at hand is the key to determining whether predictive modeling, clustering, or both is the proper approach. Both techniques have their place in impacting the bottom line, whether by generating revenue or by controlling costs and risks.

One Response

  1. Clustering can also be used to create unique customer segments for marketing. Once each segment is profiled, that information is useful for developing distinct messages for each segment.

    A simple example: segment A consists of males, age 35-44, and segment B consists primarily of females, age 55-64. Different messages tailored to each group will likely be found more relevant, and lead to higher response rates in each segment, than one and the same message delivered to both.