Customer Segmentation with K-Means and Tableau

Published on 2025-01-06 by Amber Walker

This was a really exciting project where I was able to blend my two academic backgrounds (Marketing and Data Science) to extract insights from an e-commerce clothing store dataset developed by the Google Looker team. The workflow includes data extraction and transformation using Google BigQuery and DBT (Data Build Tool), clustering analysis with Python, and interactive visualization with Tableau.

Data Extraction and Transformation with DBT and Google BigQuery

I sourced the raw data from TheLook, an e-commerce dataset in Google BigQuery that contains customer behavioral data, including purchase history, order details, product categories, and spending metrics. I then used DBT to build a reusable data transformation pipeline, creating models to:

  • Clean and preprocess the data to ensure consistency and accuracy.
  • Transform raw transactional data into aggregated RFM metrics (Recency, Frequency, Monetary) as well as other features like web interactions and category proportions.

The following screenshots of the DBT models and pipeline show how the transformations were designed and executed to prepare the data for clustering analysis.

[Screenshot: DBT model]
This model aggregates the data to extract the RFM metrics, as well as the ratios of returned and cancelled orders per customer.
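
To illustrate the aggregation logic outside of SQL, here is a minimal plain-Python sketch. It is not the DBT model itself: the row layout, the status names, the snapshot date, and the choice to count only completed orders toward the monetary value are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows: (customer_id, order_date, status, sale_amount)
orders = [
    (1, date(2024, 1, 5), "Complete", 40.0),
    (1, date(2024, 3, 1), "Returned", 25.0),
    (2, date(2023, 11, 20), "Complete", 90.0),
    (2, date(2024, 2, 10), "Cancelled", 15.0),
    (2, date(2024, 2, 15), "Complete", 30.0),
]
snapshot = date(2024, 4, 1)  # assumed reference date for recency


def rfm_per_customer(rows, as_of):
    """Aggregate order rows into one RFM record per customer."""
    grouped = defaultdict(list)
    for customer_id, order_date, status, amount in rows:
        grouped[customer_id].append((order_date, status, amount))

    result = {}
    for customer_id, items in grouped.items():
        n_orders = len(items)
        result[customer_id] = {
            # Recency: days since the most recent order
            "recency_days": (as_of - max(d for d, _, _ in items)).days,
            # Frequency: number of orders of any status (assumption)
            "frequency": n_orders,
            # Monetary: spend on completed orders only (assumption)
            "monetary": sum(a for _, s, a in items if s == "Complete"),
            "returned_ratio": sum(s == "Returned" for _, s, _ in items) / n_orders,
            "cancelled_ratio": sum(s == "Cancelled" for _, s, _ in items) / n_orders,
        }
    return result


rfm = rfm_per_customer(orders, snapshot)
```

In a DBT model the same grouping would typically be a single GROUP BY on the customer ID.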
[Screenshot: DBT model]
This model calculates each customer's tenure using the DATE_DIFF function (specific to BigQuery).
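
For readers who do not use BigQuery, the same day-difference calculation can be sketched in Python. The function name and the idea of measuring tenure from a sign-up date are assumptions for illustration. In BigQuery the call takes the form DATE_DIFF(end_date, start_date, DAY).

```python
from datetime import date


def tenure_days(signup_date, as_of):
    """Whole days between sign-up and the reference date."""
    return (as_of - signup_date).days


# Hypothetical customer who signed up exactly one (non-leap) year earlier
tenure = tenure_days(date(2023, 1, 15), date(2024, 1, 15))
```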
[Screenshot: DBT model]
This was a more complex query: I first summed total sales per customer and clothing category, then divided each category total by the customer's overall total sales, and finally pivoted the table to get each category's share of spending per customer.
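
The sum, divide, and pivot steps can be sketched in plain Python as follows. The category names and amounts are made up, and the sketch assumes every customer has a non-zero total (otherwise the division would fail).

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, category, sale_amount)
sales = [
    (1, "Jeans", 60.0),
    (1, "Tops", 20.0),
    (1, "Jeans", 20.0),
    (2, "Tops", 50.0),
]


def category_proportions(rows):
    """Return one row per customer with each category's share of spend."""
    # Step 1: total sales per customer and category
    per_category = defaultdict(lambda: defaultdict(float))
    for customer_id, category, amount in rows:
        per_category[customer_id][category] += amount

    # Step 2 and 3: divide by the customer's overall total and pivot to
    # a wide layout with one key per category (0.0 when nothing was bought)
    categories = sorted({category for _, category, _ in rows})
    wide = {}
    for customer_id, totals in per_category.items():
        overall = sum(totals.values())
        wide[customer_id] = {c: totals.get(c, 0.0) / overall for c in categories}
    return wide


proportions = category_proportions(sales)
```

Each customer's proportions sum to 1, which makes the features comparable across customers with very different spending levels.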
[Screenshot: DBT model]
I then ran a query to fetch only the customers who have bought and completed at least one order.
[Screenshot: DBT model]
Finally, I joined all of the data together (including demographics and web interactions), making sure to include only the customers who had at least one completed order.
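
The filter-then-join step can be sketched in plain Python as below. The helper name join_features, the "Complete" status value, and the use of an inner join are assumptions for the example. The actual DBT model may use different join types.

```python
def join_features(completed_ids, *feature_tables):
    """Inner-join per-customer dicts, keeping only customers in completed_ids."""
    joined = {}
    for customer_id in sorted(completed_ids):
        # Keep the customer only if every feature table has a row for them
        if all(customer_id in table for table in feature_tables):
            row = {}
            for table in feature_tables:
                row.update(table[customer_id])
            joined[customer_id] = row
    return joined


# Hypothetical order statuses: (customer_id, status)
order_status = [(1, "Complete"), (2, "Cancelled"), (3, "Complete"), (3, "Returned")]
completed_ids = {cid for cid, status in order_status if status == "Complete"}

# Hypothetical per-customer feature tables
rfm = {1: {"frequency": 2}, 2: {"frequency": 1}, 3: {"frequency": 4}}
demographics = {1: {"age": 31}, 2: {"age": 45}, 3: {"age": 27}}

final = join_features(completed_ids, rfm, demographics)
```

Customer 2 is dropped here because their only order was cancelled.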

Clustering Analysis with Python

After extracting and transforming the data, I used Python for exploratory data analysis (EDA) and clustering. Specifically:

  • Implemented and explored two different clustering techniques, Gaussian Mixture Model (GMM) and K-Means, to identify distinct customer segments.
  • Segmented customers based on RFM metrics, focusing on their purchase behavior and engagement.
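  • Evaluated cluster quality using the Silhouette Score, which measures how close each point is to its own cluster compared with the nearest neighboring cluster (it ranges from -1 to 1, and higher is better). K-Means outperformed GMM, achieving a higher Silhouette Score (0.44 vs. 0.33).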
  • Selected K-Means for final segmentation due to its better performance and more interpretable results.
Check out my GitHub repository to see the code in action: View on GitHub
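
The comparison described in the list above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the code from the repository: the feature matrix, the choice of four clusters, the scaling step, and the random seeds are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer feature matrix (e.g. R, F, M columns)
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [6, 6, 0], [0, 6, 6], [6, 0, 6]], dtype=float)
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(50, 3)) for c in centers])

# Scale features so no single metric dominates the distance computation
X_scaled = StandardScaler().fit_transform(X)

# Fit both models with the same number of clusters
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)
gmm_labels = GaussianMixture(n_components=4, random_state=0).fit_predict(X_scaled)

# Higher silhouette means tighter, better-separated clusters
scores = {
    "kmeans": silhouette_score(X_scaled, kmeans_labels),
    "gmm": silhouette_score(X_scaled, gmm_labels),
}
```

The scores on this toy data say nothing about the real dataset. The point is only that both models are evaluated on the same scaled features with the same metric.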

To visualize how well the K-Means model clustered the data, I used Principal Component Analysis (PCA) to project the high-dimensional data onto three principal components:

[Figure: 3D PCA projection of the K-Means clusters]
You can make out four distinct groups, even though some of the data points overlap.
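
A projection like this can be sketched with scikit-learn's PCA. The random matrix below is only a placeholder for the scaled customer features, and the eight-column shape is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the scaled customer feature matrix (200 customers, 8 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Project onto the three directions of greatest variance
pca = PCA(n_components=3)
components = pca.fit_transform(X)

# Share of the total variance retained by each of the three components
explained = pca.explained_variance_ratio_
```

The three columns of components can then be plotted in a 3D scatter, colored by cluster label. Checking the sum of explained shows how much of the original variance the plot actually represents.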

Interactive Visualization with Tableau

To make the insights actionable, I visualized the clustered data in an interactive Tableau dashboard. The dashboard includes:

  • A summary of the customer segments, labeled as Platinum, Gold, Silver, and Bronze.
  • Visual comparisons of the clusters based on recency, frequency, and monetary values.
  • Geographical insights into customer distribution and segment trends.
  • Interactive filters to explore specific timeframes, locations, or customer attributes.

Here is a quick tier summary to help explain the customer segments:

Cluster | Classification | Characteristics | Strategy
2 | Platinum | High spenders, moderately active, low returns | Retain loyalty with VIP benefits, exclusivity
0 | Gold | Mid-level spenders, frequent buyers, high returns | Reduce returns, incentivize repeat purchases
3 | Silver | Low spenders, infrequent buyers, recent activity | Encourage repeat purchases with discounts/offers
1 | Bronze | Dormant, low spenders, very inactive | Reactivate with targeted campaigns or surveys
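
The cluster-to-tier mapping in the summary above can be applied to model output with a simple lookup. The helper name label_tiers and the sample labels are hypothetical. Note that K-Means cluster numbers are arbitrary and can change between runs, so a mapping like this is only valid for the specific fitted model it was derived from.

```python
# Cluster number -> tier name, taken from the tier summary
TIER_BY_CLUSTER = {2: "Platinum", 0: "Gold", 3: "Silver", 1: "Bronze"}


def label_tiers(cluster_labels):
    """Translate numeric cluster labels into tier names."""
    return [TIER_BY_CLUSTER[c] for c in cluster_labels]


# Hypothetical cluster assignments for five customers
tiers = label_tiers([0, 2, 1, 3, 2])
```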

Below you will find the interactive Tableau dashboard:

This visualization enables stakeholders to quickly identify high-value customers, dormant users, and actionable marketing opportunities.

Conclusion

By combining DBT for data transformation, Python for clustering analysis, and Tableau for visualization, this project delivers an end-to-end workflow for customer segmentation. Next steps include building a model to predict customer lifetime value, building a data pipeline, and automating the workflow. Thanks for reading!