Customer Segmentation with RFM Analysis in SQL

In the world of data analytics, understanding your customers is key to driving growth. One of the most effective and time-tested methods for customer segmentation is RFM Analysis. But you don't need expensive CRM software to do it—you can build a powerful RFM model right in your database using standard SQL.

In this guide, we'll walk through how to calculate Recency, Frequency, and Monetary scores and segment your customer base into actionable groups like "Champions", "At Risk", and "New Customers".

RFM segmentation overview with customer groups like Champions and At Risk

What is RFM Analysis?

RFM stands for three key metrics that describe customer behavior:

Recency (R): How recently did the customer make a purchase?
Frequency (F): How often do they purchase?
Monetary (M): How much do they spend?

By scoring customers on these three dimensions (usually from 1 to 5), you can group them into segments. For example, a customer with high scores in all three categories is a "Champion," while someone with high Monetary value but low Recency might be "At Risk" of churning.

Step 1: Preparing the Data

To perform RFM analysis, you typically need a transaction table. Let's assume we have an orders_rfm_demo table with customer_id, order_date, and total_amount.

The first step is to aggregate this data to the customer level. We need:

Last Order Date: MAX(order_date)
Count of Orders: COUNT(order_id)
Total Spend: SUM(total_amount)

SELECT
    customer_id,
    MAX(order_date) as last_order_date,
    COUNT(order_id) as frequency,
    SUM(total_amount) as monetary
FROM orders_rfm_demo
GROUP BY customer_id;

Step 2: Calculating RFM Values

Now, let's turn these raw numbers into the "R", "F", and "M" values.

For Recency, we need the number of days since the last purchase. We'll compare the last_order_date to a reference date (usually "today" or the date of analysis).

SELECT
    customer_id,
    julian_day('2024-01-01') - julian_day(MAX(order_date)) as recency_days,
    COUNT(order_id) as frequency,
    SUM(total_amount) as monetary
FROM orders_rfm_demo
GROUP BY customer_id;

(Note: In most SQL dialects like PostgreSQL you might use CURRENT_DATE - MAX(order_date), but here we use a fixed date for reproducibility.)

Step 3: Determining RFM Scores with NTILE

This is where the magic happens. We can't just look at raw dollars because spending ranges vary wildly between businesses. Instead, we rank customers relative to each other using the logic: "Top 20% get a score of 5, bottom 20% get a score of 1".

The SQL window function NTILE(5) is perfect for this.

Recency Score: Lower days is better (5 points), higher days is worse (1 point). Wait, NTILE assigns 1 to the lowest values. So for Recency (where low is good), a low rectency_days gets NTILE 1. We want the opposite. We can order by recency_days DESC for the NTILE function so the largest days (worst) get 1 and smallest (best) get 5. Or simply order DESC.
Frequency Score: Higher is better. Order by frequency ASC.
Monetary Score: Higher is better. Order by monetary ASC.

Let's see it in action in our playground.

Interactive RFM Playground

Try modifying the queries below to see how different customers are scored.

Interactive SQL

Step 4: Creating Human-Readable Segments

Having a score like "555" or "121" is useful for machines, but humans prefer labels. We can use a CASE statement to translate these scores into named segments.

Common segment definitions:

Champions: R=4-5, F=4-5, M=4-5
Loyal Customers: F=3-5, R=3-5 (regardless of M)
Potential Loyalists: Recent customers with average frequency
Needing Attention: High R, Low F/M (About to churn)

SELECT
    customer_id,
    r_score, f_score, m_score,
    CASE
        WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
        WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customers'
        WHEN r_score >= 4 AND f_score = 1 THEN 'New Customers'
        WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
        ELSE 'Standard'
    END as customer_segment
FROM scored_data;

Best Practices for RFM in SQL

Filter Anomalies: Exclude returns (negative values) or test transactions before calculating.
Timeframe Matters: RFM is usually calculated over a sliding window (e.g., last 12 months). Older data might skew Recency.
Binning Distributions: NTILE forces equal-sized groups. If your data is skewed (e.g., 90% of customers bought once), NTILE might arbitrarily split them. In such cases, using custom CASE ranges for scores (e.g., "Frequency 1 = 1 point", "Freq 2-5 = 3 points") might be more accurate than strict percentiles.

Conclusion

RFM analysis is a low-hanging fruit in data science that delivers immediate value. By implementing it directly in SQL, you can create dynamic dashboard reports that update automatically as new orders come in. No need to export CSVs or run Python scripts—your database can handle the heavy lifting!

In this guide, we'll walk through how to calculate Recency, Frequency, and Monetary scores and segment your customer base into actionable groups like "Champions", "At Risk", and "New Customers".

What is RFM Analysis?

RFM stands for three key metrics that describe customer behavior:

Recency (R): How recently did the customer make a purchase?
Frequency (F): How often do they purchase?
Monetary (M): How much do they spend?

Step 1: Preparing the Data

To perform RFM analysis, you typically need a transaction table. Let's assume we have an orders_rfm_demo table with customer_id, order_date, and total_amount.

The first step is to aggregate this data to the customer level. We need:

Last Order Date: MAX(order_date)
Count of Orders: COUNT(order_id)
Total Spend: SUM(total_amount)

SELECT
    customer_id,
    MAX(order_date) as last_order_date,
    COUNT(order_id) as frequency,
    SUM(total_amount) as monetary
FROM orders_rfm_demo
GROUP BY customer_id;

Step 2: Calculating RFM Values

Now, let's turn these raw numbers into the "R", "F", and "M" values.

For Recency, we need the number of days since the last purchase. We'll compare the last_order_date to a reference date (usually "today" or the date of analysis).

SELECT
    customer_id,
    julian_day('2024-01-01') - julian_day(MAX(order_date)) as recency_days,
    COUNT(order_id) as frequency,
    SUM(total_amount) as monetary
FROM orders_rfm_demo
GROUP BY customer_id;

(Note: In most SQL dialects like PostgreSQL you might use CURRENT_DATE - MAX(order_date), but here we use a fixed date for reproducibility.)

Step 3: Determining RFM Scores with NTILE

The SQL window function NTILE(5) is perfect for this.

Recency Score: Lower days is better (5 points), higher days is worse (1 point). Wait, NTILE assigns 1 to the lowest values. So for Recency (where low is good), a low rectency_days gets NTILE 1. We want the opposite. We can order by recency_days DESC for the NTILE function so the largest days (worst) get 1 and smallest (best) get 5. Or simply order DESC.
Frequency Score: Higher is better. Order by frequency ASC.
Monetary Score: Higher is better. Order by monetary ASC.

Let's see it in action in our playground.

Interactive RFM Playground

Try modifying the queries below to see how different customers are scored.

Interactive SQL

Step 4: Creating Human-Readable Segments

Having a score like "555" or "121" is useful for machines, but humans prefer labels. We can use a CASE statement to translate these scores into named segments.

Common segment definitions:

Champions: R=4-5, F=4-5, M=4-5
Loyal Customers: F=3-5, R=3-5 (regardless of M)
Potential Loyalists: Recent customers with average frequency
Needing Attention: High R, Low F/M (About to churn)

SELECT
    customer_id,
    r_score, f_score, m_score,
    CASE
        WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
        WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customers'
        WHEN r_score >= 4 AND f_score = 1 THEN 'New Customers'
        WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
        ELSE 'Standard'
    END as customer_segment
FROM scored_data;

Best Practices for RFM in SQL

Filter Anomalies: Exclude returns (negative values) or test transactions before calculating.
Timeframe Matters: RFM is usually calculated over a sliding window (e.g., last 12 months). Older data might skew Recency.
Binning Distributions: NTILE forces equal-sized groups. If your data is skewed (e.g., 90% of customers bought once), NTILE might arbitrarily split them. In such cases, using custom CASE ranges for scores (e.g., "Frequency 1 = 1 point", "Freq 2-5 = 3 points") might be more accurate than strict percentiles.

Customer Segmentation with RFM Analysis in SQL

What is RFM Analysis?

Step 1: Preparing the Data

Step 2: Calculating RFM Values

Step 3: Determining RFM Scores with NTILE

Interactive RFM Playground

Step 4: Creating Human-Readable Segments

Best Practices for RFM in SQL

Conclusion

Related Articles

SQL for Data Analysis: The Ultimate Guide

Calculating Percentiles and Median in SQL

Ranking Data with SQL: RANK, DENSE_RANK, and ROW_NUMBER Explained

Comments

Customer Segmentation with RFM Analysis in SQL

What is RFM Analysis?

Step 1: Preparing the Data

Step 2: Calculating RFM Values

Step 3: Determining RFM Scores with NTILE

Interactive RFM Playground

Step 4: Creating Human-Readable Segments

Best Practices for RFM in SQL

Conclusion

Related Articles

SQL for Data Analysis: The Ultimate Guide

Calculating Percentiles and Median in SQL

Ranking Data with SQL: RANK, DENSE_RANK, and ROW_NUMBER Explained

Comments