In the world of data analytics, understanding your customers is key to driving growth. One of the most effective and time-tested methods for customer segmentation is RFM Analysis. But you don't need expensive CRM software to do it—you can build a powerful RFM model right in your database using standard SQL.
In this guide, we'll walk through how to calculate Recency, Frequency, and Monetary scores and segment your customer base into actionable groups like "Champions", "At Risk", and "New Customers".
What is RFM Analysis?
RFM stands for three key metrics that describe customer behavior:
- Recency (R): How recently did the customer make a purchase?
- Frequency (F): How often do they purchase?
- Monetary (M): How much do they spend?
By scoring customers on these three dimensions (usually from 1 to 5), you can group them into segments. For example, a customer with high scores in all three categories is a "Champion," while someone with high Monetary value but low Recency might be "At Risk" of churning.
Step 1: Preparing the Data
To perform RFM analysis, you typically need a transaction table. Let's assume we have an orders_rfm_demo table with customer_id, order_date, and total_amount.
The first step is to aggregate this data to the customer level. We need:
- Last Order Date:
MAX(order_date) - Count of Orders:
COUNT(order_id) - Total Spend:
SUM(total_amount)
SELECT
customer_id,
MAX(order_date) as last_order_date,
COUNT(order_id) as frequency,
SUM(total_amount) as monetary
FROM orders_rfm_demo
GROUP BY customer_id;
Step 2: Calculating RFM Values
Now, let's turn these raw numbers into the "R", "F", and "M" values.
- For Recency, we need the number of days since the last purchase. We'll compare the
last_order_dateto a reference date (usually "today" or the date of analysis).
SELECT
customer_id,
julian_day('2024-01-01') - julian_day(MAX(order_date)) as recency_days,
COUNT(order_id) as frequency,
SUM(total_amount) as monetary
FROM orders_rfm_demo
GROUP BY customer_id;
(Note: In most SQL dialects like PostgreSQL you might use CURRENT_DATE - MAX(order_date), but here we use a fixed date for reproducibility.)
Step 3: Determining RFM Scores with NTILE
This is where the magic happens. We can't just look at raw dollars because spending ranges vary wildly between businesses. Instead, we rank customers relative to each other using the logic: "Top 20% get a score of 5, bottom 20% get a score of 1".
The SQL window function NTILE(5) is perfect for this.
- Recency Score: Lower days is better (5 points), higher days is worse (1 point). Wait,
NTILEassigns 1 to the lowest values. So for Recency (where low is good), a lowrectency_daysgetsNTILE1. We want the opposite. We can order byrecency_days DESCfor theNTILEfunction so the largest days (worst) get 1 and smallest (best) get 5. Or simply orderDESC. - Frequency Score: Higher is better. Order by
frequency ASC. - Monetary Score: Higher is better. Order by
monetary ASC.
Let's see it in action in our playground.
Interactive RFM Playground
Try modifying the queries below to see how different customers are scored.
Step 4: Creating Human-Readable Segments
Having a score like "555" or "121" is useful for machines, but humans prefer labels. We can use a CASE statement to translate these scores into named segments.
Common segment definitions:
- Champions: R=4-5, F=4-5, M=4-5
- Loyal Customers: F=3-5, R=3-5 (regardless of M)
- Potential Loyalists: Recent customers with average frequency
- Needing Attention: High R, Low F/M (About to churn)
SELECT
customer_id,
r_score, f_score, m_score,
CASE
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customers'
WHEN r_score >= 4 AND f_score = 1 THEN 'New Customers'
WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
ELSE 'Standard'
END as customer_segment
FROM scored_data;
Best Practices for RFM in SQL
- Filter Anomalies: Exclude returns (negative values) or test transactions before calculating.
- Timeframe Matters: RFM is usually calculated over a sliding window (e.g., last 12 months). Older data might skew Recency.
- Binning Distributions:
NTILEforces equal-sized groups. If your data is skewed (e.g., 90% of customers bought once),NTILEmight arbitrarily split them. In such cases, using customCASEranges for scores (e.g., "Frequency 1 = 1 point", "Freq 2-5 = 3 points") might be more accurate than strict percentiles.
Conclusion
RFM analysis is a low-hanging fruit in data science that delivers immediate value. By implementing it directly in SQL, you can create dynamic dashboard reports that update automatically as new orders come in. No need to export CSVs or run Python scripts—your database can handle the heavy lifting!