SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed
SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog
Back to Blog
2025-12-08
6 min read

Cohort Analysis with SQL

analyticscohort-analysisdata-analysissql

You've launched a new app. Users are signing up every day. But here's the million-dollar question: Are they coming back?

A single metric like "total active users" won't tell you the full story. You need to know: Are users who signed up in January still active in March? Are they more engaged than users who signed up in February?

This is where cohort analysis comes in—one of the most powerful techniques in product analytics. And the best part? You can do it entirely with SQL.

What is Cohort Analysis?

A cohort is a group of users who share a common characteristic during a specific time period. For example:

  • All users who signed up in January 2024.
  • All users who made their first purchase during Black Friday.

Cohort analysis tracks these groups over time to see how their behavior changes. The most common use case is retention analysis: measuring what percentage of users return after their first interaction.

Why Cohort Analysis Matters

Imagine you're running a SaaS product. You notice:

  • January cohort: 60% of users are still active after 3 months.
  • February cohort: Only 30% are still active after 3 months.

This tells you something changed between January and February. Maybe you released a buggy feature, or your onboarding flow broke. Without cohort analysis, you'd never catch this.

The Data Model

Let's use a simple e-commerce example. We have a user_activity table that tracks when users visit the site:

user_idactivity_date
12024-01-05
12024-01-12
12024-02-03
22024-01-10
22024-01-15

We want to answer: What percentage of users who joined in January came back in February?

Step 1: Identify the Cohort

First, we need to determine each user's cohort—the month they first became active.

SELECT 
  user_id,
  DATE_TRUNC('month', MIN(activity_date)) AS cohort_month
FROM user_activity
GROUP BY user_id;

This gives us:

user_idcohort_month
12024-01-01
22024-01-01

Both users belong to the January 2024 cohort.

Step 2: Calculate Activity Periods

Next, for each activity, we calculate how many months have passed since the user's cohort month. This is called the period number.

WITH cohorts AS (
  SELECT 
    user_id,
    DATE_TRUNC('month', MIN(activity_date)) AS cohort_month
  FROM user_activity
  GROUP BY user_id
)
SELECT 
  ua.user_id,
  c.cohort_month,
  DATE_TRUNC('month', ua.activity_date) AS activity_month,
  (EXTRACT(YEAR FROM ua.activity_date) - EXTRACT(YEAR FROM c.cohort_month)) * 12 
    + (EXTRACT(MONTH FROM ua.activity_date) - EXTRACT(MONTH FROM c.cohort_month)) AS period_number
FROM user_activity ua
JOIN cohorts c ON ua.user_id = c.user_id;

This gives us:

user_idcohort_monthactivity_monthperiod_number
12024-01-012024-01-010
12024-01-012024-01-010
12024-01-012024-02-011
22024-01-012024-01-010

Period 0 = cohort month, Period 1 = 1 month later, etc.

Step 3: Build the Retention Matrix

Now we count how many users from each cohort were active in each period.

Interactive SQL
Loading...

Reading the Results:

  • Cohort Month: When users first became active.
  • Period Number: Months since cohort month (0 = same month, 1 = next month, etc.).
  • Active Users: How many users from that cohort were active in that period.

For example:

  • 2024-01-01, Period 0: 3 users (all January signups were active in January).
  • 2024-01-01, Period 1: 2 users (2 out of 3 January signups came back in February).
  • 2024-01-01, Period 2: 2 users (2 out of 3 came back in March).

Retention Rate = (Active Users in Period N) / (Active Users in Period 0) × 100%

So for the January cohort:

  • Month 1 Retention: 2/3 = 67%
  • Month 2 Retention: 2/3 = 67%

Visualizing Cohort Data

In practice, you'd export this data to a tool like Excel, Google Sheets, or a BI platform to create a retention heatmap:

Month 0Month 1Month 2
Jan 2024100%67%67%
Feb 2024100%100%100%

Green = high retention, Red = low retention.

Advanced: Calculating Cohort Metrics

Once you have the cohort structure, you can calculate other metrics:

Average Revenue Per Cohort

SELECT 
  c.cohort_month,
  AVG(o.order_total) AS avg_revenue
FROM orders o
JOIN cohorts c ON o.user_id = c.user_id
GROUP BY c.cohort_month;

Churn Rate

-- Users who were active in Period 0 but NOT in Period 1
SELECT 
  cohort_month,
  COUNT(DISTINCT CASE WHEN period_number = 0 THEN user_id END) AS period_0_users,
  COUNT(DISTINCT CASE WHEN period_number = 1 THEN user_id END) AS period_1_users,
  1.0 - (COUNT(DISTINCT CASE WHEN period_number = 1 THEN user_id END) * 1.0 / 
         COUNT(DISTINCT CASE WHEN period_number = 0 THEN user_id END)) AS churn_rate
FROM user_periods
GROUP BY cohort_month;

Best Practices

  1. Choose the Right Cohort Window: Monthly cohorts are common, but you can also use weekly or daily cohorts for fast-moving products.

  2. Define "Active" Clearly: Is a user "active" if they log in? Make a purchase? Engage with content? Be consistent.

  3. Use CTEs for Readability: Cohort queries can get complex. Break them into steps with Common Table Expressions.

  4. Automate with Views: Create a view for your cohort logic so analysts can query it easily.

Conclusion

Cohort analysis is essential for understanding user behavior over time. By grouping users based on when they joined and tracking their activity, you can:

  • Measure retention and identify drop-off points.
  • Compare cohorts to see the impact of product changes.
  • Predict churn and take action before users leave.

And with SQL, you have full control over the analysis—no need to wait for a third-party tool to add the feature you need.

Now go build some cohorts! 📊

Share this article:

Comments

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed