You have a column of numbers—order amounts, page load times, user ages, test scores. You want to know: how is this data distributed? Are most values clustered in the middle? Is there a long tail? Are there suspicious gaps?
A histogram answers all of these questions. And you don't need Python, R, or a BI tool to build one. SQL can do it directly—and understanding how SQL builds histograms will make you a better data analyst.

What is a Frequency Distribution?
A frequency distribution counts how many values fall into each distinct category or range. For continuous numeric data, you first group values into buckets (also called bins), then count occurrences per bucket.
For example, instead of knowing that order amounts are "$12.50, $47.80, $13.20, $95.00…", you might want:
| Bucket | Count |
|---|---|
| $0–$25 | 42 |
| $25–$50 | 38 |
| $50–$100 | 27 |
| $100+ | 8 |
Let's build this progressively in SQL.
The Sample Dataset
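The article's original dataset isn't reproduced here, so the table below is an illustrative stand-in: a minimal `orders` table with one numeric column is enough to follow every query in this post. The table name, column names, and values are all assumptions.

```sql
-- Hypothetical sample data: a minimal orders table with a single
-- numeric column. All amounts are made up for illustration.
CREATE TABLE orders (
  id     INTEGER PRIMARY KEY,
  amount REAL NOT NULL
);

INSERT INTO orders (amount) VALUES
  (12.50), (47.80), (13.20), (95.00),  (8.75),  (120.00),
  (33.10), (64.99), (18.40), (210.00), (27.30), (55.00);
```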
Method 1: Fixed-Width Buckets with CASE
The simplest approach: define your bucket boundaries manually using a CASE expression.
The REPLACE(HEX(ZEROBLOB(n)), '00', '▪') trick generates an ASCII bar of length n—a quick way to visualize distribution directly in a query result.
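A minimal sketch of the CASE approach, assuming an `orders` table with a numeric `amount` column (hypothetical names):

```sql
-- Fixed-width buckets via CASE, assuming an orders(amount) table.
SELECT
  CASE
    WHEN amount < 25  THEN '$0–$25'
    WHEN amount < 50  THEN '$25–$50'
    WHEN amount < 100 THEN '$50–$100'
    ELSE                   '$100+'
  END AS bucket,
  COUNT(*) AS frequency,
  -- ASCII bar: HEX(ZEROBLOB(n)) yields n copies of '00';
  -- replacing each '00' with '▪' draws a bar of length n.
  REPLACE(HEX(ZEROBLOB(COUNT(*))), '00', '▪') AS bar
FROM orders
GROUP BY bucket
ORDER BY MIN(amount);
```

`ORDER BY MIN(amount)` sorts the buckets by their lower edge instead of alphabetically by label.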
Method 2: Dynamic Buckets with Math
Hard-coding bucket boundaries gets tedious at scale. A better approach: use arithmetic to compute the bucket number from the value itself.
The formula is:
bucket_num = FLOOR((value - min) / bucket_width)
This produces 10 equal-width buckets automatically regardless of the data range. Change / 10.0 to / 20.0 to get 20 buckets, or parameterize it based on your precision needs.
Note: For databases like PostgreSQL, use the built-in WIDTH_BUCKET(value, min, max, num_buckets) function—it does exactly this in one call. In SQLite, the manual formula above is the standard approach.
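One way to express the formula in SQLite, as a sketch against an assumed `orders(amount)` table:

```sql
-- Dynamic equal-width buckets (10 of them), assuming orders(amount).
WITH stats AS (
  SELECT MIN(amount) AS mn,
         (MAX(amount) - MIN(amount)) / 10.0 AS width   -- 10 buckets
  FROM orders
)
SELECT
  -- Clamp to 9 so the single maximum value doesn't land in an 11th bucket.
  MIN(CAST((amount - mn) / width AS INTEGER), 9) AS bucket_num,
  COUNT(*) AS frequency
FROM orders, stats
GROUP BY bucket_num
ORDER BY bucket_num;
```

CAST to INTEGER truncates toward zero, which equals FLOOR for the non-negative values produced here.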
Method 3: Frequency Distribution for Categorical Data
Not all frequency distributions involve numbers. Sometimes you want to count occurrences of each distinct text value—a categorical frequency distribution.
The SUM(COUNT(*)) OVER() window function calculates the grand total across all categories, enabling the percentage column without a subquery.
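A sketch of the categorical version; the `status` column on `orders` is a hypothetical example of a text column to count:

```sql
-- Categorical frequency distribution, assuming a hypothetical text
-- column orders.status (e.g. 'paid', 'refunded', 'pending').
SELECT
  status,
  COUNT(*) AS frequency,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM orders
GROUP BY status
ORDER BY frequency DESC;
```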
Adding Cumulative Frequency
A cumulative frequency column shows the running percentage—useful for questions like "what order amount covers 80% of all transactions?"
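A sketch of one way to add the running percentage with window functions, again assuming an `orders(amount)` table:

```sql
-- Cumulative frequency over fixed buckets, assuming orders(amount).
WITH bucketed AS (
  SELECT
    CASE
      WHEN amount < 50  THEN '$0–$50'
      WHEN amount < 100 THEN '$50–$100'
      WHEN amount < 200 THEN '$100–$200'
      ELSE                   '$200+'
    END AS bucket,
    MIN(amount) AS lower_edge,   -- ordering key for the running sum
    COUNT(*)    AS frequency
  FROM orders
  GROUP BY bucket
)
SELECT
  bucket,
  frequency,
  ROUND(100.0 * SUM(frequency) OVER (ORDER BY lower_edge)
              / SUM(frequency) OVER (), 1) AS cumulative_pct
FROM bucketed
ORDER BY lower_edge;
```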
From a result like this you can read that ~70% of orders are under $100, and fewer than 10% are over $200. This is the kind of insight that shapes pricing, discount strategies, and fraud detection thresholds.
Comparing Distributions Across Segments
One of the most powerful uses of histograms is comparing two groups. Here we split the same data into "low value" vs "high value" orders and compare their bucket distributions side by side using conditional aggregation:
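A sketch of the conditional-aggregation pivot. The segmentation column isn't shown in the article, so this example assumes a hypothetical `customer_tier` column labeling each order's customer as 'low' or 'high' value:

```sql
-- Side-by-side histograms via conditional aggregation, assuming
-- orders(amount, customer_tier) where customer_tier is 'low' or
-- 'high' (hypothetical column).
SELECT
  CASE
    WHEN amount < 25  THEN '$0–$25'
    WHEN amount < 50  THEN '$25–$50'
    WHEN amount < 100 THEN '$50–$100'
    ELSE                   '$100+'
  END AS bucket,
  SUM(CASE WHEN customer_tier = 'low'  THEN 1 ELSE 0 END) AS low_value,
  SUM(CASE WHEN customer_tier = 'high' THEN 1 ELSE 0 END) AS high_value
FROM orders
GROUP BY bucket
ORDER BY MIN(amount);
```

Each SUM(CASE …) counts only the rows matching its segment, producing one column per group in a single pass over the table.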
This side-by-side pivoted histogram is a staple of A/B test reporting and cohort analysis.
Choosing Bucket Width
The hardest part of building a histogram isn't the SQL—it's choosing how wide to make your buckets.
Sturges' rule is a classic starting point: use k = ceil(log₂ n) + 1 buckets for n observations. For 100 rows: ceil(log₂ 100) + 1 = 7 + 1 = 8 buckets.
Scott's rule bases bucket width on the standard deviation: h = 3.5 × σ / n^(1/3). This adapts to spread in the data.
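Both rules can be computed from the data in SQL itself. This sketch assumes an `orders(amount)` table and an SQLite build compiled with math functions (SQLITE_ENABLE_MATH_FUNCTIONS); SQLite has no STDDEV aggregate, so Scott's σ is computed manually:

```sql
-- Sturges' rule: k = ceil(log2(n)) + 1.
SELECT CAST(CEIL(LOG2(COUNT(*))) AS INTEGER) + 1 AS num_buckets
FROM orders;

-- Scott's rule: h = 3.5 * sigma / n^(1/3), with the population
-- standard deviation derived from AVG.
SELECT 3.5 * SQRT(AVG(amount * amount) - AVG(amount) * AVG(amount))
           / POW(COUNT(*), 1.0 / 3.0) AS bucket_width
FROM orders;
```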
In practice:
- Too few buckets → hiding variation (one giant bucket that says "most orders are $0–$200").
- Too many buckets → noise (every bucket has 1–2 rows and the pattern is invisible).
- A good rule of thumb: aim for 5–15 buckets for exploratory analysis.
Practical Tips
- Round bucket boundaries to human-friendly numbers ($25 increments, not $21.7 increments) for readability.
- Always include a $X+ overflow bucket so no rows are silently dropped.
- Check for gaps: if a bucket has 0 rows, GROUP BY won't show it. Use a numbers table or recursive CTE to fill in zero-count buckets if continuity matters.
- Log-scale buckets work better for highly skewed data (e.g., revenue where most customers spend < $50 but a few spend $10,000): use powers of ten ($0–$10, $10–$100, $100–$1000).
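To make the gap-filling tip concrete, here is a sketch using a recursive CTE: generate every bucket number first, then LEFT JOIN the data so empty buckets surface with a count of 0. The `orders(amount)` table and the $25 bucket width are assumptions:

```sql
-- Zero-filled histogram: generate bucket numbers 0..9 ($25-wide),
-- then LEFT JOIN so empty buckets appear with frequency 0.
WITH RECURSIVE buckets(n) AS (
  SELECT 0
  UNION ALL
  SELECT n + 1 FROM buckets WHERE n < 9
)
SELECT
  '$' || (b.n * 25) || '–$' || ((b.n + 1) * 25) AS bucket,
  COUNT(o.amount) AS frequency
FROM buckets b
LEFT JOIN orders o
  ON o.amount >= b.n * 25 AND o.amount < (b.n + 1) * 25
GROUP BY b.n
ORDER BY b.n;
```

COUNT(o.amount) counts only non-NULL values, so buckets with no matching rows correctly report 0 rather than 1.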
Conclusion
Building histograms in SQL requires nothing more than CASE expressions for bucketing, COUNT(*) for frequency, and window functions for percentages. These techniques scale from exploratory one-off analysis to production reporting queries that run every day.
The next time you're asked "how is this data distributed?"—skip the notebook. Open a SQL editor and build the histogram right where the data lives.