You have a column of numbers—order amounts, page load times, user ages, test scores. You want to know: how is this data distributed? Are most values clustered in the middle? Is there a long tail? Are there suspicious gaps?
A histogram answers all of these questions. And you don't need Python, R, or a BI tool to build one. SQL can do it directly—and understanding how SQL builds histograms will make you a better data analyst.

What is a Frequency Distribution?
A frequency distribution counts how many values fall into each distinct category or range. For continuous numeric data, you first group values into buckets (also called bins), then count occurrences per bucket.
For example, instead of knowing that order amounts are "$12.50, $47.80, $13.20, $95.00…", you might want:
| Bucket | Count |
|---|---|
| $0–$25 | 42 |
| $25–$50 | 38 |
| $50–$100 | 27 |
| $100+ | 8 |
Let's build this progressively in SQL.
The Sample Dataset
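The article's original dataset isn't reproduced here, so the table below is an illustrative stand-in: a minimal `orders` table with one numeric column is enough to follow every query in this post. The table name, column names, and values are all assumptions.

```sql
-- Hypothetical sample data: a minimal orders table with a single
-- numeric column. All amounts are made up for illustration.
CREATE TABLE orders (
  id     INTEGER PRIMARY KEY,
  amount REAL NOT NULL
);

INSERT INTO orders (amount) VALUES
  (12.50), (47.80), (13.20), (95.00),  (8.75),  (120.00),
  (33.10), (64.99), (18.40), (210.00), (27.30), (55.00);
```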
Method 1: Fixed-Width Buckets with CASE
The simplest approach: define your bucket boundaries manually using a CASE expression.
The REPLACE(HEX(ZEROBLOB(n)), '00', '▪') trick generates an ASCII bar of length n—a quick way to visualize distribution directly in a query result.
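A minimal sketch of the CASE approach, assuming an `orders` table with a numeric `amount` column (hypothetical names):

```sql
-- Fixed-width buckets via CASE, assuming an orders(amount) table.
SELECT
  CASE
    WHEN amount < 25  THEN '$0–$25'
    WHEN amount < 50  THEN '$25–$50'
    WHEN amount < 100 THEN '$50–$100'
    ELSE                   '$100+'
  END AS bucket,
  COUNT(*) AS frequency,
  -- ASCII bar: HEX(ZEROBLOB(n)) yields n copies of '00';
  -- replacing each '00' with '▪' draws a bar of length n.
  REPLACE(HEX(ZEROBLOB(COUNT(*))), '00', '▪') AS bar
FROM orders
GROUP BY bucket
ORDER BY MIN(amount);
```

`ORDER BY MIN(amount)` sorts the buckets by their lower edge instead of alphabetically by label.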
Method 2: Dynamic Buckets with Math
Hard-coding bucket boundaries gets tedious at scale. A better approach: use arithmetic to compute the bucket number from the value itself.
The formula is:
bucket_num = FLOOR((value - min) / bucket_width)
This produces 10 equal-width buckets automatically regardless of the data range. Change / 10.0 to / 20.0 to get 20 buckets, or parameterize it based on your precision needs.
Note: For databases like PostgreSQL, use the built-in WIDTH_BUCKET(value, min, max, num_buckets) function—it does exactly this in one call. In SQLite, the manual formula above is the standard approach.
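One way to express the formula in SQLite, as a sketch against an assumed `orders(amount)` table:

```sql
-- Dynamic equal-width buckets (10 of them), assuming orders(amount).
WITH stats AS (
  SELECT MIN(amount) AS mn,
         (MAX(amount) - MIN(amount)) / 10.0 AS width   -- 10 buckets
  FROM orders
)
SELECT
  -- Clamp to 9 so the single maximum value doesn't land in an 11th bucket.
  MIN(CAST((amount - mn) / width AS INTEGER), 9) AS bucket_num,
  COUNT(*) AS frequency
FROM orders, stats
GROUP BY bucket_num
ORDER BY bucket_num;
```

CAST to INTEGER truncates toward zero, which equals FLOOR for the non-negative values produced here.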
Method 3: Frequency Distribution for Categorical Data
Not all frequency distributions involve numbers. Sometimes you want to count occurrences of each distinct text value—a categorical frequency distribution.
The SUM(COUNT(*)) OVER() window function calculates the grand total across all categories, enabling the percentage column without a subquery.
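A sketch of the categorical version; the `status` column on `orders` is a hypothetical example of a text column to count:

```sql
-- Categorical frequency distribution, assuming a hypothetical text
-- column orders.status (e.g. 'paid', 'refunded', 'pending').
SELECT
  status,
  COUNT(*) AS frequency,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM orders
GROUP BY status
ORDER BY frequency DESC;
```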
Adding Cumulative Frequency
A cumulative frequency column shows the running percentage—useful for questions like "what order amount covers 80% of all transactions?"
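A sketch of one way to add the running percentage with window functions, again assuming an `orders(amount)` table:

```sql
-- Cumulative frequency over fixed buckets, assuming orders(amount).
WITH bucketed AS (
  SELECT
    CASE
      WHEN amount < 50  THEN '$0–$50'
      WHEN amount < 100 THEN '$50–$100'
      WHEN amount < 200 THEN '$100–$200'
      ELSE                   '$200+'
    END AS bucket,
    MIN(amount) AS lower_edge,   -- ordering key for the running sum
    COUNT(*)    AS frequency
  FROM orders
  GROUP BY bucket
)
SELECT
  bucket,
  frequency,
  ROUND(100.0 * SUM(frequency) OVER (ORDER BY lower_edge)
              / SUM(frequency) OVER (), 1) AS cumulative_pct
FROM bucketed
ORDER BY lower_edge;
```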
From a result like this you can read that ~70% of orders are under $100, and fewer than 10% are over $200. This is the kind of insight that shapes pricing, discount strategies, and fraud detection thresholds.
Comparing Distributions Across Segments
One of the most powerful uses of histograms is comparing two groups. Here we split the same data into "low value" vs "high value" orders and compare their bucket distributions side by side using conditional aggregation:
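A sketch of the conditional-aggregation pivot. The segmentation column isn't shown in the article, so this example assumes a hypothetical `customer_tier` column labeling each order's customer as 'low' or 'high' value:

```sql
-- Side-by-side histograms via conditional aggregation, assuming
-- orders(amount, customer_tier) where customer_tier is 'low' or
-- 'high' (hypothetical column).
SELECT
  CASE
    WHEN amount < 25  THEN '$0–$25'
    WHEN amount < 50  THEN '$25–$50'
    WHEN amount < 100 THEN '$50–$100'
    ELSE                   '$100+'
  END AS bucket,
  SUM(CASE WHEN customer_tier = 'low'  THEN 1 ELSE 0 END) AS low_value,
  SUM(CASE WHEN customer_tier = 'high' THEN 1 ELSE 0 END) AS high_value
FROM orders
GROUP BY bucket
ORDER BY MIN(amount);
```

Each SUM(CASE …) counts only the rows matching its segment, producing one column per group in a single pass over the table.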
This side-by-side pivoted histogram is a staple of A/B test reporting and cohort analysis.
Choosing Bucket Width
The hardest part of building a histogram isn't the SQL—it's choosing how wide to make your buckets.
Sturges' rule is a classic starting point: use k = ceil(log₂ n) + 1 buckets for n observations. For 100 rows: ceil(log₂ 100) + 1 = 7 + 1 = 8 buckets.
Scott's rule bases bucket width on the standard deviation: h = 3.5 × σ / n^(1/3). This adapts to spread in the data.
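Both rules can be computed from the data in SQL itself. This sketch assumes an `orders(amount)` table and an SQLite build compiled with math functions (SQLITE_ENABLE_MATH_FUNCTIONS); SQLite has no STDDEV aggregate, so Scott's σ is computed manually:

```sql
-- Sturges' rule: k = ceil(log2(n)) + 1.
SELECT CAST(CEIL(LOG2(COUNT(*))) AS INTEGER) + 1 AS num_buckets
FROM orders;

-- Scott's rule: h = 3.5 * sigma / n^(1/3), with the population
-- standard deviation derived from AVG.
SELECT 3.5 * SQRT(AVG(amount * amount) - AVG(amount) * AVG(amount))
           / POW(COUNT(*), 1.0 / 3.0) AS bucket_width
FROM orders;
```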
In practice:
- Too few buckets → hiding variation (one giant bucket that says "most orders are $0–$200").
- Too many buckets → noise (every bucket has 1–2 rows and the pattern is invisible).
- A good rule of thumb: aim for 5–15 buckets for exploratory analysis.
Practical Tips
- Round bucket boundaries to human-friendly numbers ($25 increments, not $21.7 increments) for readability.
- Always include a $X+ overflow bucket so no rows are silently dropped.
- Check for gaps: if a bucket has 0 rows, GROUP BY won't show it. Use a numbers table or recursive CTE to fill in zero-count buckets if continuity matters.
- Log-scale buckets work better for highly skewed data (e.g., revenue where most customers spend < $50 but a few spend $10,000): use powers of ten ($0–$10, $10–$100, $100–$1000).
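To make the gap-filling tip concrete, here is a sketch using a recursive CTE: generate every bucket number first, then LEFT JOIN the data so empty buckets surface with a count of 0. The `orders(amount)` table and the $25 bucket width are assumptions:

```sql
-- Zero-filled histogram: generate bucket numbers 0..9 ($25-wide),
-- then LEFT JOIN so empty buckets appear with frequency 0.
WITH RECURSIVE buckets(n) AS (
  SELECT 0
  UNION ALL
  SELECT n + 1 FROM buckets WHERE n < 9
)
SELECT
  '$' || (b.n * 25) || '–$' || ((b.n + 1) * 25) AS bucket,
  COUNT(o.amount) AS frequency
FROM buckets b
LEFT JOIN orders o
  ON o.amount >= b.n * 25 AND o.amount < (b.n + 1) * 25
GROUP BY b.n
ORDER BY b.n;
```

COUNT(o.amount) counts only non-NULL values, so buckets with no matching rows correctly report 0 rather than 1.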
Conclusion
Building histograms in SQL requires nothing more than CASE expressions for bucketing, COUNT(*) for frequency, and window functions for percentages. These techniques scale from exploratory one-off analysis to production reporting queries that run every day.
The next time you're asked "how is this data distributed?"—skip the notebook. Open a SQL editor and build the histogram right where the data lives.