Descriptive Statistics in SQL: Beyond Average and Count

Most SQL beginners know how to calculate a total with SUM or an average with AVG. But if you're working in data analysis, finance, or data science, basic averages are rarely enough. Averages can be heavily distorted by outliers, and they don't tell you anything about the distribution of your data.

To truly understand your data, you need Descriptive Statistics. In this guide, we'll go beyond the basics and learn how to calculate median, mode, percentiles, and variance using standard SQL.

Descriptive statistics dashboard with mean, median, mode, and std dev

1. The Median: The True Middle

The average (mean) is sensitive to extreme values. The Median is the middle value when data is sorted. It's much more robust for things like housing prices or employee salaries.

While some databases have a built-in MEDIAN() function, others require a bit more work.

-- Using percentile_cont (PostgreSQL, SQL Server, etc.)
SELECT 
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;

In SQLite (which we use for our playground), we can calculate it by sorting and picking the middle row:

Interactive SQL

Insight: Notice how the average salary is $239,000, but the median is $50,000. The median gives a much better sense of what a "typical" employee earns.

2. The Mode: The Most Frequent Value

The Mode is the value that appears most often in your dataset. This is useful for answering questions like "What is our best-selling product category?" or "Where do most of our users live?"

SELECT category, COUNT(*) as frequency
FROM sales
GROUP BY category
ORDER BY frequency DESC
LIMIT 1;

3. Percentiles: Understanding Distribution

Percentiles (like the 90th or 95th percentile) are essential for performance monitoring (P95 latency) and grading. They tell you the value below which a given percentage of data falls.

Interactive SQL

4. Range and Spread

The simplest measure of spread is the Range—the difference between the maximum and minimum values. While simple, it helps you identify the boundaries of your dataset.

SELECT 
  MIN(price) as min_price,
  MAX(price) as max_price,
  MAX(price) - MIN(price) as price_range
FROM products;

5. Standard Deviation and Variance (The Advanced Stuff)

Standard Deviation measures how spread out your numbers are from the average. A low standard deviation means the data is clustered closely around the mean, while a high one means it's widely dispersed.

Most enterprise databases (Postgres, Oracle, SQL Server) provide:

STDDEV() or STDEV()
VARIANCE() or VAR()

Summary Table

Statistic	What it tells you	SQL Keyword/Pattern
Mean	The arithmetic average	`AVG()`
Median	The "middle" value	`PERCENTILE_CONT(0.5)`
Mode	The most common value	`GROUP BY` + `ORDER BY` + `LIMIT 1`
Range	The spread (Max - Min)	`MAX()` - `MIN()`
Std Dev	How dispersed data is	`STDDEV()`

Conclusion

Mastering statistics in SQL allows you to perform complex data analysis directly where the data lives. Instead of exporting large CSVs to Excel or Python, you can generate powerful insights with a few lines of SQL.

Next time you're asked for a "summary report," don't just provide the average. Include the median and the range to tell the full story of your data.

Try it out!

Use the playground snippet above to add more "outlier" salaries and see how the average moves while the median stays stable!

To truly understand your data, you need Descriptive Statistics. In this guide, we'll go beyond the basics and learn how to calculate median, mode, percentiles, and variance using standard SQL.

1. The Median: The True Middle

The average (mean) is sensitive to extreme values. The Median is the middle value when data is sorted. It's much more robust for things like housing prices or employee salaries.

While some databases have a built-in MEDIAN() function, others require a bit more work.

-- Using percentile_cont (PostgreSQL, SQL Server, etc.)
SELECT 
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;

In SQLite (which we use for our playground), we can calculate it by sorting and picking the middle row:

Interactive SQL

Insight: Notice how the average salary is $239,000, but the median is $50,000. The median gives a much better sense of what a "typical" employee earns.

2. The Mode: The Most Frequent Value

The Mode is the value that appears most often in your dataset. This is useful for answering questions like "What is our best-selling product category?" or "Where do most of our users live?"

SELECT category, COUNT(*) as frequency
FROM sales
GROUP BY category
ORDER BY frequency DESC
LIMIT 1;

3. Percentiles: Understanding Distribution

Percentiles (like the 90th or 95th percentile) are essential for performance monitoring (P95 latency) and grading. They tell you the value below which a given percentage of data falls.

Interactive SQL

4. Range and Spread

The simplest measure of spread is the Range—the difference between the maximum and minimum values. While simple, it helps you identify the boundaries of your dataset.

SELECT 
  MIN(price) as min_price,
  MAX(price) as max_price,
  MAX(price) - MIN(price) as price_range
FROM products;

5. Standard Deviation and Variance (The Advanced Stuff)

Most enterprise databases (Postgres, Oracle, SQL Server) provide:

STDDEV() or STDEV()
VARIANCE() or VAR()

Summary Table

Statistic	What it tells you	SQL Keyword/Pattern
Mean	The arithmetic average	`AVG()`
Median	The "middle" value	`PERCENTILE_CONT(0.5)`
Mode	The most common value	`GROUP BY` + `ORDER BY` + `LIMIT 1`
Range	The spread (Max - Min)	`MAX()` - `MIN()`
Std Dev	How dispersed data is	`STDDEV()`

Conclusion

Next time you're asked for a "summary report," don't just provide the average. Include the median and the range to tell the full story of your data.

Try it out!

Use the playground snippet above to add more "outlier" salaries and see how the average moves while the median stays stable!

Descriptive Statistics in SQL: Beyond Average and Count

1. The Median: The True Middle

2. The Mode: The Most Frequent Value

3. Percentiles: Understanding Distribution

4. Range and Spread

5. Standard Deviation and Variance (The Advanced Stuff)

Summary Table

Conclusion

Try it out!

Related Articles

Calculating Percentiles and Median in SQL

Solving the Gaps and Islands Problem in SQL

SQL for Data Analysis: The Ultimate Guide

Comments

Descriptive Statistics in SQL: Beyond Average and Count

1. The Median: The True Middle

2. The Mode: The Most Frequent Value

3. Percentiles: Understanding Distribution

4. Range and Spread

5. Standard Deviation and Variance (The Advanced Stuff)

Summary Table

Conclusion

Try it out!

Related Articles

Calculating Percentiles and Median in SQL

Solving the Gaps and Islands Problem in SQL

SQL for Data Analysis: The Ultimate Guide

Comments