Most SQL beginners know how to calculate a total with SUM or an average with AVG. But if you're working in data analysis, finance, or data science, basic averages are rarely enough. Averages can be heavily distorted by outliers, and they don't tell you anything about the distribution of your data.
To truly understand your data, you need Descriptive Statistics. In this guide, we'll go beyond the basics and learn how to calculate median, mode, percentiles, and variance using standard SQL.
1. The Median: The True Middle
The average (mean) is sensitive to extreme values. The Median is the middle value when data is sorted. It's much more robust for things like housing prices or employee salaries.
While some databases have a built-in MEDIAN() function, others require a bit more work.
-- Using percentile_cont (PostgreSQL, SQL Server, etc.)
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;
In SQLite (which we use for our playground), we can calculate it by sorting and picking the middle row:
Insight: Notice how the average salary is $239,000, but the median is $50,000. The median gives a much better sense of what a "typical" employee earns.
2. The Mode: The Most Frequent Value
The Mode is the value that appears most often in your dataset. This is useful for answering questions like "What is our best-selling product category?" or "Where do most of our users live?"
SELECT category, COUNT(*) as frequency
FROM sales
GROUP BY category
ORDER BY frequency DESC
LIMIT 1;
3. Percentiles: Understanding Distribution
Percentiles (like the 90th or 95th percentile) are essential for performance monitoring (P95 latency) and grading. They tell you the value below which a given percentage of data falls.
4. Range and Spread
The simplest measure of spread is the Range—the difference between the maximum and minimum values. While simple, it helps you identify the boundaries of your dataset.
SELECT
MIN(price) as min_price,
MAX(price) as max_price,
MAX(price) - MIN(price) as price_range
FROM products;
5. Standard Deviation and Variance (The Advanced Stuff)
Standard Deviation measures how spread out your numbers are from the average. A low standard deviation means the data is clustered closely around the mean, while a high one means it's widely dispersed.
Most enterprise databases (Postgres, Oracle, SQL Server) provide:
STDDEV()orSTDEV()VARIANCE()orVAR()
Summary Table
| Statistic | What it tells you | SQL Keyword/Pattern |
|---|---|---|
| Mean | The arithmetic average | AVG() |
| Median | The "middle" value | PERCENTILE_CONT(0.5) |
| Mode | The most common value | GROUP BY + ORDER BY + LIMIT 1 |
| Range | The spread (Max - Min) | MAX() - MIN() |
| Std Dev | How dispersed data is | STDDEV() |
Conclusion
Mastering statistics in SQL allows you to perform complex data analysis directly where the data lives. Instead of exporting large CSVs to Excel or Python, you can generate powerful insights with a few lines of SQL.
Next time you're asked for a "summary report," don't just provide the average. Include the median and the range to tell the full story of your data.
Try it out!
Use the playground snippet above to add more "outlier" salaries and see how the average moves while the median stays stable!