Mastering SQL GROUP BY: From Basics to Advanced Aggregations
The GROUP BY clause is one of SQL's most powerful features for data analysis. It transforms raw rows into meaningful summaries by grouping related data and calculating aggregates like totals, averages, and counts.
If you've ever needed to answer questions like "How many orders per customer?" or "What's the average salary by department?", you've needed GROUP BY.
How GROUP BY Works
Think of GROUP BY as organizing a messy pile of papers into labeled folders. All papers (rows) with the same label go into the same folder (group), then you can count how many papers are in each folder or calculate statistics about them.
The process:
- Scan all rows in the table
- Group rows that have the same value in the GROUP BY column(s)
- Aggregate each group using functions like COUNT, SUM, AVG, MAX, MIN
- Return one row per group
Basic GROUP BY with COUNT
Let's start with the most common use case: counting items in each group.
Visualizing GROUP BY with COUNT:
Visualizing GROUP BY department with COUNT(*)
Try it yourself:
Key Insight: Notice how 6 rows become 2 rows—one per department. COUNT(*) counts all rows in each group.
GROUP BY with SUM
Now let's calculate totals for each group.
Visualizing GROUP BY with SUM:
Visualizing GROUP BY department with SUM(salary)
Pro Tip: You can use multiple aggregate functions in the same query! Combine COUNT, SUM, and AVG to get a comprehensive summary.
The HAVING Clause: Filtering Groups
WHERE filters rows before grouping. HAVING filters groups after aggregation.
Example: Find departments with more than 2 employees
Result: Both departments appear because they each have 3 employees. If we changed the condition to > 3, neither would appear.
Grouping by Multiple Columns
You can group by more than one column to create more granular groups.
This creates a cross-tabulation: each unique combination of department and salary band gets its own group.
Common Aggregate Functions
| Function | Purpose | Example |
|---|---|---|
COUNT(*) | Count all rows in group | Total orders per customer |
COUNT(column) | Count non-NULL values | Customers with email addresses |
SUM(column) | Add up values | Total revenue per product |
AVG(column) | Calculate average | Average rating per movie |
MAX(column) | Find maximum | Highest salary per department |
MIN(column) | Find minimum | Earliest order date per customer |
Common Mistakes to Avoid
❌ Selecting non-grouped columns
-- WRONG: name is not in GROUP BY
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;
Rule: Every column in SELECT must either be:
- In the GROUP BY clause, OR
- Inside an aggregate function
✅ Correct version
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department;
Practice Challenges
Try modifying the queries above to:
- Find the department with the highest average salary
- Count how many employees earn more than $60,000 in each department
- List departments where the total salary budget exceeds $150,000
Key Takeaways
GROUP BYcollapses rows into groups based on column values- Aggregate functions (COUNT, SUM, AVG, MAX, MIN) calculate one value per group
HAVINGfilters groups after aggregation (useWHEREto filter rows before)- Every SELECT column must be grouped or aggregated
- You can group by multiple columns for finer granularity
Master GROUP BY and you'll unlock powerful data analysis capabilities in SQL!