The Pareto Principle states that for many outcomes, roughly 80% of consequences come from 20% of the causes. In business, this often holds true: 80% of sales come from 20% of clients, or 80% of bugs come from 20% of the code.
Identifying this "Vital Few" allows you to focus your energy where it matters most. In this tutorial, we'll write a SQL query to categorize products into the top 20% vs the bottom 80%.
The Strategy
- Calculate Total Sales per item.
- Order items from highest sales to lowest.
- Calculate Running Total of sales.
- Calculate Cumulative Percentage:
Running Total / Grand Total. - Filter: Items where cumulative percentage
<=80% are your core revenue drivers.
Step 1: The Running Total
We use the sum window function:
SUM(amount) OVER (ORDER BY amount DESC)
Step 2: The Grand Total (Fraction)
To get the percentage, we can divide the running sum by the total sum:
SUM(amount) OVER (ORDER BY amount DESC) * 1.0 / SUM(amount) OVER ()
Note the empty OVER () clause, which effectively means "sum of the whole dataset".
Interactive Playground
Let's look at sales for an electronics store. We'll identify which products make up the first 80% of revenue.
Analyzing the Result
In the example above, you should see that just the Laptops and Smartphones account for the vast majority of revenue. The filter logic cumulative_pct <= 80 helps us programmatically isolate these key items for reports.
Conclusion
Calculating cumulative distributions is a superpower for analysts. It moves you beyond "Top 10" lists (which are arbitrary) to statistically significant segments like "The Revenue Drivers" vs "The Long Tail".