Ever run a query and gotten back way more rows than expected, with the same values repeating over and over? Or needed to find out exactly how many unique customers placed orders last month? That's where SQL DISTINCT comes to the rescue.
In this comprehensive guide, we'll explore everything you need to know about DISTINCT—from basic usage to advanced techniques, performance considerations, and common mistakes to avoid.
What is SQL DISTINCT?
The DISTINCT keyword eliminates duplicate rows from your query results, returning only unique values. It's one of the most commonly used SQL features for data analysis and reporting.
Here's the basic syntax:
SELECT DISTINCT column1, column2
FROM table_name;
When you use DISTINCT, SQL compares all the columns in your SELECT list and returns only rows where the entire combination of values is unique.
Basic DISTINCT Usage
Let's start with a simple example. Imagine you have an orders table and want to know which cities have placed orders:
Without DISTINCT, you'd get "New York" three times, "Los Angeles" twice, and "Chicago" twice. With DISTINCT, you get each city exactly once.
Try removing DISTINCT from the query above to see the difference!
DISTINCT with Multiple Columns
When you use DISTINCT with multiple columns, SQL looks for unique combinations of all the columns together:
Notice that even though "Alice Johnson" from "New York" appears twice in the orders table (orders 1 and 4), DISTINCT returns it only once because the combination of city and customer_name is the same.
DISTINCT vs GROUP BY
A common question: when should you use DISTINCT versus GROUP BY? Here's the key difference:
| Feature | DISTINCT | GROUP BY |
|---|---|---|
| Purpose | Remove duplicates | Group rows for aggregation |
| Aggregation | Cannot use aggregate functions | Can use COUNT, SUM, AVG, etc. |
| Performance | Generally faster for simple deduplication | Better for complex aggregations |
| Use Case | Get unique values | Calculate statistics per group |
-- DISTINCT: Just get unique cities
SELECT DISTINCT city
FROM orders_distinct_basic;
-- GROUP BY: Get cities with order counts
SELECT city, COUNT(*) as order_count
FROM orders_distinct_basic
GROUP BY city;
Rule of thumb: If you need aggregation (COUNT, SUM, AVG, etc.), use GROUP BY. If you just need unique values, use DISTINCT.
Counting Unique Values
One of the most powerful uses of DISTINCT is counting unique values using COUNT(DISTINCT column):
This is incredibly useful for analytics—you can see total orders (6), unique customers (4), and unique cities (3) all in one query!
DISTINCT with NULL Values
Here's something important to know: DISTINCT treats NULL as a unique value. If you have multiple NULL values in a column, DISTINCT will return only one NULL:
Notice that even though two customers have NULL emails, DISTINCT returns only one NULL in the results.
Common Use Cases for DISTINCT
Let's explore some real-world scenarios where DISTINCT shines:
1. Finding Unique Product Categories
-- Get all product categories in inventory
SELECT DISTINCT category
FROM products
WHERE stock_quantity > 0
ORDER BY category;
2. Identifying Active Users
-- Count unique users who logged in this month
SELECT COUNT(DISTINCT user_id) as active_users
FROM login_logs
WHERE login_date >= '2025-12-01';
3. Deduplicating Email Lists
-- Get unique email addresses for marketing campaign
SELECT DISTINCT email
FROM customers
WHERE email IS NOT NULL
AND subscribed = true;
4. Finding Distinct Combinations
-- Get all unique product-supplier combinations
SELECT DISTINCT product_id, supplier_id
FROM inventory
ORDER BY product_id, supplier_id;
Performance Considerations
While DISTINCT is powerful, it can impact query performance. Here's what you need to know:
DISTINCT Requires Sorting or Hashing
To find duplicates, the database must either:
- Sort the result set and compare adjacent rows
- Hash the values and check for duplicates
Both operations consume memory and CPU, especially with large datasets.
Performance Tips
1. Use DISTINCT on Indexed Columns
-- Faster: city is indexed
SELECT DISTINCT city
FROM orders;
-- Slower: combination of non-indexed columns
SELECT DISTINCT customer_name, phone, address
FROM orders;
2. Filter Before DISTINCT
-- Good: Filter first, then deduplicate
SELECT DISTINCT city
FROM orders
WHERE order_date >= '2025-01-01';
-- Less efficient: Deduplicate everything, then filter
-- (This example won't work as intended)
3. Consider EXISTS Instead of DISTINCT with Joins
-- Using DISTINCT (can be slow)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
-- Using EXISTS (often faster)
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
4. Use COUNT(DISTINCT) Wisely
-- Can be slow on large tables
SELECT COUNT(DISTINCT customer_id)
FROM orders;
-- Consider approximate counts for very large datasets
-- (database-specific functions like HyperLogLog)
Common Mistakes to Avoid
1. Using DISTINCT to Hide Data Quality Issues
-- ❌ Bad: Using DISTINCT to hide duplicate data
SELECT DISTINCT customer_id, name, email
FROM customers;
-- ✅ Good: Fix the root cause
-- Investigate why there are duplicates and clean the data
If you're using DISTINCT to fix duplicates, you might have a data quality problem that needs addressing at the source.
2. DISTINCT on All Columns When You Need Specific Ones
-- ❌ Inefficient: DISTINCT on all columns
SELECT DISTINCT customer_id, name, email, phone, address, city, state, zip
FROM customers;
-- ✅ Better: Only select what you need
SELECT DISTINCT email
FROM customers;
3. Forgetting DISTINCT Applies to the Entire Row
-- This returns unique combinations of name AND city
SELECT DISTINCT name, city
FROM customers;
-- If you want unique names only, do this:
SELECT DISTINCT name
FROM customers;
4. Using DISTINCT with Aggregate Functions Incorrectly
-- ❌ Wrong: DISTINCT doesn't work this way
SELECT DISTINCT customer_id, COUNT(*)
FROM orders;
-- ✅ Correct: Use GROUP BY for aggregation
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;
DISTINCT vs DISTINCT ON (PostgreSQL)
PostgreSQL offers a powerful extension called DISTINCT ON that lets you get the first row for each unique value:
-- PostgreSQL only: Get the most recent order per customer
SELECT DISTINCT ON (customer_id)
customer_id,
order_date,
amount
FROM orders
ORDER BY customer_id, order_date DESC;
This is incredibly useful but not standard SQL—it only works in PostgreSQL. Other databases require different approaches (like window functions with ROW_NUMBER()).
DISTINCT in Subqueries
DISTINCT is often useful in subqueries, especially with IN or NOT IN:
Note: In many cases, EXISTS or JOIN might be more efficient than IN with a subquery, but DISTINCT ensures you're not checking the same value multiple times.
Combining DISTINCT with Other Clauses
You can use DISTINCT with most SQL clauses:
With WHERE
SELECT DISTINCT city
FROM customers
WHERE country = 'USA';
With ORDER BY
SELECT DISTINCT category
FROM products
ORDER BY category DESC;
With LIMIT
-- Get first 10 unique cities
SELECT DISTINCT city
FROM customers
ORDER BY city
LIMIT 10;
With JOIN
SELECT DISTINCT c.country
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01';
Interactive Challenge
Ready to test your knowledge? Try this challenge:
Challenge: Write a query to find all unique combinations of city and country where customers have spent more than $500 in total. Order the results by country and city.
You have access to the following tables:
customers_distinct_challenge
customer_id(INTEGER)name(TEXT)city(TEXT)country(TEXT)
orders_distinct_challenge
order_id(INTEGER)customer_id(INTEGER)amount(DECIMAL)
Click to see the solution
-- Solution using GROUP BY (better for this case)
SELECT c.city, c.country
FROM customers_distinct_challenge c
JOIN orders_distinct_challenge o ON c.customer_id = o.customer_id
GROUP BY c.city, c.country
HAVING SUM(o.amount) > 500
ORDER BY c.country, c.city;
-- Alternative: Using DISTINCT with a subquery
SELECT DISTINCT c.city, c.country
FROM customers_distinct_challenge c
WHERE c.customer_id IN (
SELECT customer_id
FROM orders_distinct_challenge
GROUP BY customer_id
HAVING SUM(amount) > 500
)
ORDER BY c.country, c.city;
The first solution is generally more efficient because it uses GROUP BY with HAVING, which is designed for aggregation.
Best Practices
Here's a quick checklist for using DISTINCT effectively:
- Use DISTINCT sparingly: Only when you actually need to remove duplicates
- Investigate duplicates: If you need
DISTINCT, understand why duplicates exist - Consider alternatives: Sometimes
GROUP BY,EXISTS, or window functions are better - Index appropriately: Ensure columns used with
DISTINCTare indexed - Filter first: Apply
WHEREclauses beforeDISTINCTwhen possible - Be specific: Only select the columns you need
- Test performance: On large datasets, compare
DISTINCTwith alternative approaches - Document your intent: Add comments explaining why you're using
DISTINCT
DISTINCT vs Other Deduplication Methods
Here's a comparison of different ways to handle duplicates:
Conclusion
SQL DISTINCT is a powerful tool for eliminating duplicate rows and finding unique values in your data. Here are the key takeaways:
- DISTINCT removes duplicate rows from your result set based on all selected columns
- COUNT(DISTINCT column) lets you count unique values efficiently
- DISTINCT treats NULL as a unique value, returning only one NULL if multiple exist
- Performance matters: Use indexes, filter early, and consider alternatives for large datasets
- GROUP BY is better for aggregation, while DISTINCT is ideal for simple deduplication
- Don't use DISTINCT to hide data quality issues—fix the root cause instead
Whether you're analyzing unique customers, deduplicating email lists, or counting distinct values, mastering DISTINCT will make your SQL queries more precise and your data analysis more accurate.
Ready to practice more? Try our interactive SQL playground or explore our guide on SQL UNION to learn about combining unique results from multiple queries!