Mastering SQL DISTINCT: Remove Duplicates and Find Unique Values

Ever run a query and gotten back way more rows than expected, with the same values repeating over and over? Or needed to find out exactly how many unique customers placed orders last month? That's where SQL DISTINCT comes to the rescue.

In this comprehensive guide, we'll explore everything you need to know about DISTINCT—from basic usage to advanced techniques, performance considerations, and common mistakes to avoid.

What is SQL DISTINCT?

The DISTINCT keyword eliminates duplicate rows from your query results, returning only unique values. It's one of the most commonly used SQL features for data analysis and reporting.

Here's the basic syntax:

SELECT DISTINCT column1, column2
FROM table_name;

When you use DISTINCT, SQL compares all the columns in your SELECT list and returns only rows where the entire combination of values is unique.

Basic DISTINCT Usage

Let's start with a simple example. Imagine you have an orders table and want to know which cities have placed orders:

Interactive SQL

Without DISTINCT, you'd get "New York" three times, "Los Angeles" twice, and "Chicago" twice. With DISTINCT, you get each city exactly once.

Try removing DISTINCT from the query above to see the difference!

DISTINCT with Multiple Columns

When you use DISTINCT with multiple columns, SQL looks for unique combinations of all the columns together:

Interactive SQL

Notice that even though "Alice Johnson" from "New York" appears twice in the orders table (orders 1 and 4), DISTINCT returns it only once because the combination of city and customer_name is the same.

DISTINCT vs GROUP BY

A common question: when should you use DISTINCT versus GROUP BY? Here's the key difference:

Feature	DISTINCT	GROUP BY
Purpose	Remove duplicates	Group rows for aggregation
Aggregation	Cannot use aggregate functions	Can use COUNT, SUM, AVG, etc.
Performance	Generally faster for simple deduplication	Better for complex aggregations
Use Case	Get unique values	Calculate statistics per group

-- DISTINCT: Just get unique cities
SELECT DISTINCT city
FROM orders_distinct_basic;

-- GROUP BY: Get cities with order counts
SELECT city, COUNT(*) as order_count
FROM orders_distinct_basic
GROUP BY city;

Rule of thumb: If you need aggregation (COUNT, SUM, AVG, etc.), use GROUP BY. If you just need unique values, use DISTINCT.

Counting Unique Values

One of the most powerful uses of DISTINCT is counting unique values using COUNT(DISTINCT column):

Interactive SQL

This is incredibly useful for analytics—you can see total orders (6), unique customers (4), and unique cities (3) all in one query!

DISTINCT with NULL Values

Here's something important to know: DISTINCT treats NULL as a unique value. If you have multiple NULL values in a column, DISTINCT will return only one NULL:

Interactive SQL

Notice that even though two customers have NULL emails, DISTINCT returns only one NULL in the results.

Common Use Cases for DISTINCT

Let's explore some real-world scenarios where DISTINCT shines:

1. Finding Unique Product Categories

-- Get all product categories in inventory
SELECT DISTINCT category
FROM products
WHERE stock_quantity > 0
ORDER BY category;

2. Identifying Active Users

-- Count unique users who logged in this month
SELECT COUNT(DISTINCT user_id) as active_users
FROM login_logs
WHERE login_date >= '2025-12-01';

3. Deduplicating Email Lists

-- Get unique email addresses for marketing campaign
SELECT DISTINCT email
FROM customers
WHERE email IS NOT NULL
  AND subscribed = true;

4. Finding Distinct Combinations

-- Get all unique product-supplier combinations
SELECT DISTINCT product_id, supplier_id
FROM inventory
ORDER BY product_id, supplier_id;

Performance Considerations

While DISTINCT is powerful, it can impact query performance. Here's what you need to know:

DISTINCT Requires Sorting or Hashing

To find duplicates, the database must either:

Sort the result set and compare adjacent rows
Hash the values and check for duplicates

Both operations consume memory and CPU, especially with large datasets.

Performance Tips

1. Use DISTINCT on Indexed Columns

-- Faster: city is indexed
SELECT DISTINCT city
FROM orders;

-- Slower: combination of non-indexed columns
SELECT DISTINCT customer_name, phone, address
FROM orders;

2. Filter Before DISTINCT

-- Good: Filter first, then deduplicate
SELECT DISTINCT city
FROM orders
WHERE order_date >= '2025-01-01';

-- Less efficient: Deduplicate everything, then filter
-- (This example won't work as intended)

3. Consider EXISTS Instead of DISTINCT with Joins

-- Using DISTINCT (can be slow)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

-- Using EXISTS (often faster)
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o 
  WHERE o.customer_id = c.customer_id
);

4. Use COUNT(DISTINCT) Wisely

-- Can be slow on large tables
SELECT COUNT(DISTINCT customer_id)
FROM orders;

-- Consider approximate counts for very large datasets
-- (database-specific functions like HyperLogLog)

Common Mistakes to Avoid

1. Using DISTINCT to Hide Data Quality Issues

-- ❌ Bad: Using DISTINCT to hide duplicate data
SELECT DISTINCT customer_id, name, email
FROM customers;

-- ✅ Good: Fix the root cause
-- Investigate why there are duplicates and clean the data

If you're using DISTINCT to fix duplicates, you might have a data quality problem that needs addressing at the source.

2. DISTINCT on All Columns When You Need Specific Ones

-- ❌ Inefficient: DISTINCT on all columns
SELECT DISTINCT customer_id, name, email, phone, address, city, state, zip
FROM customers;

-- ✅ Better: Only select what you need
SELECT DISTINCT email
FROM customers;

3. Forgetting DISTINCT Applies to the Entire Row

-- This returns unique combinations of name AND city
SELECT DISTINCT name, city
FROM customers;

-- If you want unique names only, do this:
SELECT DISTINCT name
FROM customers;

4. Using DISTINCT with Aggregate Functions Incorrectly

-- ❌ Wrong: DISTINCT doesn't work this way
SELECT DISTINCT customer_id, COUNT(*) 
FROM orders;

-- ✅ Correct: Use GROUP BY for aggregation
SELECT customer_id, COUNT(*) 
FROM orders
GROUP BY customer_id;

DISTINCT vs DISTINCT ON (PostgreSQL)

PostgreSQL offers a powerful extension called DISTINCT ON that lets you get the first row for each unique value:

-- PostgreSQL only: Get the most recent order per customer
SELECT DISTINCT ON (customer_id)
  customer_id,
  order_date,
  amount
FROM orders
ORDER BY customer_id, order_date DESC;

This is incredibly useful but not standard SQL—it only works in PostgreSQL. Other databases require different approaches (like window functions with ROW_NUMBER()).

DISTINCT in Subqueries

DISTINCT is often useful in subqueries, especially with IN or NOT IN:

Interactive SQL

Note: In many cases, EXISTS or JOIN might be more efficient than IN with a subquery, but DISTINCT ensures you're not checking the same value multiple times.

Combining DISTINCT with Other Clauses

You can use DISTINCT with most SQL clauses:

With WHERE

SELECT DISTINCT city
FROM customers
WHERE country = 'USA';

With ORDER BY

SELECT DISTINCT category
FROM products
ORDER BY category DESC;

With LIMIT

-- Get first 10 unique cities
SELECT DISTINCT city
FROM customers
ORDER BY city
LIMIT 10;

With JOIN

SELECT DISTINCT c.country
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01';

Interactive Challenge

Ready to test your knowledge? Try this challenge:

Challenge: Write a query to find all unique combinations of city and country where customers have spent more than $500 in total. Order the results by country and city.

You have access to the following tables:

customers_distinct_challenge

customer_id (INTEGER)
name (TEXT)
city (TEXT)
country (TEXT)

orders_distinct_challenge

order_id (INTEGER)
customer_id (INTEGER)
amount (DECIMAL)

Interactive SQL

Click to see the solution

-- Solution using GROUP BY (better for this case)
SELECT c.city, c.country
FROM customers_distinct_challenge c
JOIN orders_distinct_challenge o ON c.customer_id = o.customer_id
GROUP BY c.city, c.country
HAVING SUM(o.amount) > 500
ORDER BY c.country, c.city;

-- Alternative: Using DISTINCT with a subquery
SELECT DISTINCT c.city, c.country
FROM customers_distinct_challenge c
WHERE c.customer_id IN (
  SELECT customer_id
  FROM orders_distinct_challenge
  GROUP BY customer_id
  HAVING SUM(amount) > 500
)
ORDER BY c.country, c.city;

The first solution is generally more efficient because it uses GROUP BY with HAVING, which is designed for aggregation.

Best Practices

Here's a quick checklist for using DISTINCT effectively:

Use DISTINCT sparingly: Only when you actually need to remove duplicates
Investigate duplicates: If you need DISTINCT, understand why duplicates exist
Consider alternatives: Sometimes GROUP BY, EXISTS, or window functions are better
Index appropriately: Ensure columns used with DISTINCT are indexed
Filter first: Apply WHERE clauses before DISTINCT when possible
Be specific: Only select the columns you need
Test performance: On large datasets, compare DISTINCT with alternative approaches
Document your intent: Add comments explaining why you're using DISTINCT

DISTINCT vs Other Deduplication Methods

Here's a comparison of different ways to handle duplicates:

Conclusion

SQL DISTINCT is a powerful tool for eliminating duplicate rows and finding unique values in your data. Here are the key takeaways:

DISTINCT removes duplicate rows from your result set based on all selected columns
COUNT(DISTINCT column) lets you count unique values efficiently
DISTINCT treats NULL as a unique value, returning only one NULL if multiple exist
Performance matters: Use indexes, filter early, and consider alternatives for large datasets
GROUP BY is better for aggregation, while DISTINCT is ideal for simple deduplication
Don't use DISTINCT to hide data quality issues—fix the root cause instead

Whether you're analyzing unique customers, deduplicating email lists, or counting distinct values, mastering DISTINCT will make your SQL queries more precise and your data analysis more accurate.

Ready to practice more? Try our interactive SQL playground or explore our guide on SQL UNION to learn about combining unique results from multiple queries!

In this comprehensive guide, we'll explore everything you need to know about DISTINCT—from basic usage to advanced techniques, performance considerations, and common mistakes to avoid.

What is SQL DISTINCT?

The DISTINCT keyword eliminates duplicate rows from your query results, returning only unique values. It's one of the most commonly used SQL features for data analysis and reporting.

Here's the basic syntax:

SELECT DISTINCT column1, column2
FROM table_name;

When you use DISTINCT, SQL compares all the columns in your SELECT list and returns only rows where the entire combination of values is unique.

Basic DISTINCT Usage

Let's start with a simple example. Imagine you have an orders table and want to know which cities have placed orders:

Interactive SQL

Without DISTINCT, you'd get "New York" three times, "Los Angeles" twice, and "Chicago" twice. With DISTINCT, you get each city exactly once.

Try removing DISTINCT from the query above to see the difference!

DISTINCT with Multiple Columns

When you use DISTINCT with multiple columns, SQL looks for unique combinations of all the columns together:

Interactive SQL

DISTINCT vs GROUP BY

A common question: when should you use DISTINCT versus GROUP BY? Here's the key difference:

Feature	DISTINCT	GROUP BY
Purpose	Remove duplicates	Group rows for aggregation
Aggregation	Cannot use aggregate functions	Can use COUNT, SUM, AVG, etc.
Performance	Generally faster for simple deduplication	Better for complex aggregations
Use Case	Get unique values	Calculate statistics per group

-- DISTINCT: Just get unique cities
SELECT DISTINCT city
FROM orders_distinct_basic;

-- GROUP BY: Get cities with order counts
SELECT city, COUNT(*) as order_count
FROM orders_distinct_basic
GROUP BY city;

Rule of thumb: If you need aggregation (COUNT, SUM, AVG, etc.), use GROUP BY. If you just need unique values, use DISTINCT.

Counting Unique Values

One of the most powerful uses of DISTINCT is counting unique values using COUNT(DISTINCT column):

Interactive SQL

This is incredibly useful for analytics—you can see total orders (6), unique customers (4), and unique cities (3) all in one query!

DISTINCT with NULL Values

Here's something important to know: DISTINCT treats NULL as a unique value. If you have multiple NULL values in a column, DISTINCT will return only one NULL:

Interactive SQL

Notice that even though two customers have NULL emails, DISTINCT returns only one NULL in the results.

Common Use Cases for DISTINCT

Let's explore some real-world scenarios where DISTINCT shines:

1. Finding Unique Product Categories

-- Get all product categories in inventory
SELECT DISTINCT category
FROM products
WHERE stock_quantity > 0
ORDER BY category;

2. Identifying Active Users

-- Count unique users who logged in this month
SELECT COUNT(DISTINCT user_id) as active_users
FROM login_logs
WHERE login_date >= '2025-12-01';

3. Deduplicating Email Lists

-- Get unique email addresses for marketing campaign
SELECT DISTINCT email
FROM customers
WHERE email IS NOT NULL
  AND subscribed = true;

4. Finding Distinct Combinations

-- Get all unique product-supplier combinations
SELECT DISTINCT product_id, supplier_id
FROM inventory
ORDER BY product_id, supplier_id;

Performance Considerations

While DISTINCT is powerful, it can impact query performance. Here's what you need to know:

DISTINCT Requires Sorting or Hashing

To find duplicates, the database must either:

Sort the result set and compare adjacent rows
Hash the values and check for duplicates

Both operations consume memory and CPU, especially with large datasets.

Performance Tips

1. Use DISTINCT on Indexed Columns

-- Faster: city is indexed
SELECT DISTINCT city
FROM orders;

-- Slower: combination of non-indexed columns
SELECT DISTINCT customer_name, phone, address
FROM orders;

2. Filter Before DISTINCT

-- Good: Filter first, then deduplicate
SELECT DISTINCT city
FROM orders
WHERE order_date >= '2025-01-01';

-- Less efficient: Deduplicate everything, then filter
-- (This example won't work as intended)

3. Consider EXISTS Instead of DISTINCT with Joins

-- Using DISTINCT (can be slow)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;

-- Using EXISTS (often faster)
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o 
  WHERE o.customer_id = c.customer_id
);

4. Use COUNT(DISTINCT) Wisely

-- Can be slow on large tables
SELECT COUNT(DISTINCT customer_id)
FROM orders;

-- Consider approximate counts for very large datasets
-- (database-specific functions like HyperLogLog)

Common Mistakes to Avoid

1. Using DISTINCT to Hide Data Quality Issues

-- ❌ Bad: Using DISTINCT to hide duplicate data
SELECT DISTINCT customer_id, name, email
FROM customers;

-- ✅ Good: Fix the root cause
-- Investigate why there are duplicates and clean the data

If you're using DISTINCT to fix duplicates, you might have a data quality problem that needs addressing at the source.

2. DISTINCT on All Columns When You Need Specific Ones

-- ❌ Inefficient: DISTINCT on all columns
SELECT DISTINCT customer_id, name, email, phone, address, city, state, zip
FROM customers;

-- ✅ Better: Only select what you need
SELECT DISTINCT email
FROM customers;

3. Forgetting DISTINCT Applies to the Entire Row

-- This returns unique combinations of name AND city
SELECT DISTINCT name, city
FROM customers;

-- If you want unique names only, do this:
SELECT DISTINCT name
FROM customers;

4. Using DISTINCT with Aggregate Functions Incorrectly

-- ❌ Wrong: DISTINCT doesn't work this way
SELECT DISTINCT customer_id, COUNT(*) 
FROM orders;

-- ✅ Correct: Use GROUP BY for aggregation
SELECT customer_id, COUNT(*) 
FROM orders
GROUP BY customer_id;

DISTINCT vs DISTINCT ON (PostgreSQL)

PostgreSQL offers a powerful extension called DISTINCT ON that lets you get the first row for each unique value:

-- PostgreSQL only: Get the most recent order per customer
SELECT DISTINCT ON (customer_id)
  customer_id,
  order_date,
  amount
FROM orders
ORDER BY customer_id, order_date DESC;

This is incredibly useful but not standard SQL—it only works in PostgreSQL. Other databases require different approaches (like window functions with ROW_NUMBER()).

DISTINCT in Subqueries

DISTINCT is often useful in subqueries, especially with IN or NOT IN:

Interactive SQL

Note: In many cases, EXISTS or JOIN might be more efficient than IN with a subquery, but DISTINCT ensures you're not checking the same value multiple times.

Combining DISTINCT with Other Clauses

You can use DISTINCT with most SQL clauses:

With WHERE

SELECT DISTINCT city
FROM customers
WHERE country = 'USA';

With ORDER BY

SELECT DISTINCT category
FROM products
ORDER BY category DESC;

With LIMIT

-- Get first 10 unique cities
SELECT DISTINCT city
FROM customers
ORDER BY city
LIMIT 10;

With JOIN

SELECT DISTINCT c.country
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01';

Interactive Challenge

Ready to test your knowledge? Try this challenge:

Challenge: Write a query to find all unique combinations of city and country where customers have spent more than $500 in total. Order the results by country and city.

You have access to the following tables:

customers_distinct_challenge

customer_id (INTEGER)
name (TEXT)
city (TEXT)
country (TEXT)

orders_distinct_challenge

order_id (INTEGER)
customer_id (INTEGER)
amount (DECIMAL)

Interactive SQL

Click to see the solution

-- Solution using GROUP BY (better for this case)
SELECT c.city, c.country
FROM customers_distinct_challenge c
JOIN orders_distinct_challenge o ON c.customer_id = o.customer_id
GROUP BY c.city, c.country
HAVING SUM(o.amount) > 500
ORDER BY c.country, c.city;

-- Alternative: Using DISTINCT with a subquery
SELECT DISTINCT c.city, c.country
FROM customers_distinct_challenge c
WHERE c.customer_id IN (
  SELECT customer_id
  FROM orders_distinct_challenge
  GROUP BY customer_id
  HAVING SUM(amount) > 500
)
ORDER BY c.country, c.city;

The first solution is generally more efficient because it uses GROUP BY with HAVING, which is designed for aggregation.

Best Practices

Here's a quick checklist for using DISTINCT effectively:

Use DISTINCT sparingly: Only when you actually need to remove duplicates
Investigate duplicates: If you need DISTINCT, understand why duplicates exist
Consider alternatives: Sometimes GROUP BY, EXISTS, or window functions are better
Index appropriately: Ensure columns used with DISTINCT are indexed
Filter first: Apply WHERE clauses before DISTINCT when possible
Be specific: Only select the columns you need
Test performance: On large datasets, compare DISTINCT with alternative approaches
Document your intent: Add comments explaining why you're using DISTINCT

DISTINCT vs Other Deduplication Methods

Here's a comparison of different ways to handle duplicates:

Conclusion

SQL DISTINCT is a powerful tool for eliminating duplicate rows and finding unique values in your data. Here are the key takeaways:

DISTINCT removes duplicate rows from your result set based on all selected columns
COUNT(DISTINCT column) lets you count unique values efficiently
DISTINCT treats NULL as a unique value, returning only one NULL if multiple exist
Performance matters: Use indexes, filter early, and consider alternatives for large datasets
GROUP BY is better for aggregation, while DISTINCT is ideal for simple deduplication
Don't use DISTINCT to hide data quality issues—fix the root cause instead

Whether you're analyzing unique customers, deduplicating email lists, or counting distinct values, mastering DISTINCT will make your SQL queries more precise and your data analysis more accurate.

Ready to practice more? Try our interactive SQL playground or explore our guide on SQL UNION to learn about combining unique results from multiple queries!

Related Articles

Handling NULLs in SQL: The Ultimate Guide

Understanding SQL UNION: Combine Query Results Like a Pro

SQL for Data Analysis: The Ultimate Guide

Comments

Related Articles

Handling NULLs in SQL: The Ultimate Guide

Understanding SQL UNION: Combine Query Results Like a Pro

SQL for Data Analysis: The Ultimate Guide

Comments