SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed
SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog
Back to Blog
2026-01-19
5 min read

Removing Duplicate Rows in SQL: A Complete Guide

sqldata-cleaningduplicatesetlbest-practices

Duplicate rows sneak into databases through data imports, buggy applications, or missing constraints. Cleaning them up is a critical skill for any data professional.

This guide covers multiple approaches, from identifying duplicates to safely deleting them.

Step 1: Finding Duplicates

Before deleting anything, let's identify what we're dealing with.

Count Duplicates with GROUP BY

Interactive SQL
Loading...

Now we know alice@example.com appears twice and bob@example.com appears three times.

Step 2: See All Duplicate Rows

Use a subquery to fetch the full details:

Interactive SQL
Loading...

Strategy 1: Keep the First (or Last) Record

The most common approach uses ROW_NUMBER() to mark which duplicate to keep.

Using ROW_NUMBER to Identify Duplicates

Interactive SQL
Loading...
  • row_num = 1: The first record (keep this one)
  • row_num > 1: Duplicates (delete these)

Delete Using CTE

Now we can delete the duplicates:

-- SQLite syntax for deleting duplicates
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT 
      id,
      ROW_NUMBER() OVER (
        PARTITION BY email 
        ORDER BY created_at
      ) as row_num
    FROM users
  )
  WHERE row_num > 1
);

Tip: Change ORDER BY created_at to ORDER BY created_at DESC to keep the latest record instead.

Strategy 2: Keep Record with Most Data

Sometimes you want to keep the most "complete" record:

-- Keep the record with the longest name (most data)
SELECT id FROM (
  SELECT 
    id,
    ROW_NUMBER() OVER (
      PARTITION BY email 
      ORDER BY LENGTH(name) DESC
    ) as row_num
  FROM users
)
WHERE row_num > 1;

Strategy 3: SELECT DISTINCT INTO New Table

For heavily duplicated tables, it's often easier to create a clean copy:

-- Create new table with unique records only
CREATE TABLE users_clean AS
SELECT DISTINCT email, name
FROM users;

-- Or keep the first of each duplicate
CREATE TABLE users_clean AS
SELECT * FROM (
  SELECT 
    *,
    ROW_NUMBER() OVER (
      PARTITION BY email 
      ORDER BY created_at
    ) as rn
  FROM users
)
WHERE rn = 1;

Strategy 4: Aggregating Duplicates

Sometimes duplicates have different values you want to combine:

Interactive SQL
Loading...

Preventing Future Duplicates

Add UNIQUE Constraint

-- Prevent duplicate emails
CREATE UNIQUE INDEX idx_unique_email ON users(email);

-- Or during table creation
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  name TEXT
);

Use INSERT OR IGNORE

-- Skip inserts that would create duplicates
INSERT OR IGNORE INTO users (email, name)
VALUES ('alice@example.com', 'Alice');

Quick Reference: Which Strategy?

ScenarioBest Approach
Keep oldest recordROW_NUMBER() ORDER BY created_at
Keep newest recordROW_NUMBER() ORDER BY created_at DESC
Keep most completeROW_NUMBER() ORDER BY LENGTH(field) DESC
Combine valuesGROUP BY with SUM/MAX
Heavy duplicationCreate new clean table
Prevent future dupesUNIQUE constraint

Safety Tips

  1. Always backup first: Run SELECT before DELETE to verify what will be removed.

  2. Use transactions: Wrap deletions in BEGIN/COMMIT so you can ROLLBACK if needed.

  3. Check foreign keys: Deleting a row might break references in other tables.

  4. Log deleted rows: Insert them into an archive table before deleting.

Conclusion

Duplicate removal is a multi-step process:

  1. Identify duplicates with GROUP BY HAVING COUNT(*) > 1
  2. Examine them to decide which to keep
  3. Mark duplicates using ROW_NUMBER() OVER (PARTITION BY...)
  4. Delete rows where row_num > 1
  5. Prevent future duplicates with UNIQUE constraints

Master these patterns and you'll handle any data deduplication challenge.

Share this article:

Related Articles

sqldata-cleaning

Data Cleaning with SQL: From Messy to Masterpiece

Real-world data is dirty. Master 4 essential SQL functions to clean strings, handle NULLs, and fix formatting errors.

Read more
sqlbest-practices

SQL Window Frames: ROWS vs RANGE

Learn how ROWS and RANGE window frames change results, avoid hidden pitfalls, and build correct moving calculations with clear, runnable examples.

Read more
sqletl

Mastering Slowly Changing Dimensions (SCD Type 2) in SQL

Data changes, but history matters. Learn how to track changes over time using the SCD Type 2 pattern to keep a perfect audit trail of your data.

Read more
Previous

Ranking Data with SQL: RANK, DENSE_RANK, and ROW_NUMBER Explained

Next

Calculating Percentiles and Median in SQL

Comments

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed