SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed
SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog
Back to Blog
2025-12-28
3 min read

Data Cleaning with SQL: From Messy to Masterpiece

sqltutorialdata-cleaningetlstrings

In the real world, data is rarely pristine. It comes with trailing spaces, inconsistent capitalization, strange characters, and missing values.

Before you can analyze data, you must clean it. Here are the 4 essential tools in your SQL cleaning kit.

1. COALESCE: The Null-Buster

NULL values can ruin math and look ugly in reports. COALESCE() returns the first non-null value in a list.

SELECT COALESCE(phone_number, 'No Phone') FROM users;

2. TRIM: The Whitespace Whacker

Spaces at the start or end of strings are invisible enemies. They make 'Alice' imply not equal to 'Alice '.

SELECT TRIM(name) FROM users;

3. LOWER / UPPER: The Case Normalizer

'iPad', 'IPad', and 'ipad' are three different strings to a computer. Normalize them to fix grouping.

SELECT LOWER(device_type) FROM devices;

4. REPLACE: The Substitution Surgeon

Great for fixing inconsistent formats, like removing dashes from phone numbers or fixing typos.

SELECT REPLACE(phone, '-', '') FROM users;

Putting It All Together: Interactive Cleaning

Let's clean a messy customer list.

The Horror Show (Raw Data):

  • Inconsistent names (alice, Bob, CHARLIE)
  • Messy emails with spaces ( alice@email.com)
  • NULL phone numbers
  • NULL status (should be 'Active')
Interactive SQL
Loading...

The Cleanup Query:

Interactive SQL
Loading...

Tip: Depending on your SQL dialect (PostgreSQL, SQL Server), you might have INITCAP() to capitalize just the first letter (e.g., 'Alice'). Standard SQL doesn't always have INITCAP, so we often settle for UPPER or complex SUBSTRING logic for names.

Deduplication

Another huge part of cleaning is removing duplicates.

The best way to deduplicate? Use ROW_NUMBER().

WITH flagged_dupes AS (
  SELECT 
    *,
    ROW_NUMBER() OVER(PARTITION BY email ORDER BY id DESC) as rn
  FROM users
)
SELECT * FROM flagged_dupes WHERE rn = 1; -- Keep only the latest entry

Conclusion

Dirty data leads to dirty insights. By mastering TRIM, COALESCE, and normalization functions, you ensure your analysis is built on a solid foundation.

Share this article:

Comments

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed