SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed
SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog
Back to Blog
2025-12-28
3 min read

Data Cleaning with SQL: From Messy to Masterpiece

sqltutorialdata-cleaningetlstrings

In the real world, data is rarely pristine. It comes with trailing spaces, inconsistent capitalization, strange characters, and missing values.

Before you can analyze data, you must clean it. Here are the 4 essential tools in your SQL cleaning kit.

Messy data cleaned into standardized fields
Messy data cleaned into standardized fields

1. COALESCE: The Null-Buster

NULL values can ruin math and look ugly in reports. COALESCE() returns the first non-null value in a list.

SELECT COALESCE(phone_number, 'No Phone') FROM users;

2. TRIM: The Whitespace Whacker

Spaces at the start or end of strings are invisible enemies. They make 'Alice' imply not equal to 'Alice '.

SELECT TRIM(name) FROM users;

3. LOWER / UPPER: The Case Normalizer

'iPad', 'IPad', and 'ipad' are three different strings to a computer. Normalize them to fix grouping.

SELECT LOWER(device_type) FROM devices;

4. REPLACE: The Substitution Surgeon

Great for fixing inconsistent formats, like removing dashes from phone numbers or fixing typos.

SELECT REPLACE(phone, '-', '') FROM users;

Putting It All Together: Interactive Cleaning

Let's clean a messy customer list.

The Horror Show (Raw Data):

  • Inconsistent names (alice, Bob, CHARLIE)
  • Messy emails with spaces ( alice@email.com)
  • NULL phone numbers
  • NULL status (should be 'Active')
Interactive SQL
Loading...

The Cleanup Query:

Interactive SQL
Loading...

Tip: Depending on your SQL dialect (PostgreSQL, SQL Server), you might have INITCAP() to capitalize just the first letter (e.g., 'Alice'). Standard SQL doesn't always have INITCAP, so we often settle for UPPER or complex SUBSTRING logic for names.

Deduplication

Another huge part of cleaning is removing duplicates.

The best way to deduplicate? Use ROW_NUMBER().

WITH flagged_dupes AS (
  SELECT 
    *,
    ROW_NUMBER() OVER(PARTITION BY email ORDER BY id DESC) as rn
  FROM users
)
SELECT * FROM flagged_dupes WHERE rn = 1; -- Keep only the latest entry

Conclusion

Dirty data leads to dirty insights. By mastering TRIM, COALESCE, and normalization functions, you ensure your analysis is built on a solid foundation.

Share this article:

Related Articles

sqldata-cleaning

Removing Duplicate Rows in SQL: A Complete Guide

Duplicate data is a common headache. Learn multiple strategies to identify and remove duplicates in SQL, from simple DISTINCT to advanced ROW_NUMBER techniques.

Read more
sqlstrings

Essential SQL String Functions for Data Cleaning and Analysis

Clean, format, and extract data like a pro. Master essential SQL string functions including UPPER, LOWER, TRIM, REPLACE, SUBSTR, and concatenation.

Read more
sqltutorial

SQL Window Frames: ROWS vs RANGE

Learn how ROWS and RANGE window frames change results, avoid hidden pitfalls, and build correct moving calculations with clear, runnable examples.

Read more
Previous

Debugging Common SQL Logic Errors: Why Your Query is Wrong

Next

Calculating Moving Averages and Rolling Windows in SQL

Comments

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed