Data is rarely perfect. It comes with inconsistent casing, extra spaces, and mixed formats. As a data analyst or engineer, you'll spend a significant amount of time just cleaning data before you can analyze it.
That's where SQL string functions come in. They are your broom and dustpan for messy data.
In this guide, we'll cover the essential functions you need to know to manipulate text data effectively.
1. Basic Formatting: UPPER, LOWER, and LENGTH
The first step in data cleaning is often standardization.
UPPER(str): Converts string to uppercase.LOWER(str): Converts string to lowercase.LENGTH(str): Returns the number of characters in the string.
Interactive Example: Normalizing Input
Imagine a user registration table where users entered their emails in various formats.
Pro Tip: Always normalize strings (e.g., LOWER(email) = LOWER(input)) when comparing them to avoid case-sensitivity issues.
2. Concatenation: Joining Strings
Sometimes you need to combine data from multiple columns into one. In standard SQL (and SQLite), we use the double pipe operator ||.
Note: Some databases like SQL Server use
+or theCONCAT()function, but||is the ANSI standard.
3. Cleaning Data: TRIM and REPLACE
Extra spaces and unwanted characters are common enemies.
TRIM(str): Removes leading and trailing whitespace.REPLACE(str, from, to): Replaces all occurrences of a substring.
Interactive Example: Fixing Product Codes
4. Extraction: SUBSTR and INSTR
Extracting specific parts of a string is powerful.
SUBSTR(str, start, length): Extracts a portion of a string.INSTR(str, sub): Returns the position of the first occurrence ofsub.
-- Extract first 3 chars
SUBSTR('Hello', 1, 3) -- 'Hel'
-- Find position of '@'
INSTR('user@example.com', '@') -- 5
Interactive Challenge: Extracting Email Domains
This is a classic interview question. You have a list of emails, and you need to extract just the domain part (everything after the @).
Hint:
- Find the position of
@usingINSTR. - Use
SUBSTRto start from that position + 1.
You have access to:
leads_string_challenge
id(INTEGER)email(TEXT)
Click to see the solution
SELECT
email,
-- Start from position of '@' + 1
SUBSTR(email, INSTR(email, '@') + 1) as domain
FROM leads_string_challenge;
Conclusion
String functions are indispensable tools in your SQL toolkit. Whether you're fixing messy user input, formatting reports, or extracting insights from text, these functions will save you hours of manual work.
Key Functions to Remember:
- Formatting:
UPPER,LOWER - Cleaning:
TRIM,REPLACE - Joining:
|| - Extracting:
SUBSTR,INSTR
Master these, and you'll be able to handle almost any text data that comes your way!