SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed
SQL Playground
TutorialsPlayground
SQL FormatterSQL MinifierSyntax ValidatorJSON to SQLCSV to SQLSQL to JSONER Diagram Generator
Daily ChallengeInterviewsCheat SheetBlog
Back to Blog
2026-02-16
7 min read

SQL for Anomaly Detection: Finding Outliers

sqlanalyticsdata-analysiswindow-functionsanomaly-detection

Your sales dashboard looks fine on average—revenue is up 5% month-over-month. But buried in your data are a few transactions that are wildly off: a $0.01 order, a sudden spike in refunds, a server metric that jumped 10x for one minute.

These are anomalies, and averages hide them. SQL can surface them.

Time-series line chart showing normal data points with two outlier spikes circled in red, illustrating SQL anomaly detection
Time-series line chart showing normal data points with two outlier spikes circled in red, illustrating SQL anomaly detection

In this post, we'll explore three practical methods to detect outliers directly in SQL: the Z-Score method, the Interquartile Range (IQR) method, and using moving averages to catch time-series spikes.

What is an Anomaly?

An anomaly (or outlier) is a data point that deviates significantly from expected behavior. Anomaly detection matters for:

  • Fraud detection: A transaction 100x larger than a user's typical order.
  • System monitoring: A server latency spike at 3am.
  • Data quality: Accidentally inserted negative prices or future timestamps.
  • Business intelligence: An inexplicable dip in conversion rates.

The challenge is defining "significantly different"—and SQL gives us several statistical tools to do exactly that.

The Sample Dataset

Let's create a table of daily sales transactions. Two rows are clearly suspicious: row #7 is a $1,850 spike, and row #10 is a $4.99 near-zero value.

Interactive SQL
Loading...

Now let's catch these programmatically—without eyeballing the data.

Method 1: Z-Score

The Z-Score measures how many standard deviations a value is from the mean:

Z = (x − μ) / σ

Where μ is the mean and σ is the standard deviation. A Z-Score above 2 or 3 (or below -2 / -3) is typically flagged as an anomaly.

Interactive SQL
Loading...

Notice that only the $1,850 spike is flagged here. Because that outlier is so extreme, it dramatically inflates the standard deviation, which actually protects the $4.99 value from being caught. This is the core weakness of the Z-Score method.

Limitation: A single extreme outlier can inflate σ, making other moderate outliers appear normal.

Method 2: Interquartile Range (IQR)

The IQR method is more robust to extreme values. It defines the normal band as:

  • Lower Fence = Q1 − 1.5 × IQR
  • Upper Fence = Q3 + 1.5 × IQR

Where IQR = Q3 − Q1. Values outside these fences are outliers.

SQLite doesn't have a built-in PERCENTILE_CONT function, but NTILE(4) gives us a solid approximation:

Interactive SQL
Loading...

The IQR method catches both the $1,850 spike and the $4.99 dip because it isn't distorted by either extreme. This makes it the go-to method for skewed financial data.

Method 3: Moving Average Deviation

For time-series data, comparing each value to a rolling window is often the most intuitive approach. A value is anomalous if it deviates significantly from its recent neighborhood:

Interactive SQL
Loading...

The window ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING compares each day's sales to the previous 4 days. The January 7th spike stands out sharply against the $118–$142 range of the preceding days.

Which Method Should You Use?

MethodBest ForWeakness
Z-ScoreNormally distributed dataOne extreme outlier distorts $\sigma$
IQRSkewed data, robust detectionMisses gradual drift over time
Moving AverageTime-series, sequential eventsNeeds sufficient historical data

In practice, you can combine methods: flag only records that are anomalous by two or more methods out of three. This dramatically reduces false positives.

Practical Tips

  • Segment before analyzing: Compute statistics per product_category or region—not across all data—so you're comparing apples to apples.
  • Tune thresholds by domain: A 3-sigma rule for financial data; a lower threshold may work better for operational metrics.
  • Log, don't delete: When you find an anomaly, write it to a review table rather than silently discarding it.
  • Schedule it: Run your anomaly queries daily or hourly as part of your data pipeline. Stale anomaly detection is no anomaly detection.

Conclusion

Anomaly detection in SQL doesn't require machine learning. With Z-Scores, IQR fences, and rolling window comparisons, you can surface outliers directly in your data warehouse using tools you already know.

Start with the IQR method for a robust, distribution-agnostic approach. Layer in moving averages when working with time-series data. Your future self—staring at a fraud alert at 2am—will thank you.

Share this article:

Related Articles

sqldata-analysis

SQL for Data Analysis: The Ultimate Guide

Move beyond basic SELECTs. Master the core SQL techniques for real-world data analysis: Data Cleaning, Time-Series Analysis, Window Functions, and Cohort Analysis.

Read more
sqldata-analysis

Calculating Percentiles and Median in SQL

AVG tells you the mean, but what about median and percentiles? Learn how to calculate these essential statistics in SQL using window functions and clever tricks.

Read more
sqlwindow-functions

Ranking Data with SQL: RANK, DENSE_RANK, and ROW_NUMBER Explained

Building leaderboards, finding top performers, or paginating results? Master the three SQL ranking functions and understand exactly when to use each one.

Read more
Previous

SQL for Data Analysis: The Ultimate Guide

Next

Star Schema vs. Snowflake Schema Explained

Comments

© 2026 SQL Playground. Built for developers.

PrivacyTermsAboutRSS Feed