Query Optimization
Learn to write efficient SQL by identifying bottlenecks, reading EXPLAIN plans, designing indexes, and rewriting slow queries. Practice with real-world optimization challenges.
Missing Index on Filter Column
Missing index usage — full table scan on 500K rows
Slow Query
SELECT * FROM orders
WHERE customer_id = 12345
ORDER BY created_at DESC;
Before
2.3s (Seq Scan, 500K rows)
After
4ms (Index Scan, 38 rows)
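A fix consistent with the numbers above — a composite index whose leading column matches the filter and whose second column matches the sort (index name illustrative):

```sql
-- Equality column first, then the ORDER BY column:
-- serves the WHERE filter and returns rows pre-sorted
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at DESC);
```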
The SELECT * Anti-Pattern
Unnecessary columns fetched — table has 40 columns including large TEXT/BLOB fields
Slow Query
SELECT *
FROM products
WHERE category_id = 5
ORDER BY price DESC
LIMIT 20;
Before
180ms (reads 40 cols, 6.2MB)
After
12ms (reads 3 cols, 0.4MB)
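One possible rewrite, naming only the columns the caller actually uses (column names are illustrative — the source doesn't list them):

```sql
-- Three narrow columns instead of 40 wide ones
SELECT id, name, price
FROM products
WHERE category_id = 5
ORDER BY price DESC
LIMIT 20;
```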
Correlated Subquery N+1 Problem
N+1 query pattern — correlated subquery executes once per row (100K customers = 100K subqueries)
Slow Query
SELECT c.name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.id) AS order_count
FROM customers c
WHERE c.status = 'active';
Before
12.8s (100K subqueries)
After
340ms (single Hash Join + Aggregate)
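A rewrite matching the "single Hash Join + Aggregate" plan — join once, then group:

```sql
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.status = 'active'
GROUP BY c.id, c.name;
```

The LEFT JOIN keeps customers with zero orders; `COUNT(o.id)` returns 0 for them, matching the correlated subquery's semantics.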
Function on Indexed Column
Function wrapping indexed column — prevents index usage, forces full table scan
Slow Query
SELECT *
FROM transactions
WHERE EXTRACT(YEAR FROM created_at) = 2024
AND EXTRACT(MONTH FROM created_at) = 3;
Before
3.1s (Seq Scan on 2M rows)
After
25ms (Index Range Scan)
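The sargable rewrite implied by the "Index Range Scan": compare the raw column against a half-open range instead of wrapping it in a function:

```sql
SELECT *
FROM transactions
WHERE created_at >= DATE '2024-03-01'
  AND created_at <  DATE '2024-04-01';  -- half-open: catches all of March
```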
N+1 Query from Application Code
N+1 query pattern — 501 database round-trips instead of 1
Slow Query
-- Query 1: fetch orders
SELECT id, total FROM orders
WHERE customer_id = 42 LIMIT 500;
-- Then for EACH order (x500 round-trips):
-- SELECT * FROM order_items
-- WHERE order_id = ?;
Before
4.2s (501 queries, network overhead)
After
45ms (1 query, Hash Join)
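A single-round-trip version, preserving the 500-order cap from the original:

```sql
-- One query: join items to the limited set of orders
SELECT o.id, o.total, oi.*
FROM (
    SELECT id, total FROM orders
    WHERE customer_id = 42
    LIMIT 500
) o
JOIN order_items oi ON oi.order_id = o.id;
```

The application then groups items by `o.id` in memory instead of issuing 500 follow-up queries.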
DISTINCT on Large JOIN
DISTINCT after JOIN builds massive result set then deduplicates — O(n*m) then sort
Slow Query
SELECT DISTINCT c.id, c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id;
Before
2.8s (2.3M intermediate rows, then sort)
After
180ms (semi-join, early exit)
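The semi-join rewrite the After figure describes — ask "does at least one order exist?" instead of joining and deduplicating:

```sql
SELECT c.id, c.name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
);
```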
OR Preventing Index Usage
OR between differently-indexed columns forces sequential scan or expensive bitmap merge
Slow Query
SELECT id, name, email
FROM users
WHERE email = 'test@example.com'
OR phone = '+1234567890';
Before
890ms (Seq Scan on 5M rows)
After
2ms (two Index Scans)
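One way to get the "two Index Scans": split the OR into a UNION so each branch can use its own index (indexes on email and phone assumed):

```sql
SELECT id, name, email FROM users WHERE email = 'test@example.com'
UNION
SELECT id, name, email FROM users WHERE phone = '+1234567890';
```

`UNION` (not `UNION ALL`) removes the rare user who matches both branches, preserving the original semantics.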
Pagination with OFFSET
OFFSET scans and discards 20,000 rows — gets slower on every page
Slow Query
SELECT id, title, created_at
FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 20000;
Before
8.2s on page 1000 (scans 20K rows)
After
3ms on any page (reads 20 rows)
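A keyset (seek) pagination sketch — `:last_created_at` is a hypothetical bind parameter holding the last value shown on the previous page:

```sql
-- Seek past the last row seen instead of counting rows to skip
SELECT id, title, created_at
FROM posts
WHERE created_at < :last_created_at
ORDER BY created_at DESC
LIMIT 20;
```

If `created_at` isn't unique, page on the pair `(created_at, id)` to avoid skipping or repeating rows at the boundary.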
Covering Index for Aggregation
Reads entire wide table (20 columns, 2M rows) when only 2 columns are needed for the aggregation
Slow Query
SELECT status,
COUNT(*) AS cnt,
SUM(total_amount) AS revenue
FROM orders
GROUP BY status;
Before
1.4s (Seq Scan, reads 2M wide rows)
After
120ms (Index Only Scan, compact)
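A covering index consistent with the "Index Only Scan" figure, assuming PostgreSQL 11+ for `INCLUDE` (index name illustrative):

```sql
-- Key on the GROUP BY column; carry the aggregated column as payload
CREATE INDEX idx_orders_status_covering
    ON orders (status) INCLUDE (total_amount);
```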
Composite Index Column Order
Two separate indexes force bitmap merge + sort instead of a single efficient scan
Slow Query
-- Two separate indexes exist:
-- idx_tickets_status (status)
-- idx_tickets_created (created_at)
SELECT id, title, status, created_at
FROM tickets
WHERE status = 'open'
ORDER BY created_at DESC
LIMIT 50;
Before
650ms (Bitmap Scan + Sort)
After
8ms (Index Scan Backward, pre-sorted)
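The single composite index implied by the After plan — equality column first, sort column second:

```sql
-- Replaces both single-column indexes for this query shape
CREATE INDEX idx_tickets_status_created
    ON tickets (status, created_at DESC);
```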
Implicit Type Casting Kills Index
phone is VARCHAR but compared to an integer — PostgreSQL casts every row's phone to numeric, preventing index usage
Slow Query
SELECT * FROM users
WHERE phone = 5551234567;
Before
1.9s (Seq Scan, implicit cast per row)
After
0.3ms (Index Scan, direct match)
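The fix is simply to match the column's declared type:

```sql
-- String literal: the VARCHAR index applies directly, no cast
SELECT * FROM users
WHERE phone = '5551234567';
```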
LIKE with Leading Wildcard
Leading wildcard % prevents B-tree index usage — the pattern must be evaluated against every row in a full table scan
Slow Query
SELECT id, name, email
FROM customers
WHERE email LIKE '%@gmail.com';
Before
2.1s (Seq Scan 3M rows)
After
15ms (Index Scan)
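One PostgreSQL approach consistent with the After figure — a trigram GIN index, which can serve LIKE patterns with a leading wildcard (extension and index name are a sketch):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_customers_email_trgm
    ON customers USING GIN (email gin_trgm_ops);
```

Alternatively, if the only pattern is a suffix match on domain, an expression index on `reverse(email)` queried with `LIKE 'moc.liamg@%'` turns it into a prefix scan.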
COUNT(*) on Entire Table
PostgreSQL must do a full table scan for exact COUNT(*) — MVCC means no stored row count
Slow Query
SELECT COUNT(*)
FROM logs;
-- logs table has 50M rows
Before
28s (Seq Scan 50M rows)
After
0.1ms (catalog lookup)
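When an approximation suffices, the "catalog lookup" in the After figure reads the planner's estimate instead of scanning:

```sql
-- Estimate maintained by ANALYZE/autovacuum; approximate, not exact
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'logs';
```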
NOT IN with NULL Trap
NOT IN returns no rows if the subquery contains any NULL — also prevents index usage and builds a full hash
Slow Query
SELECT id, name
FROM products
WHERE id NOT IN (
SELECT product_id FROM order_items
);
Before
4.2s (Hash Anti Join, NULL risk)
After
180ms (Nested Loop Anti Join, index-backed)
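The standard NULL-safe rewrite:

```sql
SELECT p.id, p.name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi
    WHERE oi.product_id = p.id
);
```

`NOT EXISTS` ignores NULLs in `product_id` (unlike `NOT IN`, which returns zero rows if any appear) and lets the planner use an index-backed anti-join.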
Unnecessary DISTINCT Hiding a JOIN Bug
DISTINCT masks a many-to-many explosion — 100K customers × 5 orders × 3 items = 1.5M rows sorted and deduplicated
Slow Query
SELECT DISTINCT c.id, c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id;
Before
5.1s (1.5M rows → sort → unique)
After
120ms (semi-join, early exit)
Partial Index for Hot Path
Full index on status includes 95% completed rows that are never queried on this path — bloated index, more I/O
Slow Query
-- 95% of orders are 'completed', only 2% are 'pending'
-- Full index exists: idx_orders_status (status)
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at;
Before
320ms (scans 2M-entry index)
After
5ms (scans 40K-entry partial index)
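The partial index the After figure describes — index only the small hot slice (index name illustrative):

```sql
-- Covers only the ~2% of rows this path ever touches
CREATE INDEX idx_orders_pending
    ON orders (created_at)
    WHERE status = 'pending';
```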
Window Function Recomputing on Every Row
Correlated subquery recalculates the average for each row — same average recomputed 500 times per category
Slow Query
SELECT *,
(SELECT AVG(price) FROM products p2
WHERE p2.category_id = p.category_id) AS avg_category_price
FROM products p;
Before
3.8s (correlated subquery per row)
After
45ms (single WindowAgg pass)
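The window-function rewrite — one pass computes each category's average once and attaches it to every row:

```sql
SELECT *,
       AVG(price) OVER (PARTITION BY category_id) AS avg_category_price
FROM products;
```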
Expensive CTE Evaluated Multiple Times
Referencing the CTE three times costs three passes: engines without CTE materialization re-run the aggregation per reference, and even PostgreSQL (which materializes the CTE once) must scan the result three times — and before version 12 the CTE is also an optimization fence that blocks predicate pushdown
Slow Query
WITH order_stats AS (
SELECT customer_id,
COUNT(*) AS cnt,
SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id
)
SELECT 'high_value' AS segment, COUNT(*) FROM order_stats WHERE revenue > 10000
UNION ALL
SELECT 'medium_value', COUNT(*) FROM order_stats WHERE revenue BETWEEN 1000 AND 10000
UNION ALL
SELECT 'low_value', COUNT(*) FROM order_stats WHERE revenue < 1000;
Before
2.4s (3x aggregation passes)
After
420ms (single pass + conditional counts)
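The "single pass + conditional counts" version — aggregate once, then classify with `FILTER`:

```sql
SELECT COUNT(*) FILTER (WHERE revenue > 10000)                AS high_value,
       COUNT(*) FILTER (WHERE revenue BETWEEN 1000 AND 10000) AS medium_value,
       COUNT(*) FILTER (WHERE revenue < 1000)                 AS low_value
FROM (
    SELECT SUM(total_amount) AS revenue
    FROM orders
    GROUP BY customer_id
) s;
```

Note the result shape changes: one row with three columns instead of three labeled rows — pivot back in the application if the original shape matters.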
UPDATE Without Index on WHERE
No index on (status, created_at) — UPDATE does a sequential scan locking rows as it goes, blocking other queries
Slow Query
UPDATE orders
SET status = 'archived'
WHERE created_at < '2022-01-01'
AND status = 'completed';
Before
45s (Seq Scan + 800K row locks)
After
200ms per batch (Index Scan + 5K locks)
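A batching sketch matching the After figure — create the supporting index, then update in small chunks so locks are short-lived (batch size and index name are assumptions):

```sql
CREATE INDEX idx_orders_status_created
    ON orders (status, created_at);

-- Run repeatedly until UPDATE reports 0 rows affected
UPDATE orders
SET status = 'archived'
WHERE id IN (
    SELECT id FROM orders
    WHERE status = 'completed'
      AND created_at < '2022-01-01'
    LIMIT 5000
);
```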
Excessive JOIN Columns in GROUP BY
Grouping by 7 columns creates a huge hash key — slow hashing and high memory usage
Slow Query
SELECT c.id, c.name, c.email, c.phone,
c.address, c.city, c.state,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.email, c.phone,
c.address, c.city, c.state;
Before
1.8s (HashAggregate with 7-col key)
After
650ms (HashAggregate with 1-col key)
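In PostgreSQL, grouping by the primary key is enough — the other customer columns are functionally dependent on it and may still appear in the SELECT list:

```sql
SELECT c.id, c.name, c.email, c.phone,
       c.address, c.city, c.state,
       COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id;  -- 1-column hash key
```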
Sorting Without Index Support
No index on created_at — sorts entire 500K-row table in memory just to return 10 rows
Slow Query
SELECT id, name, created_at
FROM products
ORDER BY created_at DESC
LIMIT 10;
Before
420ms (Seq Scan + Sort 500K rows)
After
0.5ms (Index Scan, 10 rows)
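The fix implied by the After plan — an index matching the sort, so the top 10 rows are read directly:

```sql
CREATE INDEX idx_products_created_at ON products (created_at DESC);
```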
Materialized View for Dashboard Queries
Aggregates 5M rows on every request — 50 identical expensive queries per second
Slow Query
-- Dashboard query runs every page load (50 req/sec)
SELECT DATE_TRUNC('day', created_at) AS day,
COUNT(*) AS orders,
SUM(total_amount) AS revenue,
AVG(total_amount) AS avg_order
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1;
Before
3.2s per request × 50 rps
After
2ms per request (pre-computed)
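A materialized-view sketch (view name and refresh cadence are assumptions; `CONCURRENTLY` requires a unique index on the view):

```sql
CREATE MATERIALIZED VIEW daily_order_stats AS
SELECT DATE_TRUNC('day', created_at) AS day,
       COUNT(*)          AS orders,
       SUM(total_amount) AS revenue,
       AVG(total_amount) AS avg_order
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY 1;

CREATE UNIQUE INDEX ON daily_order_stats (day);

-- Run on a schedule (e.g. cron/pg_cron), not per request
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_stats;
```

Note `NOW()` is evaluated at refresh time, so the 90-day window advances only when the view is refreshed.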
HAVING vs WHERE Placement
HAVING filters after aggregation — groups ALL categories first, then discards most results
Slow Query
SELECT category_id, COUNT(*) AS cnt
FROM products
GROUP BY category_id
HAVING category_id IN (1, 2, 3);
Before
180ms (groups all 200 categories)
After
12ms (groups only 3 categories)
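Since the filter is on the grouping column, not an aggregate, it belongs in WHERE so rows are discarded before grouping:

```sql
SELECT category_id, COUNT(*) AS cnt
FROM products
WHERE category_id IN (1, 2, 3)
GROUP BY category_id;
```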
Redundant Subquery in FROM
Subquery computes MAX(created_at) for every customer but it's never used — wasted CPU and memory
Slow Query
SELECT sub.customer_id, sub.total_orders
FROM (
SELECT customer_id, COUNT(*) AS total_orders,
MAX(created_at) AS last_order
FROM orders
GROUP BY customer_id
) sub
WHERE sub.total_orders > 5;
Before
680ms (computes unused MAX per group)
After
410ms (aggregate only what's needed)
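A leaner version that drops the unused `MAX` — and the subquery wrapper along with it:

```sql
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;
```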
Multi-Column IN vs EXISTS
Multi-column IN builds a full hash set of all alert pairs — can't leverage indexes efficiently on composite check
Slow Query
SELECT * FROM inventory
WHERE (warehouse_id, product_id) IN (
SELECT warehouse_id, product_id
FROM stock_alerts
WHERE alert_type = 'low_stock'
);
Before
1.6s (Hash Semi Join, full materialization)
After
85ms (Nested Loop Semi Join, index-backed)
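The EXISTS form matching the After plan — the composite condition probes an index on `(warehouse_id, product_id)` per inventory row instead of materializing all alert pairs:

```sql
SELECT i.*
FROM inventory i
WHERE EXISTS (
    SELECT 1 FROM stock_alerts sa
    WHERE sa.alert_type   = 'low_stock'
      AND sa.warehouse_id = i.warehouse_id
      AND sa.product_id   = i.product_id
);
```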
String Concatenation in WHERE
Concatenation expression on every row prevents any index usage — full table scan
Slow Query
SELECT * FROM customers
WHERE first_name || ' ' || last_name = 'John Smith';
Before
1.2s (Seq Scan, concat per row)
After
0.4ms (Index Scan on composite index)
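A sketch of the composite-index fix — filter on the underlying columns directly (splitting the literal assumes names contain a single space; multi-word names need different handling):

```sql
CREATE INDEX idx_customers_full_name
    ON customers (first_name, last_name);

SELECT * FROM customers
WHERE first_name = 'John'
  AND last_name  = 'Smith';
```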
LEFT JOIN When INNER JOIN Suffices
LEFT JOIN keeps orders with no matching customer (customer columns NULL), but WHERE c.status = 'active' discards exactly those rows — an INNER JOIN in disguise, written in a way that constrains the planner
Slow Query
SELECT o.id, o.total_amount, c.name
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.status = 'active';
Before
520ms (Left Join + Filter NULLs)
After
380ms (Hash Join, smaller result set)
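Spelled as what it actually is:

```sql
SELECT o.id, o.total_amount, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.status = 'active';
```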
DELETE Without Batching
Single massive DELETE holds locks for minutes, bloats the WAL, and leaves millions of dead tuples for VACUUM to reclaim (in MySQL/InnoDB it also bloats the undo log and risks a very long rollback)
Slow Query
DELETE FROM audit_logs
WHERE created_at < '2023-01-01';
-- Deletes 30M rows in a single transaction
Before
8min (30M rows, WAL bloat, lock contention)
After
0.5s per 10K batch or instant DROP
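A PostgreSQL-specific batching sketch using `ctid` (batch size is an assumption; run repeatedly until 0 rows are deleted):

```sql
-- Each statement commits separately and releases its locks
DELETE FROM audit_logs
WHERE ctid IN (
    SELECT ctid FROM audit_logs
    WHERE created_at < '2023-01-01'
    LIMIT 10000
);
```

If the table is range-partitioned by `created_at`, dropping the old partition (`DROP TABLE audit_logs_2022;`) is effectively instant — the "instant DROP" in the After figure.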
UNION vs UNION ALL
UNION deduplicates by sorting/hashing the entire result set — expensive when duplicates are impossible or acceptable
Slow Query
SELECT name FROM customers_us
UNION
SELECT name FROM customers_eu;
Before
1.2s (Sort + Unique on 2M rows)
After
180ms (Append, no sort)
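When duplicates are impossible (or acceptable), skip the deduplication entirely:

```sql
SELECT name FROM customers_us
UNION ALL
SELECT name FROM customers_eu;
```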
CASE Inside Aggregate vs FILTER
Three CASE expressions evaluated per row — verbose and harder for the planner to optimize
Slow Query
SELECT
COUNT(CASE WHEN status = 'active' THEN 1 END) AS active,
COUNT(CASE WHEN status = 'inactive' THEN 1 END) AS inactive,
COUNT(CASE WHEN status = 'suspended' THEN 1 END) AS suspended
FROM users;
Before
340ms (3 CASE evals per row)
After
290ms (FILTER, same speed but cleaner + optimizable)
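The `FILTER` form (SQL standard, supported by PostgreSQL):

```sql
SELECT
    COUNT(*) FILTER (WHERE status = 'active')    AS active,
    COUNT(*) FILTER (WHERE status = 'inactive')  AS inactive,
    COUNT(*) FILTER (WHERE status = 'suspended') AS suspended
FROM users;
```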
Index-Only Scan Blocked by Visibility
Index Only Scan is available but falls back to regular Index Scan because the visibility map is stale (too many dead tuples)
Slow Query
-- Table has heavy UPDATE/DELETE activity
-- Index exists on (status, total_amount)
VACUUM; -- hasn't run in weeks
SELECT status, SUM(total_amount)
FROM orders
GROUP BY status;
Before
2.1s (Index Scan + heap fetch per row)
After
380ms (Index Only Scan, no heap)
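The remedy is maintenance, not a query rewrite — refresh the visibility map and keep it fresh:

```sql
-- Refreshes the visibility map so Index Only Scans can skip the heap
VACUUM (ANALYZE) orders;

-- Longer term: make autovacuum keep up on this hot table
-- (threshold value is an illustrative starting point)
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```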
Cartesian Product from Missing JOIN Condition
No join condition between customers and products — creates a Cartesian product (1K customers × 10K products = 10M rows)
Slow Query
SELECT c.name, p.name AS product
FROM customers c, products p
WHERE c.city = 'New York';
Before
45s (10M row cross product)
After
35ms (proper joins, ~2K rows)
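A corrected sketch, assuming the usual orders/order_items bridge between customers and products:

```sql
SELECT c.name, p.name AS product
FROM customers c
JOIN orders o       ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id   = o.id
JOIN products p     ON p.id          = oi.product_id
WHERE c.city = 'New York';
```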
Expression Index for Computed Filter
Wrapping created_at in DATE() prevents index usage — evaluates function on every row
Slow Query
SELECT * FROM events
WHERE DATE(created_at) = '2024-03-15';
Before
1.8s (Seq Scan, DATE() per row)
After
3ms (Index Range Scan)
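Two plausible fixes — a sargable range on the raw column, or an expression index that matches the predicate as written:

```sql
-- Option 1: half-open range; a plain B-tree on created_at applies
SELECT * FROM events
WHERE created_at >= '2024-03-15'
  AND created_at <  '2024-03-16';

-- Option 2: expression index (in PostgreSQL, DATE() is only
-- immutable for TIMESTAMP, not TIMESTAMPTZ)
CREATE INDEX idx_events_created_date ON events (DATE(created_at));
```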
Aggregation on Joined Table Before JOIN
Three-way JOIN creates a massive intermediate result before aggregating — 100K customers × 5 orders × 3 items = 1.5M rows grouped
Slow Query
SELECT c.name,
COUNT(oi.id) AS total_items,
SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
GROUP BY c.id, c.name;
Before
4.5s (1.5M row intermediate)
After
800ms (100K row intermediate)
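The pre-aggregation rewrite — collapse orders and items per customer first, then join the small aggregate to customers:

```sql
SELECT c.name, s.total_items, s.total_spent
FROM customers c
JOIN (
    SELECT o.customer_id,
           COUNT(oi.id)                AS total_items,
           SUM(oi.quantity * oi.price) AS total_spent
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    GROUP BY o.customer_id
) s ON s.customer_id = c.id;
```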
LATERAL JOIN for Top-N Per Group
Correlated subquery in WHERE with LIMIT — hard for planner to optimize, often falls back to nested loop without index
Slow Query
-- Get latest 3 orders per customer
SELECT c.id, c.name, o.*
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.id IN (
SELECT o2.id FROM orders o2
WHERE o2.customer_id = c.id
ORDER BY o2.created_at DESC
LIMIT 3
);
Before
8.5s (nested correlated subquery)
After
350ms (LATERAL + index scan per customer)
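The LATERAL rewrite from the After figure — the subquery runs once per customer and can use an index on `(customer_id, created_at DESC)`:

```sql
SELECT c.id, c.name, o.*
FROM customers c
JOIN LATERAL (
    SELECT * FROM orders o2
    WHERE o2.customer_id = c.id
    ORDER BY o2.created_at DESC
    LIMIT 3
) o ON true;
```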
GIN Index for JSONB Queries
No index on JSONB fields — sequential scan parses JSON on every row
Slow Query
SELECT * FROM events
WHERE payload->>'event_type' = 'purchase'
AND (payload->>'amount')::numeric > 100;
Before
3.5s (Seq Scan + JSON parse per row)
After
25ms (GIN Index Scan)
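A GIN-index sketch — rephrase the equality as a containment (`@>`) test, which the index can serve; the numeric comparison remains a recheck filter:

```sql
CREATE INDEX idx_events_payload
    ON events USING GIN (payload jsonb_path_ops);

SELECT * FROM events
WHERE payload @> '{"event_type": "purchase"}'
  AND (payload->>'amount')::numeric > 100;  -- filtered after index lookup
```

If the amount filter is hot too, a B-tree expression index on `((payload->>'amount')::numeric)` can serve it directly.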
Excessive Indexes Slowing Writes
Every INSERT/UPDATE must maintain 12 indexes — write amplification makes inserts 10x slower than necessary
Slow Query
-- Table has 12 indexes including:
-- idx_orders_1 (customer_id)
-- idx_orders_2 (created_at)
-- idx_orders_3 (status)
-- idx_orders_4 (customer_id, created_at) -- supersedes idx_1
-- idx_orders_5 (status, created_at)
-- ... 7 more rarely-used indexes
INSERT INTO orders (customer_id, total_amount, status)
VALUES (42, 99.99, 'pending');
Before
8ms per INSERT (12 index updates)
After
2ms per INSERT (5 index updates)
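A sketch for finding and removing the dead weight (statistics cover the period since the last stats reset, so check over a representative window):

```sql
-- Indexes on orders that have never been scanned
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan;

DROP INDEX idx_orders_1;  -- superseded by idx_orders_4 (customer_id, created_at)
```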
Row-Level Security Overhead
RLS adds a hidden filter evaluated per row — without index on customer_id, it triggers a Seq Scan even if status is indexed
Slow Query
-- RLS policy on orders:
-- USING (customer_id = current_setting('app.customer_id')::int)
SELECT SUM(total_amount)
FROM orders
WHERE status = 'completed';
Before
1.9s (Seq Scan, RLS filter per row)
After
8ms (Index Only Scan covers both filters)
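One index shape consistent with the After plan — cover the hidden RLS predicate, the visible filter, and the aggregated column (requires PostgreSQL 11+ for `INCLUDE`; name illustrative):

```sql
CREATE INDEX idx_orders_customer_status
    ON orders (customer_id, status) INCLUDE (total_amount);
```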
Recursive CTE for Hierarchy vs Closure Table
Recursive CTE walks the tree one level at a time — each iteration does an Index Scan. Deep trees (20+ levels) mean 20+ passes
Slow Query
-- Get all descendants of category id=5
WITH RECURSIVE tree AS (
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE id = 5
UNION ALL
SELECT c.id, c.name, c.parent_id, t.depth + 1
FROM categories c
JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree;
Before
450ms (20 recursive iterations)
After
2ms (single index scan on closure table)
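A closure-table lookup sketch — `category_paths` is a hypothetical table storing one row per (ancestor, descendant) pair, maintained on writes:

```sql
SELECT c.*, p.depth
FROM category_paths p
JOIN categories c ON c.id = p.descendant_id
WHERE p.ancestor_id = 5;  -- one index scan, any tree depth
```

The trade-off: reads become a single scan, but every insert/move must update the path table.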
Connection Overhead from Short-Lived Queries
Connection setup (TCP handshake + SSL + auth) takes 10x longer than the actual query
Slow Query
-- Application opens new connection per request:
-- connect() → query → disconnect() (x1000/sec)
SELECT balance FROM accounts WHERE id = 42;
-- 2ms query, but 15ms connect + 5ms disconnect = 22ms total
Before
22ms per query (15ms connect overhead)
After
2ms per query (pooled connection)
Bloom Index for Multi-Column Equality
No single B-tree index can serve arbitrary column combinations — creating all permutations is impractical
Slow Query
-- Users search by any combination of columns:
-- WHERE color = 'red' AND size = 'L'
-- WHERE material = 'cotton' AND brand = 'Nike'
-- 6 columns, would need 64 composite indexes
SELECT * FROM products
WHERE color = 'red' AND size = 'L' AND material = 'cotton';
Before
1.4s (Seq Scan, no usable index)
After
45ms (Bloom Index Scan)
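A bloom-index sketch (the `bloom` contrib extension; column list drawn from the comments above and illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS bloom;

-- One lossy index serves equality on any column combination
CREATE INDEX idx_products_bloom ON products
    USING bloom (color, size, material, brand);
```

Bloom matches are probabilistic, so every hit is rechecked against the heap, and only equality predicates are supported — but one index replaces dozens of composite B-trees.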
EXPLAIN ANALYZE on Write Query in Production
EXPLAIN ANALYZE actually EXECUTES the query — this UPDATE modifies 50K rows in production while profiling
Slow Query
-- Debugging a slow UPDATE in production:
EXPLAIN ANALYZE
UPDATE orders SET status = 'processed'
WHERE id BETWEEN 1 AND 50000;
Before
Modifies 50K rows while profiling!
After
Zero data modification, same plan info
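The standard safe pattern — profile inside a transaction and throw the changes away:

```sql
BEGIN;
EXPLAIN ANALYZE
UPDATE orders SET status = 'processed'
WHERE id BETWEEN 1 AND 50000;
ROLLBACK;  -- plan and timings are printed; the data changes are discarded
```

Note the UPDATE still executes inside the transaction (row locks, WAL), so on a busy production table plain `EXPLAIN` without `ANALYZE` is safer still when estimated plans suffice.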
Nested Loop on Large Tables
Nested Loop does an index lookup per order row — fine for 100 rows, but 200K qualifying rows means 200K index lookups
Slow Query
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.total_amount > 500;
-- Planner chooses Nested Loop:
-- Seq Scan on orders (filter: total > 500)
-- → Index Scan on customers (per row)
Before
4.2s (200K Nested Loop index lookups)
After
380ms (single Hash Join pass)
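A mis-chosen Nested Loop is usually a row-estimate problem. A diagnostic sketch (the `SET` is session-local and for investigation only, not a production fix):

```sql
-- Refresh statistics so the planner sees 200K qualifying rows, not 100
ANALYZE orders;

-- Check whether a Hash Join wins once Nested Loop is taken off the table
SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.total_amount > 500;
RESET enable_nestloop;
```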
Unnecessary ORDER BY in Subquery
ORDER BY in the subquery is pointless — the outer query re-sorts by a different column anyway
Slow Query
SELECT category_id, avg_price
FROM (
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id
ORDER BY avg_price DESC
) sub
WHERE avg_price > 100
ORDER BY category_id;
Before
210ms (double sort: inner + outer)
After
130ms (single sort)
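The fix is simply to drop the inner sort:

```sql
SELECT category_id, avg_price
FROM (
    SELECT category_id, AVG(price) AS avg_price
    FROM products
    GROUP BY category_id
) sub
WHERE avg_price > 100
ORDER BY category_id;  -- the only sort that matters
```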
SELECT FOR UPDATE Lock Scope
FOR UPDATE locks every row in the result set — blocks other workers from processing any pending order
Slow Query
-- Processing pending orders one at a time
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE;
-- Locks ALL pending orders while processing one
Before
Sequential processing (1 worker)
After
Parallel processing (N workers, no contention)
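The job-queue pattern implied by the After figure — each worker claims one unclaimed row and skips rows other workers hold:

```sql
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
```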
GiST Index for Range Queries
Two-sided range comparison can't be served by a single B-tree index — at best uses one side, then filters
Slow Query
-- Find all events overlapping a time range
SELECT * FROM events
WHERE start_time <= '2024-03-15 18:00'
AND end_time >= '2024-03-15 09:00';
Before
1.8s (Seq Scan, two-sided filter)
After
12ms (GiST Index Scan, native overlap)
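A PostgreSQL sketch using a range expression index with the `&&` (overlaps) operator; `'[]'` bounds keep the original inclusive comparisons, and `start_time`/`end_time` are assumed to be plain timestamps:

```sql
CREATE INDEX idx_events_period ON events
    USING GIST (tsrange(start_time, end_time, '[]'));

SELECT * FROM events
WHERE tsrange(start_time, end_time, '[]')
      && tsrange('2024-03-15 09:00', '2024-03-15 18:00', '[]');
```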
Unnecessary COALESCE Preventing Index
COALESCE wraps indexed columns in an expression — no index can match, full table scan required
Slow Query
SELECT * FROM products
WHERE COALESCE(discount_price, price) < 50;
Before
680ms (Seq Scan, COALESCE per row)
After
35ms (Bitmap OR of two Index Scans)
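An equivalent rewrite that exposes both columns to their indexes:

```sql
SELECT * FROM products
WHERE discount_price < 50
   OR (discount_price IS NULL AND price < 50);
```

With indexes on `discount_price` and `price` (or a partial index `ON products (price) WHERE discount_price IS NULL` for the second arm), the planner can bitmap-OR two index scans.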
Unbounded SELECT Without LIMIT
No LIMIT clause — returns entire result set, consuming memory on both database and application side
Slow Query
-- API endpoint: GET /api/orders?status=pending
SELECT * FROM orders
WHERE status = 'pending';
-- Returns 150K rows to the application
Before
3.2s (150K rows transferred, 45MB)
After
8ms (50 rows, 0.015MB)
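A bounded version (page size and column list illustrative):

```sql
SELECT id, total_amount, created_at  -- named columns, not *
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 50;
```

Pair this with keyset pagination (`WHERE created_at < :last_seen ...`) so deep pages stay cheap.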
Table Partitioning for Time-Series Data
Single monolithic table — index scans still touch a huge B-tree, VACUUM takes hours, index bloat compounds
Slow Query
-- 500M rows in a single table, growing daily
SELECT COUNT(*), SUM(amount)
FROM transactions
WHERE created_at BETWEEN '2024-03-01' AND '2024-03-31';
Before
28s (Index Scan on 500M-row B-tree)
After
1.8s (Seq Scan on 15M-row partition)
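A declarative range-partitioning sketch (PostgreSQL 10+; columns illustrative) — the query above then touches only the March partition:

```sql
CREATE TABLE transactions_p (
    id         BIGINT,
    amount     NUMERIC,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE transactions_2024_03
    PARTITION OF transactions_p
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
```

Partition pruning keeps scans small, per-partition indexes stay shallow, and old data can be dropped by detaching a partition instead of a mass DELETE.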
Parallel Query Underutilization
Parallel query disabled or under-configured — large aggregation runs on a single CPU core while 15 cores idle
Slow Query
-- Server has 16 cores but query uses only 1
SET max_parallel_workers_per_gather = 0;
SELECT COUNT(*), AVG(total_amount)
FROM orders
WHERE created_at > '2024-01-01';
Before
4.8s (single-core Seq Scan)
After
1.2s (4 parallel workers)
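Re-enabling parallelism matching the After figure (session scope; the worker count is a tuning choice, bounded by `max_worker_processes` and `max_parallel_workers`):

```sql
SET max_parallel_workers_per_gather = 4;

SELECT COUNT(*), AVG(total_amount)
FROM orders
WHERE created_at > '2024-01-01';
```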