Query Optimization
Learn to write efficient SQL by identifying bottlenecks, reading EXPLAIN plans, designing indexes, and rewriting slow queries. Practice with real-world optimization challenges.
Missing Index on Filter Column
Missing index usage — full table scan on 500K rows
Slow Query
SELECT * FROM orders
WHERE customer_id = 12345
ORDER BY created_at DESC;
Before
2.3s (Seq Scan, 500K rows)
After
4ms (Index Scan, 38 rows)
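A fix consistent with the numbers above — a composite index whose leading column matches the filter and whose second column matches the sort (index name illustrative):

```sql
-- Equality column first, then the ORDER BY column:
-- serves the WHERE filter and returns rows pre-sorted
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at DESC);
```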
The SELECT * Anti-Pattern
Unnecessary columns fetched — table has 40 columns including large TEXT/BLOB fields
Slow Query
SELECT *
FROM products
WHERE category_id = 5
ORDER BY price DESC
LIMIT 20;
Before
180ms (reads 40 cols, 6.2MB)
After
12ms (reads 3 cols, 0.4MB)
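One possible rewrite, naming only the columns the caller actually uses (column names are illustrative — the source doesn't list them):

```sql
-- Three narrow columns instead of 40 wide ones
SELECT id, name, price
FROM products
WHERE category_id = 5
ORDER BY price DESC
LIMIT 20;
```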
Correlated Subquery N+1 Problem
N+1 query pattern — correlated subquery executes once per row (100K customers = 100K subqueries)
Slow Query
SELECT c.name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.id) AS order_count
FROM customers c
WHERE c.status = 'active';
Before
12.8s (100K subqueries)
After
340ms (single Hash Join + Aggregate)
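A rewrite matching the "single Hash Join + Aggregate" plan — join once, then group:

```sql
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.status = 'active'
GROUP BY c.id, c.name;
```

The LEFT JOIN keeps customers with zero orders; `COUNT(o.id)` returns 0 for them, matching the correlated subquery's semantics.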
Function on Indexed Column
Function wrapping indexed column — prevents index usage, forces full table scan
Slow Query
SELECT *
FROM transactions
WHERE EXTRACT(YEAR FROM created_at) = 2024
AND EXTRACT(MONTH FROM created_at) = 3;
Before
3.1s (Seq Scan on 2M rows)
After
25ms (Index Range Scan)
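The sargable rewrite implied by the "Index Range Scan": compare the raw column against a half-open range instead of wrapping it in a function:

```sql
SELECT *
FROM transactions
WHERE created_at >= DATE '2024-03-01'
  AND created_at <  DATE '2024-04-01';  -- half-open: catches all of March
```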
N+1 Query from Application Code
N+1 query pattern — 501 database round-trips instead of 1
Slow Query
-- Query 1: fetch orders
SELECT id, total FROM orders
WHERE customer_id = 42 LIMIT 500;
-- Then for EACH order (x500 round-trips):
-- SELECT * FROM order_items
-- WHERE order_id = ?;
Before
4.2s (501 queries, network overhead)
After
45ms (1 query, Hash Join)
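A single-round-trip version, preserving the 500-order cap from the original:

```sql
-- One query: join items to the limited set of orders
SELECT o.id, o.total, oi.*
FROM (
    SELECT id, total FROM orders
    WHERE customer_id = 42
    LIMIT 500
) o
JOIN order_items oi ON oi.order_id = o.id;
```

The application then groups items by `o.id` in memory instead of issuing 500 follow-up queries.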
DISTINCT on Large JOIN
DISTINCT after JOIN builds massive result set then deduplicates — O(n*m) then sort
Slow Query
SELECT DISTINCT c.id, c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id;
Before
2.8s (2.3M intermediate rows, then sort)
After
180ms (semi-join, early exit)
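The semi-join rewrite the After figure describes — ask "does at least one order exist?" instead of joining and deduplicating:

```sql
SELECT c.id, c.name, c.email
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
);
```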
OR Preventing Index Usage
OR between differently-indexed columns forces sequential scan or expensive bitmap merge
Slow Query
SELECT id, name, email
FROM users
WHERE email = 'test@example.com'
OR phone = '+1234567890';
Before
890ms (Seq Scan on 5M rows)
After
2ms (two Index Scans)
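One way to get the "two Index Scans": split the OR into a UNION so each branch can use its own index (indexes on email and phone assumed):

```sql
SELECT id, name, email FROM users WHERE email = 'test@example.com'
UNION
SELECT id, name, email FROM users WHERE phone = '+1234567890';
```

`UNION` (not `UNION ALL`) removes the rare user who matches both branches, preserving the original semantics.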
Pagination with OFFSET
OFFSET scans and discards 20,000 rows — gets slower on every page
Slow Query
SELECT id, title, created_at
FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 20000;
Before
8.2s on page 1000 (scans 20K rows)
After
3ms on any page (reads 20 rows)
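A keyset (seek) pagination sketch — `:last_created_at` is a hypothetical bind parameter holding the last value shown on the previous page:

```sql
-- Seek past the last row seen instead of counting rows to skip
SELECT id, title, created_at
FROM posts
WHERE created_at < :last_created_at
ORDER BY created_at DESC
LIMIT 20;
```

If `created_at` isn't unique, page on the pair `(created_at, id)` to avoid skipping or repeating rows at the boundary.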
Covering Index for Aggregation
Reads entire wide table (20 columns, 2M rows) when only 2 columns are needed for the aggregation
Slow Query
SELECT status,
COUNT(*) AS cnt,
SUM(total_amount) AS revenue
FROM orders
GROUP BY status;
Before
1.4s (Seq Scan, reads 2M wide rows)
After
120ms (Index Only Scan, compact)
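A covering index consistent with the "Index Only Scan" figure, assuming PostgreSQL 11+ for `INCLUDE` (index name illustrative):

```sql
-- Key on the GROUP BY column; carry the aggregated column as payload
CREATE INDEX idx_orders_status_covering
    ON orders (status) INCLUDE (total_amount);
```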
Composite Index Column Order
Two separate indexes force bitmap merge + sort instead of a single efficient scan
Slow Query
-- Two separate indexes exist:
-- idx_tickets_status (status)
-- idx_tickets_created (created_at)
SELECT id, title, status, created_at
FROM tickets
WHERE status = 'open'
ORDER BY created_at DESC
LIMIT 50;
Before
650ms (Bitmap Scan + Sort)
After
8ms (Index Scan Backward, pre-sorted)
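The single composite index implied by the After plan — equality column first, sort column second:

```sql
-- Replaces both single-column indexes for this query shape
CREATE INDEX idx_tickets_status_created
    ON tickets (status, created_at DESC);
```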
Implicit Type Casting Kills Index
phone is VARCHAR but compared to an integer — PostgreSQL casts every row's phone to numeric, preventing index usage
Slow Query
SELECT * FROM users
WHERE phone = 5551234567;
Before
1.9s (Seq Scan, implicit cast per row)
After
0.3ms (Index Scan, direct match)
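The fix is simply to match the column's declared type:

```sql
-- String literal: the VARCHAR index applies directly, no cast
SELECT * FROM users
WHERE phone = '5551234567';
```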
LIKE with Leading Wildcard
Leading wildcard % prevents B-tree index usage — the pattern must be evaluated against every row in a full table scan
Slow Query
SELECT id, name, email
FROM customers
WHERE email LIKE '%@gmail.com';
Before
2.1s (Seq Scan 3M rows)
After
15ms (Index Scan)
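One PostgreSQL approach consistent with the After figure — a trigram GIN index, which can serve LIKE patterns with a leading wildcard (extension and index name are a sketch):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_customers_email_trgm
    ON customers USING GIN (email gin_trgm_ops);
```

Alternatively, if the only pattern is a suffix match on domain, an expression index on `reverse(email)` queried with `LIKE 'moc.liamg@%'` turns it into a prefix scan.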
COUNT(*) on Entire Table
PostgreSQL must do a full table scan for exact COUNT(*) — MVCC means no stored row count
Slow Query
SELECT COUNT(*)
FROM logs;
-- logs table has 50M rows
Before
28s (Seq Scan 50M rows)
After
0.1ms (catalog lookup)
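When an approximation suffices, the "catalog lookup" in the After figure reads the planner's estimate instead of scanning:

```sql
-- Estimate maintained by ANALYZE/autovacuum; approximate, not exact
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'logs';
```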
NOT IN with NULL Trap
NOT IN returns no rows if the subquery contains any NULL — also prevents index usage and builds a full hash
Slow Query
SELECT id, name
FROM products
WHERE id NOT IN (
SELECT product_id FROM order_items
);
Before
4.2s (Hash Anti Join, NULL risk)
After
180ms (Nested Loop Anti Join, index-backed)
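The standard NULL-safe rewrite:

```sql
SELECT p.id, p.name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi
    WHERE oi.product_id = p.id
);
```

`NOT EXISTS` ignores NULLs in `product_id` (unlike `NOT IN`, which returns zero rows if any appear) and lets the planner use an index-backed anti-join.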
Unnecessary DISTINCT Hiding a JOIN Bug
DISTINCT masks a many-to-many explosion — 100K customers × 5 orders × 3 items = 1.5M rows sorted and deduplicated
Slow Query
SELECT DISTINCT c.id, c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id;
Before
5.1s (1.5M rows → sort → unique)
After
120ms (semi-join, early exit)
Partial Index for Hot Path
Full index on status includes 95% completed rows that are never queried on this path — bloated index, more I/O
Slow Query
-- 95% of orders are 'completed', only 2% are 'pending'
-- Full index exists: idx_orders_status (status)
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at;
Before
320ms (scans 2M-entry index)
After
5ms (scans 40K-entry partial index)
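The partial index the After figure describes — index only the small hot slice (index name illustrative):

```sql
-- Covers only the ~2% of rows this path ever touches
CREATE INDEX idx_orders_pending
    ON orders (created_at)
    WHERE status = 'pending';
```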
Window Function Recomputing on Every Row
Correlated subquery recalculates the average for each row — same average recomputed 500 times per category
Slow Query
SELECT *,
(SELECT AVG(price) FROM products p2
WHERE p2.category_id = p.category_id) AS avg_category_price
FROM products p;
Before
3.8s (correlated subquery per row)
After
45ms (single WindowAgg pass)
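The window-function rewrite — one pass computes each category's average once and attaches it to every row:

```sql
SELECT *,
       AVG(price) OVER (PARTITION BY category_id) AS avg_category_price
FROM products;
```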
Expensive CTE Evaluated Multiple Times
Referencing the CTE three times costs three passes: engines without CTE materialization re-run the aggregation per reference, and even PostgreSQL (which materializes the CTE once) must scan the result three times — and before version 12 the CTE is also an optimization fence that blocks predicate pushdown
Slow Query
WITH order_stats AS (
SELECT customer_id,
COUNT(*) AS cnt,
SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id
)
SELECT 'high_value' AS segment, COUNT(*) FROM order_stats WHERE revenue > 10000
UNION ALL
SELECT 'medium_value', COUNT(*) FROM order_stats WHERE revenue BETWEEN 1000 AND 10000
UNION ALL
SELECT 'low_value', COUNT(*) FROM order_stats WHERE revenue < 1000;
Before
2.4s (3x aggregation passes)
After
420ms (single pass + conditional counts)
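The "single pass + conditional counts" version — aggregate once, then classify with `FILTER`:

```sql
SELECT COUNT(*) FILTER (WHERE revenue > 10000)                AS high_value,
       COUNT(*) FILTER (WHERE revenue BETWEEN 1000 AND 10000) AS medium_value,
       COUNT(*) FILTER (WHERE revenue < 1000)                 AS low_value
FROM (
    SELECT SUM(total_amount) AS revenue
    FROM orders
    GROUP BY customer_id
) s;
```

Note the result shape changes: one row with three columns instead of three labeled rows — pivot back in the application if the original shape matters.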
UPDATE Without Index on WHERE
No index on (status, created_at) — UPDATE does a sequential scan locking rows as it goes, blocking other queries
Slow Query
UPDATE orders
SET status = 'archived'
WHERE created_at < '2022-01-01'
AND status = 'completed';
Before
45s (Seq Scan + 800K row locks)
After
200ms per batch (Index Scan + 5K locks)
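A batching sketch matching the After figure — create the supporting index, then update in small chunks so locks are short-lived (batch size and index name are assumptions):

```sql
CREATE INDEX idx_orders_status_created
    ON orders (status, created_at);

-- Run repeatedly until UPDATE reports 0 rows affected
UPDATE orders
SET status = 'archived'
WHERE id IN (
    SELECT id FROM orders
    WHERE status = 'completed'
      AND created_at < '2022-01-01'
    LIMIT 5000
);
```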
Excessive JOIN Columns in GROUP BY
Grouping by 7 columns creates a huge hash key — slow hashing and high memory usage
Slow Query
SELECT c.id, c.name, c.email, c.phone,
c.address, c.city, c.state,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.email, c.phone,
c.address, c.city, c.state;
Before
1.8s (HashAggregate with 7-col key)
After
650ms (HashAggregate with 1-col key)
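In PostgreSQL, grouping by the primary key is enough — the other customer columns are functionally dependent on it and may still appear in the SELECT list:

```sql
SELECT c.id, c.name, c.email, c.phone,
       c.address, c.city, c.state,
       COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id;  -- 1-column hash key
```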
Sorting Without Index Support
No index on created_at — sorts entire 500K-row table in memory just to return 10 rows
Slow Query
SELECT id, name, created_at
FROM products
ORDER BY created_at DESC
LIMIT 10;
Before
420ms (Seq Scan + Sort 500K rows)
After
0.5ms (Index Scan, 10 rows)
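The fix implied by the After plan — an index matching the sort, so the top 10 rows are read directly:

```sql
CREATE INDEX idx_products_created_at ON products (created_at DESC);
```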
Materialized View for Dashboard Queries
Aggregates 5M rows on every request — 50 identical expensive queries per second
Slow Query
-- Dashboard query runs every page load (50 req/sec)
SELECT DATE_TRUNC('day', created_at) AS day,
COUNT(*) AS orders,
SUM(total_amount) AS revenue,
AVG(total_amount) AS avg_order
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1;
Before
3.2s per request × 50 rps
After
2ms per request (pre-computed)
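A materialized-view sketch (view name and refresh cadence are assumptions; `CONCURRENTLY` requires a unique index on the view):

```sql
CREATE MATERIALIZED VIEW daily_order_stats AS
SELECT DATE_TRUNC('day', created_at) AS day,
       COUNT(*)          AS orders,
       SUM(total_amount) AS revenue,
       AVG(total_amount) AS avg_order
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY 1;

CREATE UNIQUE INDEX ON daily_order_stats (day);

-- Run on a schedule (e.g. cron/pg_cron), not per request
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_stats;
```

Note `NOW()` is evaluated at refresh time, so the 90-day window advances only when the view is refreshed.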
HAVING vs WHERE Placement
HAVING filters after aggregation — groups ALL categories first, then discards most results
Slow Query
SELECT category_id, COUNT(*) AS cnt
FROM products
GROUP BY category_id
HAVING category_id IN (1, 2, 3);
Before
180ms (groups all 200 categories)
After
12ms (groups only 3 categories)
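Since the filter is on the grouping column, not an aggregate, it belongs in WHERE so rows are discarded before grouping:

```sql
SELECT category_id, COUNT(*) AS cnt
FROM products
WHERE category_id IN (1, 2, 3)
GROUP BY category_id;
```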
Redundant Subquery in FROM
Subquery computes MAX(created_at) for every customer but it's never used — wasted CPU and memory
Slow Query
SELECT sub.customer_id, sub.total_orders
FROM (
SELECT customer_id, COUNT(*) AS total_orders,
MAX(created_at) AS last_order
FROM orders
GROUP BY customer_id
) sub
WHERE sub.total_orders > 5;
Before
680ms (computes unused MAX per group)
After
410ms (aggregate only what's needed)
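A leaner version that drops the unused `MAX` — and the subquery wrapper along with it:

```sql
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;
```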
Multi-Column IN vs EXISTS
Multi-column IN builds a full hash set of all alert pairs — can't leverage indexes efficiently on composite check
Slow Query
SELECT * FROM inventory
WHERE (warehouse_id, product_id) IN (
SELECT warehouse_id, product_id
FROM stock_alerts
WHERE alert_type = 'low_stock'
);
Before
1.6s (Hash Semi Join, full materialization)
After
85ms (Nested Loop Semi Join, index-backed)
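The EXISTS form matching the After plan — the composite condition probes an index on `(warehouse_id, product_id)` per inventory row instead of materializing all alert pairs:

```sql
SELECT i.*
FROM inventory i
WHERE EXISTS (
    SELECT 1 FROM stock_alerts sa
    WHERE sa.alert_type   = 'low_stock'
      AND sa.warehouse_id = i.warehouse_id
      AND sa.product_id   = i.product_id
);
```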
String Concatenation in WHERE
Concatenation expression on every row prevents any index usage — full table scan
Slow Query
SELECT * FROM customers
WHERE first_name || ' ' || last_name = 'John Smith';
Before
1.2s (Seq Scan, concat per row)
After
0.4ms (Index Scan on composite index)
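A sketch of the composite-index fix — filter on the underlying columns directly (splitting the literal assumes names contain a single space; multi-word names need different handling):

```sql
CREATE INDEX idx_customers_full_name
    ON customers (first_name, last_name);

SELECT * FROM customers
WHERE first_name = 'John'
  AND last_name  = 'Smith';
```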
LEFT JOIN When INNER JOIN Suffices
LEFT JOIN keeps orders with no matching customer (customer columns NULL), but WHERE c.status = 'active' discards exactly those rows — an INNER JOIN in disguise, written in a way that constrains the planner
Slow Query
SELECT o.id, o.total_amount, c.name
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.status = 'active';
Before
520ms (Left Join + Filter NULLs)
After
380ms (Hash Join, smaller result set)
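Spelled as what it actually is:

```sql
SELECT o.id, o.total_amount, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.status = 'active';
```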
DELETE Without Batching
Single massive DELETE holds locks for minutes, bloats the WAL, and leaves millions of dead tuples for VACUUM to reclaim (in MySQL/InnoDB it also bloats the undo log and risks a very long rollback)
Slow Query
DELETE FROM audit_logs
WHERE created_at < '2023-01-01';
-- Deletes 30M rows in a single transaction
Before
8min (30M rows, WAL bloat, lock contention)
After
0.5s per 10K batch or instant DROP
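A PostgreSQL-specific batching sketch using `ctid` (batch size is an assumption; run repeatedly until 0 rows are deleted):

```sql
-- Each statement commits separately and releases its locks
DELETE FROM audit_logs
WHERE ctid IN (
    SELECT ctid FROM audit_logs
    WHERE created_at < '2023-01-01'
    LIMIT 10000
);
```

If the table is range-partitioned by `created_at`, dropping the old partition (`DROP TABLE audit_logs_2022;`) is effectively instant — the "instant DROP" in the After figure.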
UNION vs UNION ALL
UNION deduplicates by sorting/hashing the entire result set — expensive when duplicates are impossible or acceptable
Slow Query
SELECT name FROM customers_us
UNION
SELECT name FROM customers_eu;
Before
1.2s (Sort + Unique on 2M rows)
After
180ms (Append, no sort)
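When duplicates are impossible (or acceptable), skip the deduplication entirely:

```sql
SELECT name FROM customers_us
UNION ALL
SELECT name FROM customers_eu;
```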
CASE Inside Aggregate vs FILTER
Three CASE expressions evaluated per row — verbose and harder for the planner to optimize
Slow Query
SELECT
COUNT(CASE WHEN status = 'active' THEN 1 END) AS active,
COUNT(CASE WHEN status = 'inactive' THEN 1 END) AS inactive,
COUNT(CASE WHEN status = 'suspended' THEN 1 END) AS suspended
FROM users;
Before
340ms (3 CASE evals per row)
After
290ms (FILTER, same speed but cleaner + optimizable)
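The `FILTER` form (SQL standard, supported by PostgreSQL):

```sql
SELECT
    COUNT(*) FILTER (WHERE status = 'active')    AS active,
    COUNT(*) FILTER (WHERE status = 'inactive')  AS inactive,
    COUNT(*) FILTER (WHERE status = 'suspended') AS suspended
FROM users;
```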
Index-Only Scan Blocked by Visibility
Index Only Scan is available but falls back to regular Index Scan because the visibility map is stale (too many dead tuples)
Slow Query
-- Table has heavy UPDATE/DELETE activity
-- Index exists on (status, total_amount)
VACUUM; -- hasn't run in weeks
SELECT status, SUM(total_amount)
FROM orders
GROUP BY status;
Before
2.1s (Index Scan + heap fetch per row)
After
380ms (Index Only Scan, no heap)
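The remedy is maintenance, not a query rewrite — refresh the visibility map and keep it fresh:

```sql
-- Refreshes the visibility map so Index Only Scans can skip the heap
VACUUM (ANALYZE) orders;

-- Longer term: make autovacuum keep up on this hot table
-- (threshold value is an illustrative starting point)
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
```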
Cartesian Product from Missing JOIN Condition
No join condition between customers and products — creates a Cartesian product (1K customers × 10K products = 10M rows)
Slow Query
SELECT c.name, p.name AS product
FROM customers c, products p
WHERE c.city = 'New York';
Before
45s (10M row cross product)
After
35ms (proper joins, ~2K rows)
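A corrected sketch, assuming the usual orders/order_items bridge between customers and products:

```sql
SELECT c.name, p.name AS product
FROM customers c
JOIN orders o       ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id   = o.id
JOIN products p     ON p.id          = oi.product_id
WHERE c.city = 'New York';
```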
Expression Index for Computed Filter
Wrapping created_at in DATE() prevents index usage — evaluates function on every row
Slow Query
SELECT * FROM events
WHERE DATE(created_at) = '2024-03-15';
Before
1.8s (Seq Scan, DATE() per row)
After
3ms (Index Range Scan)
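Two plausible fixes — a sargable range on the raw column, or an expression index that matches the predicate as written:

```sql
-- Option 1: half-open range; a plain B-tree on created_at applies
SELECT * FROM events
WHERE created_at >= '2024-03-15'
  AND created_at <  '2024-03-16';

-- Option 2: expression index (in PostgreSQL, DATE() is only
-- immutable for TIMESTAMP, not TIMESTAMPTZ)
CREATE INDEX idx_events_created_date ON events (DATE(created_at));
```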
Aggregation on Joined Table Before JOIN
Three-way JOIN creates a massive intermediate result before aggregating — 100K customers × 5 orders × 3 items = 1.5M rows grouped
Slow Query
SELECT c.name,
COUNT(oi.id) AS total_items,
SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
GROUP BY c.id, c.name;
Before
4.5s (1.5M row intermediate)
After
800ms (100K row intermediate)
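The pre-aggregation rewrite — collapse orders and items per customer first, then join the small aggregate to customers:

```sql
SELECT c.name, s.total_items, s.total_spent
FROM customers c
JOIN (
    SELECT o.customer_id,
           COUNT(oi.id)                AS total_items,
           SUM(oi.quantity * oi.price) AS total_spent
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    GROUP BY o.customer_id
) s ON s.customer_id = c.id;
```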
LATERAL JOIN for Top-N Per Group
Correlated subquery in WHERE with LIMIT — hard for planner to optimize, often falls back to nested loop without index
Slow Query
-- Get latest 3 orders per customer
SELECT c.id, c.name, o.*
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.id IN (
SELECT o2.id FROM orders o2
WHERE o2.customer_id = c.id
ORDER BY o2.created_at DESC
LIMIT 3
);
Before
8.5s (nested correlated subquery)
After
350ms (LATERAL + index scan per customer)
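The LATERAL rewrite from the After figure — the subquery runs once per customer and can use an index on `(customer_id, created_at DESC)`:

```sql
SELECT c.id, c.name, o.*
FROM customers c
JOIN LATERAL (
    SELECT * FROM orders o2
    WHERE o2.customer_id = c.id
    ORDER BY o2.created_at DESC
    LIMIT 3
) o ON true;
```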
GIN Index for JSONB Queries
No index on JSONB fields — sequential scan parses JSON on every row
Slow Query
SELECT * FROM events
WHERE payload->>'event_type' = 'purchase'
AND (payload->>'amount')::numeric > 100;
Before
3.5s (Seq Scan + JSON parse per row)
After
25ms (GIN Index Scan)
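A GIN-index sketch — rephrase the equality as a containment (`@>`) test, which the index can serve; the numeric comparison remains a recheck filter:

```sql
CREATE INDEX idx_events_payload
    ON events USING GIN (payload jsonb_path_ops);

SELECT * FROM events
WHERE payload @> '{"event_type": "purchase"}'
  AND (payload->>'amount')::numeric > 100;  -- filtered after index lookup
```

If the amount filter is hot too, a B-tree expression index on `((payload->>'amount')::numeric)` can serve it directly.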
Excessive Indexes Slowing Writes
Every INSERT/UPDATE must maintain 12 indexes — write amplification makes inserts 10x slower than necessary
Slow Query
-- Table has 12 indexes including:
-- idx_orders_1 (customer_id)
-- idx_orders_2 (created_at)
-- idx_orders_3 (status)
-- idx_orders_4 (customer_id, created_at) -- supersedes idx_1
-- idx_orders_5 (status, created_at)
-- ... 7 more rarely-used indexes
INSERT INTO orders (customer_id, total_amount, status)
VALUES (42, 99.99, 'pending');
Before
8ms per INSERT (12 index updates)
After
2ms per INSERT (5 index updates)
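A sketch for finding and removing the dead weight (statistics cover the period since the last stats reset, so check over a representative window):

```sql
-- Indexes on orders that have never been scanned
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan;

DROP INDEX idx_orders_1;  -- superseded by idx_orders_4 (customer_id, created_at)
```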
Row-Level Security Overhead
RLS adds a hidden filter evaluated per row — without index on customer_id, it triggers a Seq Scan even if status is indexed
Slow Query
-- RLS policy on orders:
-- USING (customer_id = current_setting('app.customer_id')::int)
SELECT SUM(total_amount)
FROM orders
WHERE status = 'completed';
Before
1.9s (Seq Scan, RLS filter per row)
After
8ms (Index Only Scan covers both filters)
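One index shape consistent with the After plan — cover the hidden RLS predicate, the visible filter, and the aggregated column (requires PostgreSQL 11+ for `INCLUDE`; name illustrative):

```sql
CREATE INDEX idx_orders_customer_status
    ON orders (customer_id, status) INCLUDE (total_amount);
```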
Recursive CTE for Hierarchy vs Closure Table
Recursive CTE walks the tree one level at a time — each iteration does an Index Scan. Deep trees (20+ levels) mean 20+ passes
Slow Query
-- Get all descendants of category id=5
WITH RECURSIVE tree AS (
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE id = 5
UNION ALL
SELECT c.id, c.name, c.parent_id, t.depth + 1
FROM categories c
JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree;
Before
450ms (20 recursive iterations)
After
2ms (single index scan on closure table)
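A closure-table lookup sketch — `category_paths` is a hypothetical table storing one row per (ancestor, descendant) pair, maintained on writes:

```sql
SELECT c.*, p.depth
FROM category_paths p
JOIN categories c ON c.id = p.descendant_id
WHERE p.ancestor_id = 5;  -- one index scan, any tree depth
```

The trade-off: reads become a single scan, but every insert/move must update the path table.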
Connection Overhead from Short-Lived Queries
Connection setup (TCP handshake + SSL + auth) takes 10x longer than the actual query
Slow Query
-- Application opens new connection per request:
-- connect() → query → disconnect() (x1000/sec)
SELECT balance FROM accounts WHERE id = 42;
-- 2ms query, but 15ms connect + 5ms disconnect = 22ms total
Before
22ms per query (15ms connect overhead)
After
2ms per query (pooled connection)
Bloom Index for Multi-Column Equality
No single B-tree index can serve arbitrary column combinations — creating all permutations is impractical
Slow Query
-- Users search by any combination of columns:
-- WHERE color = 'red' AND size = 'L'
-- WHERE material = 'cotton' AND brand = 'Nike'
-- 6 columns, would need 64 composite indexes
SELECT * FROM products
WHERE color = 'red' AND size = 'L' AND material = 'cotton';
Before
1.4s (Seq Scan, no usable index)
After
45ms (Bloom Index Scan)
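A bloom-index sketch (the `bloom` contrib extension; column list drawn from the comments above and illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS bloom;

-- One lossy index serves equality on any column combination
CREATE INDEX idx_products_bloom ON products
    USING bloom (color, size, material, brand);
```

Bloom matches are probabilistic, so every hit is rechecked against the heap, and only equality predicates are supported — but one index replaces dozens of composite B-trees.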
EXPLAIN ANALYZE on Write Query in Production
EXPLAIN ANALYZE actually EXECUTES the query — this UPDATE modifies 50K rows in production while profiling
Slow Query
-- Debugging a slow UPDATE in production:
EXPLAIN ANALYZE
UPDATE orders SET status = 'processed'
WHERE id BETWEEN 1 AND 50000;
Before
Modifies 50K rows while profiling!
After
Zero data modification, same plan info
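The standard safe pattern — profile inside a transaction and throw the changes away:

```sql
BEGIN;
EXPLAIN ANALYZE
UPDATE orders SET status = 'processed'
WHERE id BETWEEN 1 AND 50000;
ROLLBACK;  -- plan and timings are printed; the data changes are discarded
```

Note the UPDATE still executes inside the transaction (row locks, WAL), so on a busy production table plain `EXPLAIN` without `ANALYZE` is safer still when estimated plans suffice.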
Nested Loop on Large Tables
Nested Loop does an index lookup per order row — fine for 100 rows, but 200K qualifying rows means 200K index lookups
Slow Query
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.total_amount > 500;
-- Planner chooses Nested Loop:
-- Seq Scan on orders (filter: total > 500)
-- → Index Scan on customers (per row)
Before
4.2s (200K Nested Loop index lookups)
After
380ms (single Hash Join pass)
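A mis-chosen Nested Loop is usually a row-estimate problem. A diagnostic sketch (the `SET` is session-local and for investigation only, not a production fix):

```sql
-- Refresh statistics so the planner sees 200K qualifying rows, not 100
ANALYZE orders;

-- Check whether a Hash Join wins once Nested Loop is taken off the table
SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.total_amount > 500;
RESET enable_nestloop;
```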
Unnecessary ORDER BY in Subquery
ORDER BY in the subquery is pointless — the outer query re-sorts by a different column anyway
Slow Query
SELECT category_id, avg_price
FROM (
SELECT category_id, AVG(price) AS avg_price
FROM products
GROUP BY category_id
ORDER BY avg_price DESC
) sub
WHERE avg_price > 100
ORDER BY category_id;
Before
210ms (double sort: inner + outer)
After
130ms (single sort)
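The fix is simply to drop the inner sort:

```sql
SELECT category_id, avg_price
FROM (
    SELECT category_id, AVG(price) AS avg_price
    FROM products
    GROUP BY category_id
) sub
WHERE avg_price > 100
ORDER BY category_id;  -- the only sort that matters
```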
SELECT FOR UPDATE Lock Scope
FOR UPDATE locks every row in the result set — blocks other workers from processing any pending order
Slow Query
-- Processing pending orders one at a time
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE;
-- Locks ALL pending orders while processing one
Before
Sequential processing (1 worker)
After
Parallel processing (N workers, no contention)
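The job-queue pattern implied by the After figure — each worker claims one unclaimed row and skips rows other workers hold:

```sql
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
```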
GiST Index for Range Queries
Two-sided range comparison can't be served by a single B-tree index — at best uses one side, then filters
Slow Query
-- Find all events overlapping a time range
SELECT * FROM events
WHERE start_time <= '2024-03-15 18:00'
AND end_time >= '2024-03-15 09:00';
Before
1.8s (Seq Scan, two-sided filter)
After
12ms (GiST Index Scan, native overlap)
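A PostgreSQL sketch using a range expression index with the `&&` (overlaps) operator; `'[]'` bounds keep the original inclusive comparisons, and `start_time`/`end_time` are assumed to be plain timestamps:

```sql
CREATE INDEX idx_events_period ON events
    USING GIST (tsrange(start_time, end_time, '[]'));

SELECT * FROM events
WHERE tsrange(start_time, end_time, '[]')
      && tsrange('2024-03-15 09:00', '2024-03-15 18:00', '[]');
```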
Unnecessary COALESCE Preventing Index
COALESCE wraps indexed columns in an expression — no index can match, full table scan required
Slow Query
SELECT * FROM products
WHERE COALESCE(discount_price, price) < 50;
Before
680ms (Seq Scan, COALESCE per row)
After
35ms (Bitmap OR of two Index Scans)
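An equivalent rewrite that exposes both columns to their indexes:

```sql
SELECT * FROM products
WHERE discount_price < 50
   OR (discount_price IS NULL AND price < 50);
```

With indexes on `discount_price` and `price` (or a partial index `ON products (price) WHERE discount_price IS NULL` for the second arm), the planner can bitmap-OR two index scans.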
Unbounded SELECT Without LIMIT
No LIMIT clause — returns entire result set, consuming memory on both database and application side
Slow Query
-- API endpoint: GET /api/orders?status=pending
SELECT * FROM orders
WHERE status = 'pending';
-- Returns 150K rows to the application
Before
3.2s (150K rows transferred, 45MB)
After
8ms (50 rows, 0.015MB)
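A bounded version (page size and column list illustrative):

```sql
SELECT id, total_amount, created_at  -- named columns, not *
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 50;
```

Pair this with keyset pagination (`WHERE created_at < :last_seen ...`) so deep pages stay cheap.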
Table Partitioning for Time-Series Data
Single monolithic table — index scans still touch a huge B-tree, VACUUM takes hours, index bloat compounds
Slow Query
-- 500M rows in a single table, growing daily
SELECT COUNT(*), SUM(amount)
FROM transactions
WHERE created_at BETWEEN '2024-03-01' AND '2024-03-31';
Before
28s (Index Scan on 500M-row B-tree)
After
1.8s (Seq Scan on 15M-row partition)
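A declarative range-partitioning sketch (PostgreSQL 10+; columns illustrative) — the query above then touches only the March partition:

```sql
CREATE TABLE transactions_p (
    id         BIGINT,
    amount     NUMERIC,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE transactions_2024_03
    PARTITION OF transactions_p
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
```

Partition pruning keeps scans small, per-partition indexes stay shallow, and old data can be dropped by detaching a partition instead of a mass DELETE.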
Parallel Query Underutilization
Parallel query disabled or under-configured — large aggregation runs on a single CPU core while 15 cores idle
Slow Query
-- Server has 16 cores but query uses only 1
SET max_parallel_workers_per_gather = 0;
SELECT COUNT(*), AVG(total_amount)
FROM orders
WHERE created_at > '2024-01-01';
Before
4.8s (single-core Seq Scan)
After
1.2s (4 parallel workers)
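Re-enabling parallelism matching the After figure (session scope; the worker count is a tuning choice, bounded by `max_worker_processes` and `max_parallel_workers`):

```sql
SET max_parallel_workers_per_gather = 4;

SELECT COUNT(*), AVG(total_amount)
FROM orders
WHERE created_at > '2024-01-01';
```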