Database Optimization — Interview Q&A Guide

This guide covers database performance topics for backend interviews: query optimization, indexing, schema design, caching, partitioning, sharding, replication, and operating large tables. Each question pairs the explanation with code, configuration, or query plans.

Query Optimization Fundamentals

Q1. What is the first rule of database optimization?

Answer:

Measure first, then optimize. Use EXPLAIN ANALYZE (Postgres) or EXPLAIN FORMAT=JSON (MySQL) to see the actual cost. Adding an index without confirming the planner needs one is guesswork.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';

Output tells you whether it picked an index scan, sequential scan, or hash join, plus rows scanned vs returned. Optimization decisions follow the data, not intuition.

Q2. What are typical latency budgets for a single query?

Answer:

A user-facing API endpoint should not exceed about 200 ms total DB time at p95. Breaking it down:

Query type	Acceptable	Investigate	Critical
Primary key lookup	< 1 ms	> 5 ms	> 20 ms
Indexed read	< 5 ms	> 20 ms	> 100 ms
Aggregation (small)	< 50 ms	> 200 ms	> 1 s
Reporting query	< 5 s	> 30 s	move to OLAP

If a PK lookup takes 50 ms, suspect connection latency, network, or cold cache, not the query itself.

Q3. What are the most common reasons a query is slow?

Answer:

Most slow queries fall into a small set of patterns:

Missing or stale index
N+1 queries from the application
SELECT * on wide tables
Implicit type cast preventing index use
LIKE '%term%' (leading wildcard)
Function applied to a column without an expression index
Large OFFSET for pagination
Stale planner statistics
Lock contention
Buffer pool eviction (cold data)

Example of a hidden killer:

-- "id" is INTEGER but bound as string -> implicit cast -> seq scan
SELECT * FROM users WHERE id = '42';

Q4. What is the difference between a logical query plan and a physical plan?

Answer:

The logical plan describes what the query computes (joins, projections, filters). The physical plan describes how the engine will execute it: index scan vs sequential scan, hash join vs nested loop, parallel workers, sort operations.

The optimizer picks the physical plan based on table statistics. After bulk loads, statistics get stale and plans become bad.

-- Postgres: refresh stats
ANALYZE orders;

-- MySQL
ANALYZE TABLE orders;

Q5. Why does `SELECT *` hurt performance?

Answer:

SELECT * reads every column whether you need it or not. That means more bytes over the wire, more memory in result buffers, and you cannot use a covering index (the planner is forced to visit the heap).

-- BAD: pulls all columns
SELECT * FROM orders WHERE user_id = 42;

-- GOOD: only what the API actually returns
SELECT id, total_amount, status, created_at
FROM orders
WHERE user_id = 42;

For wide tables with TEXT or JSONB columns, the difference can be 10x in I/O and bandwidth.

Indexes

Q6. What does a B-tree index actually do?

Answer:

A B-tree is a balanced tree of sorted keys with pointers to rows. Lookups, range scans, and ordered scans are O(log n). The leaves are linked, so range scans walk leaves directly.

CREATE INDEX idx_users_email ON users(email);

A 100 million row table typically has a tree depth of 4 to 5 levels. With a warm cache, those reads are in memory, so a lookup costs microseconds.

Q7. When is an index NOT used by the planner?

Answer:

Common cases where the index gets bypassed:

-- Function on column - index unusable
SELECT * FROM users WHERE LOWER(email) = 'me@x.com';

-- Fix with expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Leading wildcard - cannot use B-tree
SELECT * FROM products WHERE name LIKE '%phone';

-- Type mismatch
SELECT * FROM users WHERE id = '42';   -- if id is INTEGER

-- Negation
SELECT * FROM orders WHERE status != 'paid';

-- OR across columns from different indexes
SELECT * FROM users WHERE email = ? OR phone = ?;

Q8. What is a covering index?

Answer:

A covering index includes every column the query needs, so the engine never visits the heap. Postgres calls this an "index-only scan."

-- Postgres: include extra columns without making them part of key
CREATE INDEX idx_orders_user_status ON orders (user_id, status)
INCLUDE (total_amount);

-- MySQL: just put them in the index
CREATE INDEX idx_orders_user_status ON orders (user_id, status, total_amount);

Query that benefits:

SELECT total_amount FROM orders
WHERE user_id = 42 AND status = 'shipped';

The downside: bigger index, slower writes.

Q9. What is a partial index?

Answer:

A partial index covers only a subset of rows that match a WHERE clause. Useful when most rows are inactive but queries always filter to the small active set.

-- Only index active orders (much smaller)
CREATE INDEX idx_active_orders ON orders (created_at)
WHERE status = 'active';

-- Soft delete pattern
CREATE INDEX idx_users_email ON users (email)
WHERE deleted_at IS NULL;

If only 5 percent of rows match, the partial index is 20x smaller, fits in cache, and scans faster.

Q10. How do you choose composite index column order?

Answer:

Rule: equality columns first, range columns last, sort columns aligned with the order direction.

For this query:

SELECT * FROM events
WHERE tenant_id = 7
  AND type = 'click'
  AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;

The right index is:

CREATE INDEX idx_events_tenant_type_created
ON events (tenant_id, type, created_at DESC);

The planner seeks to (7, 'click'), then walks leaves backwards by created_at. No extra sort step.

Q11. What is index bloat and how do you fix it?

Answer:

In Postgres, dead tuples in indexes do not auto-shrink. After heavy update or delete cycles the index becomes 3 to 5 times larger than necessary, causing slower scans and memory pressure.

Detect:

SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Fix without downtime:

REINDEX INDEX CONCURRENTLY idx_users_email;
-- Or use pg_repack from the command line

In MySQL InnoDB, OPTIMIZE TABLE users; rebuilds and reclaims space.

Q12. Hash index vs B-tree?

Answer:

Aspect	B-tree	Hash
Equality	Yes	Yes (slightly faster)
Range queries	Yes	No
Sorting (ORDER BY)	Yes	No
Real-world use	99% of cases	Rare

Default to B-tree. Hash indexes only help when the workload is purely equality on a high-cardinality column with no ordering needs.

Q13. GIN vs GiST in Postgres?

Answer:

GIN (Generalized Inverted Index) is best for full-text search, JSONB key lookups, and array contains. Slower writes, fast reads.

GiST (Generalized Search Tree) is best for geospatial, range types, and fuzzy matching. Faster writes, slower reads than GIN.

-- JSONB key lookup
CREATE INDEX idx_payload_gin ON events USING GIN (payload);

-- Full-text
CREATE INDEX idx_doc_fts ON documents
USING GIN (to_tsvector('english', body));

-- Geospatial
CREATE INDEX idx_location ON places USING GIST (location);

Q14. Why is having too many indexes bad?

Answer:

Every index must be updated on every insert, update (to indexed columns), and delete. Eight indexes on a hot write table can make inserts 3 to 5 times slower. They also consume disk and buffer pool memory.

Audit unused indexes:

SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

If an index has zero scans for a month under representative load, drop it.

EXPLAIN and Query Plans

Q15. How do you read a Postgres `EXPLAIN ANALYZE` output?

Answer:

Each line shows a plan node with its estimated cost, estimated rows, actual time, and actual rows. The interesting numbers are actual time, actual rows, and Buffers.

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

Sample output:

Index Scan using idx_orders_user on orders
  (cost=0.42..8.44 rows=10 width=64)
  (actual time=0.012..0.145 rows=12 loops=1)
  Buffers: shared hit=4
Planning Time: 0.123 ms
Execution Time: 0.180 ms

Watch for:

A big gap between estimated and actual rows means stale stats. Run ANALYZE.
Seq Scan on a large table means a missing index.
Sort with Disk: means work_mem is too low.

Q16. What are the common Postgres plan node types?

Answer:

Node	When you see it
Seq Scan	Reads whole table — good for small tables, bad for large
Index Scan	Uses index, then reads heap
Index Only Scan	Best — needs covering index + visibility map
Bitmap Index Scan	Combines multiple indexes
Nested Loop	For each outer row, scan inner — good with small outer
Hash Join	Builds hash on one side — good for unsorted equi-join
Merge Join	Both sides sorted
Sort	Explicit sort — try to align index with `ORDER BY`

Q17. How do you read MySQL EXPLAIN output?

Answer:

Focus on the type column. From best to worst: const < eq_ref < ref < range < index < ALL. ALL means a full table scan.

EXPLAIN SELECT * FROM orders WHERE user_id = 42;

id	type	key	rows	Extra
1	ref	idx_user_id	12	Using where

Goals:

Avoid type: ALL
Avoid Extra: Using filesort on large data
Avoid Extra: Using temporary
Prefer Extra: Using index (covering index)

For more detail, use EXPLAIN FORMAT=JSON.

N+1 and JOIN Strategies

Q18. What is the N+1 problem and how do you detect it?

Answer:

For each parent row, the ORM fires a separate child query. One parent query plus N child queries equals N+1 round trips.

Bad pattern:

$users = User::all();              // 1 query
foreach ($users as $user) {
    echo $user->posts->count();    // N queries
}

Fix with eager loading:

$users = User::withCount('posts')->get();  // 1 query

Detection:

Laravel: enable Model::preventLazyLoading() in development
Rails: use the bullet gem
APM tools group similar repeated queries

Q19. When should you use Nested Loop vs Hash Join vs Merge Join?

Answer:

Strategy	Best when	Cost
Nested Loop	Outer is small, inner has index	O(N × log M)
Hash Join	Both sides unsorted, equi-join	O(N + M), uses memory
Merge Join	Both sides pre-sorted	O(N + M)

The planner picks based on stats. Force a different plan in MySQL with STRAIGHT_JOIN, in Postgres rewrite the query or use pg_hint_plan.

Q20. When should you denormalize a join?

Answer:

Denormalize when:

The join is on a hot read path with a strict latency budget (under 50 ms)
Reference data changes rarely, like a copy of user_name on orders
The join would otherwise span shards

Example: instead of joining orders and users on every read, copy user_name to orders at insert time.

-- Schema with denormalized name
CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT,
  user_name VARCHAR(100),  -- copied from users
  total NUMERIC(10,2),
  created_at TIMESTAMPTZ
);

The trade-off: writes must keep both copies in sync. Use a background reconciliation job to detect drift.

Q21. EXISTS vs IN vs JOIN — when to use which?

Answer:

-- "Does the user have any orders?" -> EXISTS short-circuits
SELECT u.id FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- "How many orders does each user have?" -> JOIN
SELECT u.id, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;

-- "Which users have ordered specific items?" -> IN or JOIN
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE product_id = 5);

Postgres rewrites most IN (subquery) into semi-joins, so performance is similar. MySQL historically prefers EXISTS.

Q22. How do you optimize an OR query?

Answer:

OR across different indexed columns often forces a full scan because the planner cannot combine indexes well.

-- Slow
SELECT * FROM users WHERE email = ? OR phone = ?;

-- Fast: rewrite as UNION ALL
SELECT * FROM users WHERE email = ?
UNION ALL
SELECT * FROM users WHERE phone = ? AND email <> ?;

In Postgres the planner often handles this via Bitmap Or, so check the plan first before rewriting.

Schema Optimization

Q23. How do data types affect performance?

Answer:

Smaller types mean more rows fit per page, which means better cache hit rates.

Type	Size	Notes
INT	4 bytes	Use unless you exceed 2 billion
BIGINT	8 bytes	For massive ID ranges
UUID (binary)	16 bytes	Better than 36-char string
VARCHAR(N)	length	In MySQL, can move off-page if very long

Switching a 1-billion-row table from BIGINT to INT saves 4 GB per indexed column.

-- BAD: storing UUIDs as text in MySQL
user_id CHAR(36)

-- GOOD
user_id BINARY(16)

Q24. What is the difference between wide rows and narrow rows?

Answer:

A row that exceeds the page size (8 KB Postgres, 16 KB InnoDB) gets split off-page (TOAST in Postgres). The main row stores a pointer, and reading the large field requires extra I/O.

Strategies:

-- BAD: large blobs inline with hot data
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  price NUMERIC(10,2),
  description TEXT,        -- can be 100 KB
  full_html TEXT           -- can be 1 MB
);

-- GOOD: split heavy fields
CREATE TABLE products (id, name, price);
CREATE TABLE product_content (
  product_id BIGINT PRIMARY KEY REFERENCES products(id),
  description TEXT,
  full_html TEXT
);

Q25. UUID vs auto-increment IDs?

Answer:

Aspect	INT/BIGINT auto	UUID v4	UUID v7
Insert speed	Fast (sequential)	Slow (random page splits)	Fast (sequential)
Index size	Small	Larger	Larger
Distributed-safe	No	Yes	Yes
Reveals counts	Yes	No	No

For new systems, UUID v7 (time-ordered) gives you distributed IDs without the random-insertion penalty.

-- Postgres extension for UUID v7 (Postgres 18+ has built-in)
CREATE EXTENSION pg_uuidv7;

CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
  ...
);

Q26. Composite primary key vs surrogate key?

Answer:

Use a composite PK when the natural key is short, immutable, and queries always use it. The classic example is a join table:

CREATE TABLE user_role (
  user_id BIGINT,
  role_id BIGINT,
  PRIMARY KEY (user_id, role_id)
);

Use a surrogate (auto-increment or UUID) when the natural key can change, when foreign keys would otherwise be wide, or when you need a single column for references.

Write Optimization

Q27. Why is one-row-at-a-time INSERT slow at scale?

Answer:

Each statement is a network round trip plus a separate transaction commit (with autocommit on). 10,000 inserts at 2 ms each is 20 seconds of pure latency.

Bad:

foreach ($events as $event) {
    DB::table('events')->insert($event);   // 10,000 round trips
}

Good - batched insert:

DB::table('events')->insert($events);   // 1 round trip

Even better, single transaction:

BEGIN;
INSERT INTO events (a, b) VALUES (1,2),(3,4),(5,6),...;
COMMIT;

Best for huge loads, bypass the SQL parser:

-- Postgres: 10-100x faster than INSERT
COPY events FROM '/tmp/events.csv' WITH (FORMAT CSV);

-- MySQL
LOAD DATA INFILE '/tmp/events.csv' INTO TABLE events;

Q28. How do you bulk delete from a huge table safely?

Answer:

A single DELETE FROM events WHERE created_at < ... on 50 million rows holds long locks, bloats WAL, and may block replication.

Chunk the delete:

DO $$
DECLARE deleted INT;
BEGIN
  LOOP
    DELETE FROM events
    WHERE id IN (
      SELECT id FROM events
      WHERE created_at < NOW() - INTERVAL '1 year'
      LIMIT 5000
    );
    GET DIAGNOSTICS deleted = ROW_COUNT;
    EXIT WHEN deleted = 0;
    PERFORM pg_sleep(0.1);   -- breathing room
  END LOOP;
END $$;

For partitioned tables, dropping a partition is instant:

DROP TABLE events_2023_01;   -- removes 50M rows in milliseconds

Q29. How does UPSERT perform and what are the pitfalls?

Answer:

-- Postgres
INSERT INTO counters (id, count) VALUES (1, 1)
ON CONFLICT (id) DO UPDATE
SET count = counters.count + 1;

-- MySQL
INSERT INTO counters (id, count) VALUES (1, 1)
ON DUPLICATE KEY UPDATE count = count + 1;

Pitfalls:

Postgres creates a new row version on every UPDATE, which causes bloat. Vacuum frequently or use fillfactor=80.
A hot key (e.g., a single counter incremented thousands of times per second) becomes a serialization bottleneck. Use sharded counters.

-- Sharded counter pattern
CREATE TABLE counters (
  bucket SMALLINT,
  count BIGINT,
  PRIMARY KEY (bucket)
);
-- 16 buckets; each writer picks a random bucket

Read Optimization

Q30. When and how do you use read replicas?

Answer:

Use a read replica when read traffic exceeds the primary's capacity, or when you want analytics off the OLTP path.

Application pattern:

// Laravel
'mysql' => [
    'read' => ['host' => ['replica1', 'replica2']],
    'write' => ['host' => ['primary']],
    ...
]

Pitfalls:

Replication lag means replicas can be seconds behind. After a write, a read may show stale data.
Stick the user's session to the primary for a few seconds after their write to avoid stale reads.
Long-running queries on the replica can pause replication.

Q31. Where can queries be cached?

Answer:

Layer	Hit speed	Invalidation
Postgres `shared_buffers`	< 1 ms	Automatic LRU
InnoDB buffer pool	< 1 ms	Automatic LRU
App in-process LRU	< 0.1 ms	Manual / TTL
Redis / Memcached	1-2 ms	Manual / TTL
CDN	edge RTT	Headers / purge

The DB caches hot data automatically. The question is whether you can skip the DB entirely with a separate cache layer.

Caching Strategies

Q32. What is the cache-aside pattern?

Answer:

The most common cache pattern. The application reads from cache; on miss, reads from DB and populates the cache.

def get_user(user_id):
    val = redis.get(f"user:{user_id}")
    if val:
        return json.loads(val)
    user = db.query("SELECT * FROM users WHERE id=%s", user_id)
    redis.setex(f"user:{user_id}", 300, json.dumps(user))
    return user

def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id=%s", user_id)
    redis.delete(f"user:{user_id}")  # invalidate

Simple, but writes can race: between the DB update and the cache delete, another reader may repopulate stale data.

Q33. What is the difference between read-through, write-through, and write-back?

Answer:

Pattern	Read path	Write path	Consistency
Cache-aside	App handles miss	App invalidates	Strong (with care)
Read-through	Cache fetches from DB	App writes to DB	Strong
Write-through	Cache fetches from DB	App writes both	Strong
Write-back	Cache fetches	Cache writes async to DB	Eventual

Write-back is fastest but dangerous: a cache crash loses unflushed writes. Use only for non-critical data with replication.

Q34. How do you invalidate the cache?

Answer:

There are several strategies, in increasing complexity:

// 1. TTL only — eventual consistency
Cache::put("user:$id", $user, 300);

// 2. Explicit delete on write
DB::transaction(function() use ($user) {
    $user->save();
    Cache::forget("user:{$user->id}");
});

// 3. Versioned keys — bump version on write
Cache::put("user:$id:v" . $user->version, $user);

// 4. Tag-based purge
Cache::tags(["users", "user:$id"])->put($key, $val);
Cache::tags(["user:$id"])->flush();

Common bug: invalidating before the DB commit, so a reader caches stale data. Always invalidate after commit.

Q35. How do you prevent a cache stampede?

Answer:

When a popular key expires, thousands of concurrent requests all hit the DB at once. Solutions:

# Single-flight with a lock
def get_with_lock(key, ttl=300):
    val = redis.get(key)
    if val:
        return val
    if redis.set(f"lock:{key}", "1", nx=True, ex=10):
        try:
            val = db.query(...)
            redis.setex(key, ttl, val)
            return val
        finally:
            redis.delete(f"lock:{key}")
    else:
        time.sleep(0.1)
        return redis.get(key) or db.query(...)

Other patterns:

Probabilistic early refresh: refresh slightly before expiry with rising probability
Two-tier TTL: soft expiry triggers async refresh, hard expiry forces sync
Stale-while-revalidate: serve stale, refresh in background

Q36. How do you use a CDN as a query cache?

Answer:

For low-cardinality public read endpoints, set the HTTP cache headers and let the CDN serve almost all traffic:

return response()->json($products)
    ->header('Cache-Control', 'public, max-age=60, stale-while-revalidate=300')
    ->header('Surrogate-Key', 'products');

The CDN serves 90 percent or more of requests with no DB hit. Use the Surrogate-Key header to purge by tag when products change. Do not cache personalized content at the CDN unless you partition by user.

Connection Pooling

Q37. Why is connection pooling critical?

Answer:

Each Postgres backend uses about 10 MB of RAM. 1000 raw connections is 10 GB just for connection overhead, plus context switching kills performance.

Pool sizing rule of thumb:

pool_size = (core_count * 2) + effective_spindle_count

For an 8-core SSD server, around 20 connections per app instance. Counterintuitively, more connections often means worse throughput because of contention.

# pgbouncer.ini
[databases]
mydb = host=db port=5432

[pgbouncer]
pool_mode = transaction
default_pool_size = 20
max_client_conn = 1000

Q38. PgBouncer pool modes — what are the trade-offs?

Answer:

Mode	Description	Limits
Session	Backend held for client's entire session	Same overhead as direct connections
Transaction	Backend held per transaction (default)	No session-level features (prepared statements pre-PG14, advisory locks across transactions)
Statement	Backend per statement	No multi-statement transactions

Production default: transaction mode. Pool size matched to actual concurrent transactions, not concurrent users.

Q39. What are the symptoms of pool exhaustion?

Answer:

Latency spikes correlated with traffic spikes
App logs show "timeout waiting for connection"
DB CPU is low but the app is queueing
A long-running transaction from a background job starves everyone

Detect:

-- Postgres: connections by state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- Long-running queries
SELECT pid, now() - query_start AS dur, state, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY dur DESC;

Kill the rogue:

SELECT pg_terminate_backend(<pid>);

Transactions and Concurrency

Q40. How do transactions affect performance?

Answer:

Long transactions hold row and page locks, blocking other writers. In Postgres, they prevent vacuum from cleaning up dead tuples. In MySQL InnoDB they keep undo log entries around.

Bad pattern - HTTP call inside a transaction:

DB::transaction(function () use ($order) {
    $order->save();
    Http::post('https://payment.example.com/charge', $payload);  // 2s
    $order->update(['status' => 'paid']);
});  // transaction held for 2+ seconds

Good - move I/O outside:

DB::transaction(fn() => $order->save());
$response = Http::post('https://payment.example.com/charge', $payload);
DB::transaction(fn() => $order->update(['status' => 'paid']));

Q41. What are the isolation levels and how do they affect performance?

Answer:

Level	Read consistency	Write conflicts	Performance
Read Uncommitted	Sees uncommitted writes	Lots	Fastest, dirty reads
Read Committed	Per-statement snapshot	Some	Default Postgres
Repeatable Read	Per-transaction snapshot	Some	Default MySQL
Serializable	Full serial order	Highest abort rate	Slowest

In Postgres, Serializable uses SSI and aborts conflicting transactions. In MySQL, Serializable adds gap locks, hurting concurrency further.

Q42. When do you use SELECT FOR UPDATE?

Answer:

For pessimistic locking on mutually exclusive operations (transfers, inventory decrements):

BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- locks row
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
COMMIT;

For high-contention scenarios, prefer optimistic locking:

UPDATE accounts
SET balance = balance - 50, version = version + 1
WHERE id = 1 AND version = 7;
-- If 0 rows updated, retry

Partitioning and Sharding

Q43. What is the difference between partitioning and sharding?

Answer:

Aspect	Partitioning	Sharding
Where	One DB instance	Multiple DB instances
Transparency	DB sees one logical table	App or proxy routes
Use case	Large tables, hot/cold data	Beyond single-server capacity

Partitioning helps a single server. Sharding helps when one server is not enough.

Q44. How do you partition tables in Postgres?

Answer:

Three strategies: range, list, and hash.

-- Range partitioning by time (most common)
CREATE TABLE events (
  id BIGSERIAL,
  created_at TIMESTAMPTZ NOT NULL,
  payload JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
  FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Hash partitioning
CREATE TABLE users (...) PARTITION BY HASH (user_id);
CREATE TABLE users_p0 PARTITION OF users
  FOR VALUES WITH (modulus 16, remainder 0);

Benefits:

Per-partition indexes (smaller, faster)
Drop old partitions instantly instead of deleting 100M rows
Partition pruning skips irrelevant partitions

Use pg_partman to auto-create future partitions.

Q45. How do you choose a shard key?

Answer:

A bad shard key destroys you. Properties of a good shard key:

High cardinality
Even distribution (no hot shards)
Aligned with most queries (so single-shard reads are common)
Stable (does not change for a row)

App type	Good shard key	Why
User-data app	`user_id`	Most queries are per-user
Multi-tenant SaaS	`tenant_id`	Watch for one giant tenant
Time-series	composite (entity, time)	Pure time creates a hot shard

Anti-patterns: sequential ID (last shard is always hot), country code (US dominates), status (only a few values).

Q46. What is consistent hashing and why does it matter?

Answer:

With naive hash(key) % N sharding, adding a shard remaps almost every key. With consistent hashing, only 1/N keys move.

Ring (0..2^32):
  Shard A: positions 100, 5000, 8000, ...
  Shard B: positions 800, 3000, 9000, ...
  Key K   hashes to 4500 -> belongs to Shard A's 5000 position

Used by: Redis Cluster, Cassandra, DynamoDB, Vitess. Pair with virtual nodes (each physical shard owns many ring positions) for better balance.

Q47. How do you handle cross-shard queries?

Answer:

Options in order of preference:

Avoid them: design schema so 99 percent of queries hit one shard.
Scatter-gather: query all shards and merge results. Latency equals the slowest shard.
Materialized index: denormalize into a separate "global" lookup table.
Search engine alongside: push searchable data to Elasticsearch.
Analytics pipeline: ETL all shards into a warehouse like Snowflake or ClickHouse.

# Scatter-gather example
async def get_orders_for_user(user_id):
    shards = [shard1, shard2, shard3, shard4]
    results = await asyncio.gather(*[
        s.query("SELECT * FROM orders WHERE user_id = %s", user_id)
        for s in shards
    ])
    return [row for shard in results for row in shard]

100M+ Row Tables

Q48. What changes when a table crosses 50 million rows?

Answer:

Full scans become unacceptable (minutes to hours)
Index size starts pressuring buffer pool
ALTER TABLE rewrites become a major operation (hours)
Backup time grows linearly
Single-row latency stays fast IF indexes are right

Operational changes:

Use pg_repack or pt-online-schema-change for online migrations
Add columns with default only on engines that support instant default (Postgres 11+, MySQL 8 InnoDB)
Backfill in chunks
Plan partitioning before crossing 100M

Q49. Soft delete vs hard delete at scale.

Answer:

Soft delete (deleted_at column) is easy to recover from but every query must filter, and the index carries tombstones.

-- Bad: every query filters, full index includes deleted rows
SELECT * FROM users WHERE deleted_at IS NULL AND email = ?;
CREATE INDEX idx_users_email ON users(email);

-- Good: partial index excludes deleted rows
CREATE INDEX idx_users_email ON users(email)
WHERE deleted_at IS NULL;

Production hybrid:

Soft delete with a 30-day retention
Background job hard-deletes after retention
Move expired data to an archive table or cold storage

Q50. What is a good archiving strategy for old data?

Answer:

hot table (last 90d) -> warm table -> cold archive (S3) -> drop partition

Time-based partitioning makes archiving trivial:

-- Export a partition, then drop it
COPY events_2023_01 TO '/archive/events_2023_01.csv';
DROP TABLE events_2023_01;

For analytics, replicate to a columnar warehouse like ClickHouse, Redshift, or BigQuery.

Q51. How do you do schema migrations on huge tables?

Answer:

Don't:

-- Full rewrite, hours of locks on large tables
ALTER TABLE huge ADD COLUMN x INT NOT NULL DEFAULT 0;

Do (Postgres 11+):

-- 1. Instant: add nullable column
ALTER TABLE huge ADD COLUMN x INT;

-- 2. Backfill in chunks
DO $$
DECLARE last_id BIGINT := 0;
BEGIN
  LOOP
    UPDATE huge SET x = 0
    WHERE id > last_id AND id <= last_id + 10000;
    EXIT WHEN NOT FOUND;
    last_id := last_id + 10000;
    PERFORM pg_sleep(0.05);
  END LOOP;
END $$;

-- 3. Add NOT NULL (instant in PG12+)
ALTER TABLE huge ALTER COLUMN x SET NOT NULL;

For complex changes, use pg-osc, gh-ost (MySQL), or pt-online-schema-change.

Replication

Q52. What are the Postgres replication modes?

Answer:

Mode	Latency	Durability	Use
Async streaming	Low	Loss possible on primary failure	Default
Synchronous	Higher	Two-node durability	Financial / critical
Logical	Per-table	Selective	Heterogeneous, zero-downtime upgrades

Synchronous replication waits for at least one replica to acknowledge before commit returns:

# postgresql.conf on primary
synchronous_commit = on
synchronous_standby_names = 'replica1,replica2'

This adds 1-2 ms per commit but guarantees no data loss on primary failure.

Q53. What are the MySQL replication modes?

Answer:

Statement-based: small log, but non-deterministic statements like NOW() or UUID() misbehave.
Row-based: larger log, exact, default since 5.7.
Mixed: statement when safe, row otherwise.

For replicas falling behind, parallelize the apply:

# my.cnf on replica
slave_parallel_workers = 8
slave_parallel_type = LOGICAL_CLOCK

Q54. What causes replication lag and how do you monitor it?

Answer:

Causes:

Long transactions on primary (replica must replay each one)
Slow queries on replica blocking apply
Network bottleneck
Single-thread apply on older MySQL

Monitor:

-- Postgres on replica
SELECT now() - pg_last_xact_replay_timestamp() AS lag;

-- MySQL
SHOW SLAVE STATUS \G
-- Watch Seconds_Behind_Master

Alert if lag exceeds 5 seconds for OLTP traffic.

Server Tuning

Q55. What are the key Postgres tuning parameters?

Answer:

Parameter	Description	Typical
`shared_buffers`	Postgres internal cache	25% of RAM
`effective_cache_size`	Hint of total cache (OS+PG)	50-75% of RAM
`work_mem`	Per-operation sort/hash memory	4-64 MB
`maintenance_work_mem`	For VACUUM, CREATE INDEX	256 MB - 2 GB
`wal_buffers`	WAL buffer	16 MB
`random_page_cost`	SSD = 1.1, spinning = 4	1.1 (SSD)
`effective_io_concurrency`	Async I/O depth	200 (SSD)

Example for a 32 GB server:

shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 32MB
maintenance_work_mem = 2GB
random_page_cost = 1.1
effective_io_concurrency = 200

Q56. How do you tune autovacuum on high-write tables?

Answer:

Default autovacuum is too lazy for big, high-write tables. Bloat grows.

Global tuning:

autovacuum_vacuum_scale_factor = 0.05    # default 0.2
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_limit = 2000      # default 200
autovacuum_naptime = 30s

Per-table:

ALTER TABLE big_table SET (
  autovacuum_vacuum_scale_factor = 0.01,
  fillfactor = 90
);

Symptoms of bad autovacuum:

Bloat growing over time
Wraparound warnings (age(datfrozenxid) near 200M)
Query plans degrading

Q57. What are the key MySQL InnoDB tuning parameters?

Answer:

Parameter	Description	Typical
`innodb_buffer_pool_size`	Cache for data and index	50-75% of RAM
`innodb_log_file_size`	Redo log size	1-4 GB
`innodb_flush_log_at_trx_commit`	1 = full ACID; 2 = slight risk, faster	1
`innodb_flush_method`	I/O method	O_DIRECT
`innodb_io_capacity`	Disk IOPS hint	2000+ for SSD
`innodb_buffer_pool_instances`	For very large pools	8

Example for a 32 GB server:

innodb_buffer_pool_size = 24G
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_io_capacity = 4000

Q58. How do you size `work_mem` in Postgres?

Answer:

work_mem is per-sort-or-hash, per-backend. A query with three sorts uses three times work_mem. Total possible memory is work_mem × max_connections × ops_per_query.

For 1 GB headroom, 100 connections, 3 ops per query: work_mem = 3 MB.

Better: keep work_mem modest (8-16 MB) globally and override for analytical sessions:

-- Per-session boost for a heavy report
SET work_mem = '512MB';
SELECT ... GROUP BY ... ORDER BY ...;

Q59. How does parallelism work in Postgres?

Answer:

Postgres supports parallel sequential scan, parallel hash join, and parallel aggregate.

max_worker_processes = 16
max_parallel_workers = 8
max_parallel_workers_per_gather = 4
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1
min_parallel_table_scan_size = 8MB

Parallel only triggers above the table size threshold. Small tables stay serial. Inspect with EXPLAIN:

Gather  (cost=... rows=...)
  Workers Planned: 4
  ->  Parallel Seq Scan on big_table

Monitoring and Slow Queries

Q60. What should the slow query log capture?

Answer:

Postgres:

log_min_duration_statement = 200    # log queries > 200 ms
log_temp_files = 0                  # any spill to disk
log_lock_waits = on
log_checkpoints = on

# Auto-explain logs the plan automatically
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 500
auto_explain.log_analyze = on

MySQL:

slow_query_log = ON
long_query_time = 0.2
log_queries_not_using_indexes = ON

Aggregate with pg_stat_statements (Postgres) or performance_schema (MySQL).

Q61. How do you find your top slow queries with `pg_stat_statements`?

Answer:

SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  rows,
  100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_pct
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Focus on total_exec_time. A 1 ms query called 10 million times causes more pain than a 1-second query called 100 times.

Q62. How do you find lock contention?

Answer:

-- Postgres: who is blocking whom
SELECT
  blocked.pid AS blocked_pid,
  blocked.usename AS blocked_user,
  blocking.pid AS blocking_pid,
  blocking.usename AS blocking_user,
  blocked.query AS blocked_query,
  blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));

Red flags:

Long idle in transaction sessions — investigate the app code
AccessExclusiveLock on a popular table — DDL during traffic

OLTP vs OLAP

Q63. Why does mixing OLTP and OLAP on the same DB hurt?

Answer:

OLTP is many small, fast transactions. OLAP is few huge analytical queries. If they share a buffer pool:

OLAP loads 100 GB into the buffer, evicting OLTP hot data, causing user-facing latency spikes
OLAP holds long transactions, preventing vacuum, causing bloat
OLAP locks block OLTP

Solutions:

Dedicated read replica for analytics
ETL to a warehouse (Snowflake, BigQuery, ClickHouse, DuckDB)
Materialized views refreshed periodically

Q64. When and how do you use materialized views?

Answer:

Use for expensive aggregations queried many times.

CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day,
       SUM(total) AS revenue
FROM orders
GROUP BY 1;

CREATE UNIQUE INDEX ON daily_revenue (day);

-- Refresh without locking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;

For continuous freshness, use incremental refresh tools like pg_ivm, or maintain a rollup table via triggers.

Maintenance

Q65. How does backup affect a live primary?

Answer:

pg_dump runs in a long transaction, blocking vacuum and causing bloat. For large DBs:

Run pg_basebackup from a replica, not the primary
Or pg_dump --jobs=N --format=directory for parallelism
Or filesystem-level snapshots (ZFS, EBS) — fastest, but need pg_start_backup/pg_stop_backup

For MySQL, use xtrabackup for online physical backups without locks.

Q66. How do you rebuild an index without downtime?

Answer:

-- Postgres 12+
REINDEX INDEX CONCURRENTLY idx_orders_user;

-- Or pg_repack from the command line
pg_repack -t orders -j 4 mydb

Anti-pattern: dropping and recreating an index during traffic. Queries that used the old index will degrade until the rebuild finishes.

Production Issues

Q67. "Suddenly all queries are slow" — what do you check?

Answer:

Checklist in order:

Recent deploy? Roll back or check new code paths
CPU or I/O saturation? Check iostat, top
Connection pool exhausted? Check pg_stat_activity
Long-running query holding locks? Kill it
Vacuum or autovacuum running? Wait or raise resources
Disk full? Check df -h
Replica promoted unexpectedly? Check current primary

-- Quick health check
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
SELECT pg_size_pretty(pg_database_size(current_database()));

Q68. "Some queries are fast, then suddenly slow" — what's happening?

Answer:

This is usually a plan flip. The planner picked a different plan because:

Statistics changed (auto-analyze ran after a bulk write)
Prepared statement parameter sniffing (Postgres uses generic plan after 5 executions)
Different data shape — querying a user with 1M rows uses a different plan than one with 10

Fix:

-- Force custom plans every time
SET plan_cache_mode = 'force_custom_plan';

-- Refresh stats
ANALYZE big_table;

-- Add an index that's optimal for both shapes
CREATE INDEX idx_x_y ON t (x, y);

Q69. "DB looks healthy but the app is slow" — where do you look?

Answer:

The DB is fast but the app sees latency:

App-side connection pool too small — request queueing in the app
Network latency between app and DB (cross-AZ, VPN)
TLS handshake on every connection — use connection pooling and persistent connections
DNS lookup on every connection — cache or use IP
ORM is doing N+1 even though each query is fast
Slow JSON serialization on the app side

// Bad: opens a new connection per request
foreach ($items as $item) {
    DB::connection('reports')->select(...);  // reconnects often
}

Q70. How do you handle disk space exhaustion in Postgres?

Answer:

Symptoms: WAL fills disk, replication breaks, primary refuses writes.

Common causes:

WAL retention too high or replication slot held by a dead replica
Disabled autovacuum so bloat grows forever
Long-running transaction prevents WAL recycle
Logical replication slot inactive, accumulating WAL

Recovery:

-- Find inactive slots
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots WHERE NOT active;

-- Drop a dead slot (be careful)
SELECT pg_drop_replication_slot('dead_replica');

-- Find long transactions
SELECT pid, now() - xact_start AS dur, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY dur DESC;

Q71. How do you handle hot row contention?

Answer:

Counter-style updates on one row create a serialization point.

Bad:

-- 10k req/s all hitting the same row -> lock queue
UPDATE counters SET count = count + 1 WHERE id = 1;

Sharded counter:

CREATE TABLE counters (
  bucket SMALLINT,
  count BIGINT,
  PRIMARY KEY (bucket)
);

-- Writer picks a random bucket
UPDATE counters SET count = count + 1
WHERE bucket = floor(random() * 16);

-- Reader sums all buckets
SELECT SUM(count) FROM counters;

Or use Redis for the increment and persist asynchronously:

$total = Redis::incr("post:$id:views");
if ($total % 100 === 0) {
    Posts::where('id', $id)->update(['views' => $total]);
}

Q72. When do you use JSONB and what are the pitfalls?

Answer:

Use JSONB when the schema is genuinely flexible (custom user fields, extensible event payload), or when some query reads inside the document.

CREATE TABLE events (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL,
  event_type TEXT NOT NULL,
  payload JSONB
);

-- GIN index for key existence and contains queries
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- Query
SELECT * FROM events WHERE payload @> '{"button": "checkout"}';

Pitfalls:

JSONB key extraction is slower than dedicated columns
GIN indexes are heavier than B-tree
TOAST'd JSONB causes extra reads
ORM eager-loading JSON fields can blow up bandwidth

Best practice: extract frequently queried keys to dedicated columns, keep tail data in JSONB.

Q73. Does NULL hurt performance?

Answer:

NULL itself is cheap to store. But:

WHERE col IS NULL may not use an index unless it's a partial index
NOT IN (subquery with NULL) returns no rows, a common bug
NULL semantics in joins surprise people

-- Common bug: returns no rows if any user_id is NULL
SELECT * FROM products
WHERE user_id NOT IN (SELECT user_id FROM blocked_users);

-- Correct
SELECT * FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM blocked_users b WHERE b.user_id = p.user_id
);

If a column is semantically never null, declare NOT NULL for better stats and plans.

Q74. What does this query do? (Common interview trap)

Answer:

SELECT u.id FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;

It returns users with no orders. The LEFT JOIN plus IS NULL is the classic anti-join idiom. Equivalent:

SELECT id FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

Postgres usually generates the same plan for both. MySQL historically prefers NOT EXISTS.

Q75. How would you scale a database to 10x traffic?

Answer:

Walk through the steps in order, from cheapest to most complex:

Measure current bottleneck (CPU, I/O, connections, latency by query)
Optimize the top 5 slow queries (often 80% of load)
Cache hot reads (Redis, application cache)
Add read replicas for read scaling
Add a connection pooler (PgBouncer, ProxySQL)
Vertical scale (bigger box)
Async non-critical writes (queues)
Partition huge tables
Shard only when one box cannot keep up
Move analytics off to a warehouse

Each step buys time before the next, in increasing operational complexity.

Appendix — Index Decision Cheat Sheet

Query pattern	Recommended index
`WHERE a = ?`	`(a)`
`WHERE a = ? AND b = ?`	`(a, b)`
`WHERE a = ? ORDER BY b`	`(a, b)`
`WHERE a = ? AND b > ? ORDER BY b`	`(a, b)`
`WHERE a IN (...)`	`(a)`
`LIKE 'abc%'`	B-tree `(col text_pattern_ops)` in PG
`LIKE '%abc%'`	trigram (`pg_trgm`) GIN
JSONB key	GIN on `(jsonb_col)`
Geospatial	GiST
Full-text	GIN on `tsvector`

Appendix — Latency Reference Numbers

Operation	Time
L1 cache	0.5 ns
L2 cache	7 ns
Main memory	100 ns
SSD random read	100 µs
Same-AZ network round-trip	0.5 ms
HDD random read	10 ms
Cross-region round-trip	100+ ms
HTTP API call	50-500 ms

If your DB is on SSD in the same AZ, even a "cold" indexed lookup is under 1 ms. If a query takes 100 ms, something is doing many disk reads, scanning large data, or contending for a lock.

FilesExpand file tree

database-optimization.md

Latest commit

History

database-optimization.md

File metadata and controls

Database Optimization — Interview Q&A Guide

Query Optimization Fundamentals

Q1. What is the first rule of database optimization?

Q2. What are typical latency budgets for a single query?

Q3. What are the most common reasons a query is slow?

Q4. What is the difference between a logical query plan and a physical plan?

Q5. Why does SELECT * hurt performance?

Indexes

Q6. What does a B-tree index actually do?

Q7. When is an index NOT used by the planner?

Q8. What is a covering index?

Q9. What is a partial index?

Q10. How do you choose composite index column order?

Q11. What is index bloat and how do you fix it?

Q12. Hash index vs B-tree?

Q13. GIN vs GiST in Postgres?

Q14. Why is having too many indexes bad?

EXPLAIN and Query Plans

Q15. How do you read a Postgres EXPLAIN ANALYZE output?

Q16. What are the common Postgres plan node types?

Q17. How do you read MySQL EXPLAIN output?

N+1 and JOIN Strategies

Q18. What is the N+1 problem and how do you detect it?

Q19. When should you use Nested Loop vs Hash Join vs Merge Join?

Q20. When should you denormalize a join?

Q21. EXISTS vs IN vs JOIN — when to use which?

Q22. How do you optimize an OR query?

Schema Optimization

Q23. How do data types affect performance?

Q24. What is the difference between wide rows and narrow rows?

Q25. UUID vs auto-increment IDs?

Q26. Composite primary key vs surrogate key?

Write Optimization

Q27. Why is one-row-at-a-time INSERT slow at scale?

Q28. How do you bulk delete from a huge table safely?

Q29. How does UPSERT perform and what are the pitfalls?

Read Optimization

Q30. When and how do you use read replicas?

Q31. Where can queries be cached?

Caching Strategies

Q32. What is the cache-aside pattern?

Q33. What is the difference between read-through, write-through, and write-back?

Q34. How do you invalidate the cache?

Q35. How do you prevent a cache stampede?

Q36. How do you use a CDN as a query cache?

Connection Pooling

Q37. Why is connection pooling critical?

Q38. PgBouncer pool modes — what are the trade-offs?

Q39. What are the symptoms of pool exhaustion?

Transactions and Concurrency

Q40. How do transactions affect performance?

Q41. What are the isolation levels and how do they affect performance?

Q42. When do you use SELECT FOR UPDATE?

Partitioning and Sharding

Q43. What is the difference between partitioning and sharding?

Q44. How do you partition tables in Postgres?

Q45. How do you choose a shard key?

Q46. What is consistent hashing and why does it matter?

Q47. How do you handle cross-shard queries?

100M+ Row Tables

Q48. What changes when a table crosses 50 million rows?

Q49. Soft delete vs hard delete at scale.

Q50. What is a good archiving strategy for old data?

Q51. How do you do schema migrations on huge tables?

Replication

Q52. What are the Postgres replication modes?

Q53. What are the MySQL replication modes?

Q54. What causes replication lag and how do you monitor it?

Server Tuning

Q55. What are the key Postgres tuning parameters?

Q56. How do you tune autovacuum on high-write tables?

Q57. What are the key MySQL InnoDB tuning parameters?

Q58. How do you size work_mem in Postgres?

Q59. How does parallelism work in Postgres?

Q5. Why does `SELECT *` hurt performance?

Q15. How do you read a Postgres `EXPLAIN ANALYZE` output?

Q58. How do you size `work_mem` in Postgres?

Q61. How do you find your top slow queries with `pg_stat_statements`?