This guide covers Node.js runtime fundamentals, the event loop, V8 optimization, memory and CPU profiling, HTTP server tuning, framework comparison, clustering and worker threads, database pooling, caching, async patterns, streams, logging, queues, and microservice latency.
Answer:
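Node.js is the V8 JavaScript engine plus libuv (event loop, async I/O, thread pool) plus a standard library of modules. It runs your JavaScript on a single thread. Network I/O uses non-blocking OS sockets, while file system work, dns.lookup, and some crypto and zlib calls are offloaded to the libuv thread pool.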
Typical request lifecycle:
- HTTP server receives bytes (TCP, libuv)
- Parser builds Request and Response objects
- Middleware chain executes JS
- Async I/O (DB, HTTP, file) yields back to libuv
- Response sent
The single thread runs all your JavaScript. Anything CPU-heavy on this thread blocks every concurrent request.
Answer:
| Endpoint type | p50 | p95 | p99 |
|---|---|---|---|
| Cached lookup | < 5 ms | < 20 ms | < 50 ms |
| Single DB query | < 20 ms | < 100 ms | < 250 ms |
| Multi-DB or external API | < 100 ms | < 500 ms | < 1 s |
| CPU-light compute | < 10 ms | < 50 ms | < 100 ms |
If p99 is wildly higher than p50, you have a long-tail problem (GC pauses, event loop blocking, slow third party).
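To make the long-tail check concrete, here is a minimal sketch that computes p50, p95, and p99 from raw latency samples using the nearest-rank method. The function names and the sample data are illustrative assumptions, not part of any library; in production you would normally read these values from your metrics system.

```javascript
// Nearest-rank percentile over an array of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer arithmetic first, to avoid floating-point surprises in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(samples) {
  const p50 = percentile(samples, 50);
  const p95 = percentile(samples, 95);
  const p99 = percentile(samples, 99);
  // A large p99/p50 ratio is the long-tail signal described above.
  return { p50, p95, p99, tailRatio: p99 / p50 };
}

// Hypothetical samples: 97 fast requests and 3 slow outliers.
const samples = [...Array(97).fill(10), 400, 450, 500];
console.log(summarize(samples));
```

With these made-up samples the median stays at 10 ms while p99 lands on one of the outliers, which is the pattern to look for.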
Answer:
- Synchronous code on the event loop (fs.readFileSync, large JSON.parse)
- Memory leaks from unbounded caches or listeners
- Missing await (unhandled promise rejection or fire-and-forget)
- Sequential await where parallel would work
- Connection pool too small or leaking
- Logging at info or debug in production with synchronous writes
- CPU-bound regex (ReDoS)
- Unbounded payload size
- No timeouts on outgoing HTTP
- console.log in hot paths
Answer:
timers <- setTimeout, setInterval callbacks
pending callbacks <- some I/O callbacks deferred
idle, prepare (internal)
poll <- incoming data
check <- setImmediate callbacks
close callbacks <- socket.on('close', ...)
The process.nextTick queue and then the microtask queue (Promises, queueMicrotask) are drained after each callback (since Node 11), not only between phases, so they always run before the loop moves on.
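The ordering can be observed with a small script. This is a sketch under one assumption: the callbacks are scheduled from inside an I/O callback (here an fs.stat call on the current directory), because that is the case where Node documents setImmediate as firing before a 0 ms timer. The function name observeOrder is made up for the example.

```javascript
const fs = require('node:fs');

// Schedules one callback of each kind from inside an I/O callback
// and records the order in which they actually run.
function observeOrder() {
  return new Promise((resolve) => {
    const order = [];
    fs.stat('.', () => {
      // The timer fires last here, so it also resolves the promise.
      setTimeout(() => { order.push('setTimeout'); resolve(order); }, 0);
      setImmediate(() => order.push('setImmediate'));
      Promise.resolve().then(() => order.push('promise microtask'));
      process.nextTick(() => order.push('nextTick'));
      order.push('sync');
    });
  });
}

const observed = observeOrder();
observed.then((order) => console.log(order.join(' -> ')));
```

Scheduled from the top level of a script instead, the relative order of setTimeout and setImmediate is not guaranteed.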
Answer:
A long-running synchronous task blocks every phase. If an endpoint runs a 200 ms loop, each call delays every other in-flight request by up to 200 ms, and the delays stack: with 100 concurrent calls, the last request in line can wait around 20 seconds.
Detect using built-in perf_hooks:
const { monitorEventLoopDelay } = require('node:perf_hooks');
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();
setInterval(() => {
console.log('p99 lag:', h.percentile(99) / 1e6, 'ms');
h.reset();
}, 5000);
A healthy loop has p99 lag below 50 ms.
Answer:
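- process.nextTick runs as soon as the current operation completes, before the loop continues to any other I/O. Recursive nextTick calls can starve the loop.
- queueMicrotask schedules on the standard microtask queue (the same one Promises use), which drains right after the nextTick queue.
- setImmediate runs in the check phase, after I/O callbacks.
- setTimeout(fn, 0) runs in the timers phase, with a minimum delay of 1 ms in practice.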
// Break up CPU work without starving I/O
function processChunked(items, i = 0) {
const start = Date.now();
while (i < items.length && Date.now() - start < 10) {
processItem(items[i++]);
}
if (i < items.length) {
setImmediate(() => processChunked(items, i));
}
}
Rule: use setImmediate to yield to I/O.
Answer:
libuv has a thread pool (default 4 threads) for:
- File system operations
- DNS lookups (dns.lookup, not dns.resolve)
- crypto.pbkdf2, crypto.scrypt, bcrypt
- Some zlib operations
Tune via environment variable:
UV_THREADPOOL_SIZE=16 node app.js
If your app does many concurrent file, crypto, or DNS operations, the default of 4 can become a bottleneck. The maximum is 1024.
Q8. What are V8 hidden classes and inline caching?
Answer:
V8 builds hidden classes based on object shape. Changing the shape (adding properties later) moves the object to a new hidden class and can deoptimize code that was specialized for the old shape.
// SLOW: shape changes after construction
function User(name) { this.name = name; }
const u = new User('a');
u.email = 'b'; // hidden class changes
// FAST: all properties initialized in constructor
function User(name, email) {
this.name = name;
this.email = email;
}
Always initialize all properties in the constructor, in the same order.
Answer:
A function that is called with a single object shape stays optimized:
function getName(u) { return u.name; }
// Monomorphic — same shape
getName({ name: 'a', email: 'b' });
getName({ name: 'c', email: 'd' });
// Polymorphic: a second shape is introduced
getName({ name: 'a', age: 1 });
// Still polymorphic: a third shape. V8 only goes megamorphic (and stops
// specializing the inline cache) after more than about four shapes.
getName({ name: 'a', height: 2, etc: 3 });
For hot paths, prefer consistent shapes.
Answer:
- Mixing types (x = 1 then x = 'string')
- delete on object properties (changes hidden class)
- arguments object misuse; use rest parameters instead
- try/catch was a deopt killer in old V8; OK in modern V8 but avoid in hot paths
- Function.prototype.apply or call with non-array arguments
For most apps, do not micro-optimize V8. Profile first.
Answer:
| Region | Description |
|---|---|
| New Space (Young Gen) | Short-lived objects, scavenged frequently |
| Old Space (Old Gen) | Long-lived, mark-sweep-compact |
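| Large Object Space | Objects too large for a regular heap page (threshold varies by V8 version) |
| Code Space | Compiled code |
| Map Space | Hidden classes (folded into Old Space in recent V8 versions) |
--max-old-space-size=4096 raises the old-space limit to 4 GB (the value is in MB). The default depends on the Node version and available memory; on older 64-bit releases it was roughly 1.5 to 2 GB. Beyond ~8 GB, GC pauses get long; consider sharding into multiple processes.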
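Rather than relying on a remembered default, you can ask the running process what its limit and heap spaces are. The sketch below uses the built-in node:v8 module; the heapReport helper and its output shape are illustrative choices.

```javascript
const v8 = require('node:v8');

const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

// Summarizes the effective heap limit and per-space usage for this process.
function heapReport() {
  const stats = v8.getHeapStatistics();
  const spaces = v8.getHeapSpaceStatistics().map((space) => ({
    name: space.space_name,
    usedMB: toMB(space.space_used_size),
  }));
  return {
    heapLimitMB: toMB(stats.heap_size_limit), // reflects --max-old-space-size
    usedMB: toMB(stats.used_heap_size),
    spaces,
  };
}

console.log(heapReport());
```

The list of space names differs between Node versions, which is a quick way to see which regions your runtime actually has.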
Answer:
- Caches without TTL or size limit
- Event listeners not removed
- Closures holding large data
- Timers not cleared
- Streams not consumed
- Global state accumulating
// BAD: unbounded cache, classic leak
const cache = {};
function getUser(id) {
if (!cache[id]) cache[id] = fetchUser(id);
return cache[id];
}
// GOOD: bounded LRU with TTL
const { LRUCache } = require('lru-cache');
const cache = new LRUCache({ max: 10000, ttl: 5 * 60 * 1000 });
function getUser(id) {
let user = cache.get(id);
if (!user) {
user = fetchUser(id);
cache.set(id, user);
}
return user;
}
Answer:
node --inspect server.js
Open chrome://inspect, take a heap snapshot, do some work, take another, then compare.
Look for large arrays, growing Maps, or closures retaining HTTP requests.
Programmatic capture:
const v8 = require('node:v8');
process.on('SIGUSR2', () => {
const path = `./snap-${Date.now()}.heapsnapshot`;
v8.writeHeapSnapshot(path);
console.log('Wrote', path);
});
Trigger via kill -USR2 <pid>.
Answer:
npm i -g clinic autocannon
clinic doctor -- node server.js
clinic flame -- node server.js
clinic bubbleprof -- node server.js
clinic heapprofiler -- node server.js
- doctor: classifies the issue (CPU, memory, event loop, I/O)
- flame: flame graph of CPU
- bubbleprof: async flow visualization
- heapprofiler: allocation profiling
Combine with autocannon to drive load.
Answer:
npm i -g 0x
0x server.js
# Stop with Ctrl+C and a flame graph HTML opens in your browser
Lighter than clinic, focused on CPU time.
Answer:
const cache = new WeakMap();
function getMeta(obj) {
let meta = cache.get(obj);
if (!meta) {
meta = expensive(obj);
cache.set(obj, meta);
}
return meta;
}
WeakMap entries become eligible for garbage collection once the key has no other references. Useful when the key is an object whose lifetime you control.
WeakRef (newer) lets you hold references that do not prevent GC, with explicit .deref().
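As a sketch of the WeakRef pattern, the class below keeps string keys in a regular Map but holds each value through a WeakRef, so the cache itself never keeps a value alive. WeakValueCache is a made-up name for illustration, and values must be objects because WeakRef cannot target primitives.

```javascript
// A cache whose values may be collected once nothing else references them.
class WeakValueCache {
  #refs = new Map();

  set(key, value) {
    this.#refs.set(key, new WeakRef(value));
  }

  get(key) {
    const ref = this.#refs.get(key);
    if (!ref) return undefined;
    const value = ref.deref(); // undefined if the target was collected
    if (value === undefined) this.#refs.delete(key); // drop the stale entry
    return value;
  }
}

const reports = new WeakValueCache();
const report = { rows: [1, 2, 3] };
reports.set('daily', report);
console.log(reports.get('daily') === report); // true while `report` is still referenced
```

When collection happens is up to the engine, so treat a miss as normal and be ready to recompute the value.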
Answer:
const { LRUCache } = require('lru-cache');
const cache = new LRUCache({
max: 1000,
ttl: 60_000,
updateAgeOnGet: false,
allowStale: false,
});
cache.set('user:1', user);
const u = cache.get('user:1');
Always set max and ttl. An unbounded Map() is one of the most common Node memory leaks.
For multi-instance services, prefer Redis — in-process LRU per instance can cause cache divergence.
Answer:
node --cpu-prof --cpu-prof-dir=./profiles server.js
Generates a .cpuprofile file you can load in Chrome DevTools (Performance tab, Load profile).
Programmatic:
const inspector = require('node:inspector/promises'); // available in Node 19+
const fs = require('node:fs');
// Wrapped in an async function because top-level await is not available in CommonJS
async function captureCpuProfile(workload) {
  const session = new inspector.Session();
  session.connect();
  await session.post('Profiler.enable');
  await session.post('Profiler.start');
  await workload(); // the code you want to profile
  const { profile } = await session.post('Profiler.stop');
  session.disconnect();
  fs.writeFileSync('profile.cpuprofile', JSON.stringify(profile));
}
Answer:
- JSON.parse and JSON.stringify on large bodies
- bcrypt, scrypt, argon2 password hashing (move to thread pool, tune cost)
- RegExp with backtracking
- Manual sort, filter, or map chains over huge arrays
- Templating (EJS, Pug)
- ORM overhead (Sequelize, TypeORM)
Profile to find your specific hot spot.
Answer:
const { performance, PerformanceObserver } = require('node:perf_hooks');
// Register the observer first so it sees the measure created below
new PerformanceObserver((list) => {
  list.getEntries().forEach((e) => {
    console.log(e.name, e.duration, 'ms');
  });
}).observe({ entryTypes: ['measure'] });
performance.mark('A');
doWork();
performance.mark('B');
performance.measure('A->B', 'A', 'B');
Lightweight enough for production use on critical paths.
Answer:
For incoming connections:
const http = require('node:http');
const server = http.createServer(handler);
server.keepAliveTimeout = 65_000; // longer than load balancer's idle timeout
server.headersTimeout = 66_000;
server.listen(3000);
For outgoing HTTP:
const http = require('node:http');
const agent = new http.Agent({
keepAlive: true,
maxSockets: 50,
maxFreeSockets: 10,
keepAliveMsecs: 30_000,
});
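// Pass the agent to http.request or http.get (node-fetch also accepts it).
// Node's built-in fetch does not use this option; it pools connections itself.
http.get(url, { agent }, (res) => res.resume());
Reusing TCP and TLS connections avoids repeated handshakes, which can save tens to hundreds of milliseconds per call.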
Answer:
server.requestTimeout = 30_000; // total time allowed to receive the full request
server.headersTimeout = 10_000; // time to receive headers; keep it at or below requestTimeout
server.keepAliveTimeout = 65_000; // idle keep-alive; longer than the load balancer's idle timeout
server.timeout = 0; // socket inactivity (use requestTimeout)
Without these, a slow-loris client can hold a connection open indefinitely.
Answer:
Express:
app.use(express.json({ limit: '1mb' }));
app.use(express.urlencoded({ extended: true, limit: '1mb' }));
Fastify:
const fastify = require('fastify')({ bodyLimit: 1_048_576 });
Without limits, a single request with a 500 MB body can OOM your process.
Answer:
const http2 = require('node:http2');
const fs = require('node:fs');
const server = http2.createSecureServer({
key: fs.readFileSync('key.pem'),
cert: fs.readFileSync('cert.pem'),
});
server.on('stream', (stream, headers) => {
stream.respond({ 'content-type': 'text/html', ':status': 200 });
stream.end('<h1>Hello HTTP/2</h1>');
});
server.listen(8443); // example port
Most production setups terminate TLS and HTTP/2 at the load balancer (ALB, Nginx, Caddy) and proxy HTTP/1.1 to Node. That is simpler and usually just as fast.
Answer:
// Possible in-process
const compression = require('compression');
app.use(compression({ threshold: 1024 }));
Better: handle compression at the proxy or CDN (for example Nginx), which frees Node CPU. For static responses, precompress at build time and serve with Nginx's gzip_static and brotli_static.
Answer:
Approximate req/s for a "Hello World" on a single core:
| Framework | req/s |
|---|---|
| Bare http | ~70k |
| uWS.js | ~150k |
| Fastify | ~50k |
| Hono | ~60k |
| Koa | ~30k |
| Express | ~15k |
| NestJS (Express) | ~10k |
| NestJS (Fastify) | ~30k |
Numbers vary by version and benchmark setup. Real-world apps rarely max these — you are DB-bound first.
Answer:
- Schema-based serialization compiles JSON schema to a fast serializer (it can be several times faster than JSON.stringify for small, fixed-shape payloads; measure with your own data)
- Lighter middleware chain
- Built-in pino logging vs manual setup
- Plugin system avoids global pollution
- Avoids req/res mutation overhead
// Fastify schema-based serialization
fastify.get('/user/:id', {
schema: {
response: {
200: {
type: 'object',
properties: {
id: { type: 'number' },
name: { type: 'string' },
email: { type: 'string' },
},
},
},
},
}, async (req) => {
return await getUser(req.params.id);
});
Answer:
Most apps spend their time in the database, not the framework. A 200 ms endpoint will not get faster by switching frameworks.
Where it matters:
- High-throughput API gateways
- Real-time (WebSocket) servers
- Microservices with tight latency budgets
For typical CRUD apps, prioritize productivity over micro-throughput differences.
Answer:
Each middleware adds overhead. Stack only what you need.
// BAD: blanket auth on every route
app.use(authMiddleware);
// GOOD: scope to authenticated routes only
app.use('/api/private', authMiddleware);
Other tips:
- Avoid morgan in production; use pino with structured logs
- Do not enable JSON body parsing on routes that do not need it
- Skip CORS middleware on routes not requiring CORS
Answer:
const cluster = require('node:cluster');
const os = require('node:os');
const http = require('node:http');
if (cluster.isPrimary) {
for (let i = 0; i < os.cpus().length; i++) {
cluster.fork();
}
cluster.on('exit', () => cluster.fork());
} else {
http.createServer(handler).listen(3000);
}
Forks N workers that share the listening port; the OS schedules them across cores. Throughput scales close to linearly for CPU-bound work.
In production, prefer PM2 or systemd template units instead of writing your own clustering code.
Answer:
// main.js
const Piscina = require('piscina');
const path = require('node:path');
const pool = new Piscina({
filename: path.resolve(__dirname, 'worker.js'),
minThreads: 2,
maxThreads: 8,
});
const result = await pool.run({ data: bigArray });
// worker.js
module.exports = ({ data }) => {
return expensiveSyncWork(data);
};
Use a worker pool; spinning up a thread per request is slow.
Use cases: image processing, PDF generation, large JSON parsing, crypto, ML inference.
Answer:
| Use case | Pick |
|---|---|
| Scale a stateless HTTP server across cores | Cluster |
| Run CPU-bound function without blocking event loop | Worker threads |
| Run a separate program | child_process |
| Share memory across threads | Worker threads with SharedArrayBuffer |
Cluster workers do not share memory; worker threads can.
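The shared-memory row can be illustrated with a short sketch: several worker threads increment one counter that lives in a SharedArrayBuffer, using Atomics so the increments do not race. The worker source is inlined with eval: true only to keep the example in a single file, and runWorkers is a hypothetical helper name.

```javascript
const { Worker } = require('node:worker_threads');

// Inlined worker script (normally this would live in its own file).
const workerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write on shared memory
  }
  parentPort.postMessage('done');
`;

function runWorkers(workerCount, increments) {
  // The buffer is shared with each worker, not copied.
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const jobs = Array.from({ length: workerCount }, () => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared, increments },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  }));
  return Promise.all(jobs).then(() => Atomics.load(counter, 0));
}

const total = runWorkers(4, 1000);
total.then((n) => console.log('counter:', n));
```

Cluster workers cannot do this: they are separate processes and have to exchange messages instead.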
Answer:
const { spawn } = require('node:child_process');
const child = spawn('ffmpeg', ['-i', 'input.mp4', 'output.mp4']);
child.stdout.on('data', (chunk) => console.log(chunk.toString()));
child.on('exit', (code) => console.log('exit', code));
Use for shelling out to heavy tools (ffmpeg, imagemagick, pandoc). Never pass user input to exec or to a shell; prefer spawn or execFile with an argument array to avoid command injection.
Answer:
PM2:
pm2 start app.js -i max --name api
pm2 startup
pm2 save
PM2 gives you cluster mode in one command, auto-restart on crash, log management, deployment hooks, web UI, and metrics.
Production-ready alternatives: systemd template units, container orchestrators (Kubernetes scales horizontally instead of clustering on one box).
Answer:
A common mistake: 100 connections per pool, then 4 instances = 400 connections to Postgres = exhaustion.
total_db_connections = pool_size * instance_count + headroom
total <= db_max_connections * 0.8
Practical: pool of 10 per instance, 4 instances = 40 connections. Plenty for most apps.
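The sizing rule above is simple arithmetic, and it can be worth encoding as a deploy-time check. A minimal sketch follows, with the 80 percent ceiling taken from the formula above; the function names and the example numbers (100 database connections, headroom of 5) are assumptions for illustration.

```javascript
// total_db_connections = pool_size * instance_count + headroom
function totalConnections(poolSize, instanceCount, headroom) {
  return poolSize * instanceCount + headroom;
}

// Largest per-instance pool that keeps the total within 80% of the DB limit.
function maxPoolSize(dbMaxConnections, instanceCount, headroom) {
  const budget = Math.floor(dbMaxConnections * 0.8) - headroom;
  return Math.max(Math.floor(budget / instanceCount), 0);
}

// Hypothetical deployment: Postgres max_connections = 100, 4 instances.
console.log(totalConnections(100, 4, 0)); // the "common mistake": 400
console.log(totalConnections(10, 4, 5));  // the practical setup: 45
console.log(maxPoolSize(100, 4, 5));      // upper bound per instance: 18
```

Remember to rerun the calculation when autoscaling changes the instance count, since the total grows with every replica.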
Answer:
// pg
const { Pool } = require('pg');
const pool = new Pool({
max: 10,
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 2_000,
statement_timeout: 5_000,
query_timeout: 5_000,
});
// mysql2
const mysql = require('mysql2/promise');
const pool = mysql.createPool({
connectionLimit: 10,
enableKeepAlive: true,
keepAliveInitialDelay: 0,
});
// mongoose
const mongoose = require('mongoose');
mongoose.connect(uri, {
maxPoolSize: 10,
serverSelectionTimeoutMS: 5000,
});
Always set query timeouts. A hung query holds a connection until the DB times out (often minutes), starving the pool.
Answer:
Prisma, Sequelize, TypeORM, and Mongoose all add overhead compared to raw drivers. Often 2-5x for a simple query.
Mitigations:
// BAD: ORM hydration of a wide entity
const user = await prisma.user.findUnique({
where: { id },
include: { orders: true, profile: true },
});
// GOOD: select specific columns
const user = await prisma.user.findUnique({
where: { id },
select: { id: true, name: true, email: true },
});
// HOT PATH: raw query
const result = await prisma.$queryRaw`
SELECT id, name FROM users WHERE id = ${id}
`;
Answer:
// pg: named queries are cached as prepared statements per connection
const result = await client.query({
name: 'get-user',
text: 'SELECT * FROM users WHERE id = $1',
values: [id],
});
The pg driver caches prepared statements per connection. This saves planning time on repeated queries. Drivers vary, so verify your DB and library.
Answer:
const { LRUCache } = require('lru-cache');
const cache = new LRUCache({ max: 1000, ttl: 60_000 });
Pros: fastest (sub-microsecond), no network. Cons: per-instance, divergence between instances, cannot be invalidated globally.
Best for: small reference data, hot read paths, frequently used config.
Answer:
const Redis = require('ioredis');
const redis = new Redis();
async function getUser(id) {
const key = `user:${id}`;
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);
const user = await db.users.findOne({ id });
await redis.set(key, JSON.stringify(user), 'EX', 300);
return user;
}
Use ioredis for clustering and pipelining. Always set a TTL (setex or set ... EX) so keys do not accumulate without bound.
Answer:
async function getOrSet(key, ttl, loader) {
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);
const lockKey = `lock:${key}`;
const got = await redis.set(lockKey, '1', 'NX', 'EX', 5);
if (got) {
try {
const fresh = await loader();
await redis.set(key, JSON.stringify(fresh), 'EX', ttl);
return fresh;
} finally {
await redis.del(lockKey);
}
} else {
await new Promise((r) => setTimeout(r, 50));
return getOrSet(key, ttl, loader);
}
}
Or use a battle-tested library like cache-manager.
Answer:
const etag = require('etag');
app.get('/api/products', async (req, res) => {
const body = await getProducts();
const tag = etag(JSON.stringify(body));
res.setHeader('Cache-Control', 'public, max-age=60, stale-while-revalidate=300');
res.setHeader('ETag', tag);
if (req.headers['if-none-match'] === tag) {
return res.status(304).end();
}
res.json(body);
});
For public APIs, an ETag plus a CDN can offload 90 percent or more of read traffic.
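The conditional-request check is easy to get subtly wrong, because If-None-Match may carry several tags, a weak W/ prefix, or a bare asterisk. The sketch below shows one way to handle those cases using only node:crypto. strongEtag and isFresh are hypothetical helpers, and splitting the header on commas is a simplification that assumes tags contain no commas (true for the hex tags generated here).

```javascript
const crypto = require('node:crypto');

// Builds a strong ETag from the response body.
function strongEtag(body) {
  const hash = crypto.createHash('sha1').update(body).digest('hex');
  return `"${hash}"`;
}

// True when If-None-Match matches, meaning a 304 can be sent.
// If-None-Match uses weak comparison, so a W/ prefix is ignored.
function isFresh(ifNoneMatch, etag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === '*') return true;
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  return ifNoneMatch.split(',').some((tag) => strip(tag) === strip(etag));
}

const tag = strongEtag(JSON.stringify({ id: 1, name: 'widget' }));
console.log(isFresh(tag, tag));              // exact match
console.log(isFresh(`"stale", W/${tag}`, tag)); // list with a weak variant
console.log(isFresh('"something-else"', tag)); // no match
```

In a handler, a true result maps to res.status(304).end() and a false result to sending the full body with the ETag header set.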
Answer:
// SLOW: 3 sequential round trips
const a = await fetchA();
const b = await fetchB();
const c = await fetchC(); // total = sum of latencies
// FAST: parallel
const [a, b, c] = await Promise.all([fetchA(), fetchB(), fetchC()]);
// total = max latency
Promise.all rejects on first error. For partial failure tolerance:
const results = await Promise.allSettled([fetchA(), fetchB(), fetchC()]);
for (const r of results) {
if (r.status === 'fulfilled') console.log(r.value);
else console.error(r.reason);
}
Answer:
A naive Promise.all over 10,000 items can spawn 10,000 concurrent DB queries — instant pool exhaustion.
const pLimit = require('p-limit');
const limit = pLimit(20);
await Promise.all(items.map((i) => limit(() => process(i))));
Or batch:
for (let i = 0; i < items.length; i += 50) {
const batch = items.slice(i, i + 50);
await Promise.all(batch.map(process));
}
Answer:
const ctrl = new AbortController();
const timeout = setTimeout(() => ctrl.abort(), 5000);
try {
const res = await fetch(url, { signal: ctrl.signal });
return await res.json();
} finally {
clearTimeout(timeout);
}
Always set timeouts on outgoing HTTP. A hung third party can pile up requests until you OOM.
Answer:
const DataLoader = require('dataloader');
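const userLoader = new DataLoader(async (ids) => {
  const users = await db.users.find({ id: { $in: ids } });
  const byId = new Map(users.map((u) => [u.id, u]));
  // Results must line up with the requested ids, in order
  return ids.map((id) => byId.get(id) || null);
});
// In handlers: issue the loads together. Awaiting each load in turn
// would put them in separate ticks and defeat batching.
const [u1, u2, u3] = await Promise.all([
  userLoader.load(1),
  userLoader.load(2),
  userLoader.load(3),
]);
// All three ids are batched into one DB query within the same tick
DataLoader collapses N+1 queries by batching loads made within a single event loop tick. Common in GraphQL.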
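To show what batching within a tick means mechanically, here is a dependency-free sketch of the core idea: collect keys until the current tick ends, then call the batch function once. It is not DataLoader's implementation and omits its caching and deduplication; createBatchLoader and the in-memory fakeUsers table are assumptions for the example.

```javascript
// Collects keys requested in the same tick and resolves them with one batch call.
function createBatchLoader(batchFn) {
  let queue = [];

  function flush() {
    const batch = queue;
    queue = [];
    Promise.resolve()
      .then(() => batchFn(batch.map((item) => item.key)))
      .then(
        (values) => batch.forEach((item, i) => item.resolve(values[i])),
        (err) => batch.forEach((item) => item.reject(err)),
      );
  }

  return function load(key) {
    return new Promise((resolve, reject) => {
      // The first load in a tick schedules the flush; later loads just join it.
      if (queue.length === 0) process.nextTick(flush);
      queue.push({ key, resolve, reject });
    });
  };
}

// Hypothetical data source standing in for a database.
const fakeUsers = new Map([[1, 'ada'], [2, 'grace'], [3, 'linus']]);
let batchCalls = 0;
const loadUser = createBatchLoader(async (ids) => {
  batchCalls++;
  return ids.map((id) => fakeUsers.get(id) ?? null);
});

const loaded = Promise.all([loadUser(1), loadUser(2), loadUser(3)]);
loaded.then((users) => console.log(users, 'batch calls:', batchCalls));
```

Because all three loads are issued before the tick ends, the batch function runs once with all three ids.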
Answer:
For events that fire fast (resize, scroll, keyboard, sensor data), handle them less often.
const { debounce, throttle } = require('lodash'); // lodash-es is ESM-only; use import with it
const onChange = debounce((value) => persist(value), 300);
const onScroll = throttle(handle, 100);
For server-side rate limiting use bottleneck or token-bucket libraries.
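For readers unfamiliar with the token-bucket idea, a minimal in-process sketch looks like this. The TokenBucket class and its option names are invented for the example, the clock is injectable so the behaviour can be checked deterministically, and a multi-instance service would need shared state (for example in Redis) instead.

```javascript
// Token bucket: allows bursts up to `capacity`, then a steady `refillPerSecond`.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full
    this.last = now();
  }

  // Returns true and consumes tokens if the request is allowed.
  tryRemove(count = 1) {
    const current = this.now();
    const elapsedSeconds = (current - this.last) / 1000;
    this.last = current;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }
}

// Fake clock for the demo: burst of 2, refilling 1 token per second.
let fakeTime = 0;
const bucket = new TokenBucket({ capacity: 2, refillPerSecond: 1, now: () => fakeTime });
const burst = [bucket.tryRemove(), bucket.tryRemove(), bucket.tryRemove()];
fakeTime = 1000; // one second later, one token has been refilled
const afterRefill = [bucket.tryRemove(), bucket.tryRemove()];
console.log(burst, afterRefill);
```

A rejected call would typically translate to an HTTP 429 response.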
Answer:
Loading a 1 GB CSV into memory:
// BAD: OOMs on large files
const data = require('node:fs').readFileSync('huge.csv');
Streaming uses constant memory:
const { pipeline } = require('node:stream/promises');
const fs = require('node:fs');
const { parse } = require('csv-parse');
await pipeline(
fs.createReadStream('huge.csv'),
parse({ columns: true }),
async function* (source) {
for await (const row of source) {
yield JSON.stringify(row) + '\n';
}
},
fs.createWriteStream('out.ndjson'),
);
Answer:
When the consumer is slower than the producer, streams pause the producer. Native streams handle this via .write() returning false and a 'drain' event.
// WRONG: ignores return value, fires as fast as possible
res.write(line);
// RIGHT: respect backpressure
const { once } = require('node:events');
if (!res.write(line)) {
await once(res, 'drain');
}
Or use pipeline, which handles backpressure for you.
Answer:
const { once } = require('node:events');
res.setHeader('Content-Type', 'application/x-ndjson');
// Node applies chunked transfer encoding automatically when no Content-Length is set
for await (const row of db.queryStream('SELECT * FROM big_table')) {
  if (!res.write(JSON.stringify(row) + '\n')) {
    await once(res, 'drain'); // respect backpressure
  }
}
res.end();
Send large datasets without buffering. Use NDJSON or Server-Sent Events for streaming JSON.
Answer:
| Library | Speed | Notes |
|---|---|---|
| pino | Fastest (~5x winston) | Async, structured JSON |
| winston | Slower | Pluggable transports |
| bunyan | Older | JSON, similar to pino |
| console | Synchronous | Do not use in production |
console.log can write synchronously depending on what stdout is attached to (files always, and terminals on POSIX), so heavy use can block the event loop.
const pino = require('pino');
const logger = pino();
logger.info({ user_id: 123 }, 'login successful');
logger.error({ err }, 'failed to send email');
Answer:
const pino = require('pino');
const transport = pino.transport({
target: 'pino/file',
options: { destination: '/var/log/app.log' },
});
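const logger = pino(transport);
Logs are handed to a worker thread and written off the main thread, which can raise throughput substantially compared with synchronous writes (the source figure is about 10x; measure on your workload).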
Answer:
// Sample noisy logs
if (Math.random() < 0.01) {
logger.info({ ...event }, 'sampled event');
}
// Use levels: info for normal, debug not in prod
logger.level = process.env.LOG_LEVEL || 'info';
// Structure logs for searchability
logger.info({ user_id, action }, 'action_taken'); // not "user 5 did X"
Rotate logs with logrotate or use pino's transport with rotation. A logging-heavy app can spend 30 percent CPU on logging if misused.
Answer:
const { Queue, Worker } = require('bullmq');
const connection = { host: 'redis', port: 6379 };
const queue = new Queue('emails', { connection });
await queue.add('send', { to: 'a@b.c' }, {
attempts: 3,
backoff: { type: 'exponential', delay: 1000 },
});
new Worker('emails', async (job) => {
await sendEmail(job.data);
}, {
connection,
concurrency: 10,
});
Features: retries, delayed jobs, repeatable jobs, rate limiting, priorities, flow (DAG), parent-child.
Answer:
- Concurrency per worker: balance throughput and downstream load
- Multiple workers across machines for horizontal scale
- Lock duration longer than max job time to avoid duplicate processing
- Stalled job detection — workers heartbeat every N seconds
- Job size: store payload references (S3 URL), not huge JSON in Redis
Answer:
- BullMQ (Redis): simple, fast, under 10k jobs/s per Redis instance
- SQS (AWS): managed, virtually unlimited throughput, latency typically in the tens of milliseconds
- RabbitMQ: rich routing (topic, fanout), strong delivery guarantees
- Kafka: event streams, replay, retention, very high throughput
Most apps start with BullMQ and move to Kafka only when log or event volume justifies it.
Answer:
If a user request fans out to 5 services synchronously, each adding 50 ms, you are at 250+ ms before the user sees anything.
Tactics:
- Parallel fan-out with timeouts and circuit breakers
- Request hedging (issue duplicate requests, use first response)
- Caching at the gateway layer
- gRPC instead of HTTP+JSON: often cited as 5-10x faster for cross-service calls, though the gain depends heavily on payload size and shape
- Connection reuse (keep-alive, gRPC channels)
- Co-locate services in the same AZ (cross-AZ adds 1-2 ms)
Answer:
// user.proto
service UserService {
rpc GetUser (GetUserRequest) returns (User);
}
message GetUserRequest { int64 id = 1; }
message User { int64 id = 1; string name = 2; }
// Node client
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');
const def = protoLoader.loadSync('user.proto');
const proto = grpc.loadPackageDefinition(def);
const client = new proto.UserService('user-svc:50051', grpc.credentials.createInsecure());
client.getUser({ id: 42 }, (err, user) => console.log(user));
Pros: binary protobuf is smaller and faster than JSON, HTTP/2 multiplexing, streaming, strict schemas with generated clients.
Cons: harder to debug (no curl), browser support requires gRPC-Web proxy.
Answer:
const CircuitBreaker = require('opossum');
const breaker = new CircuitBreaker(callExternalApi, {
timeout: 3000,
errorThresholdPercentage: 50,
resetTimeout: 30_000,
});
breaker.fallback(() => ({ items: [] }));
const result = await breaker.fire(args);
When the error rate exceeds the threshold, the breaker opens and fails fast. After the reset timeout, it goes half-open and tests one request before closing again. Prevents cascade failure when a downstream is down.
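The closed, open, and half-open transitions described above can be sketched as a small state machine. This is a simplified illustration, not how opossum is implemented: it counts consecutive failures instead of an error percentage, it does not limit the half-open state to a single in-flight trial, and the SimpleBreaker name and options are invented for the example.

```javascript
class SimpleBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 30_000, now = Date.now } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('breaker open'); // fail fast, do not call downstream
      }
      this.state = 'half-open'; // reset timeout elapsed: allow a trial call
    }
    try {
      const result = await this.action(...args);
      this.state = 'closed'; // success closes the breaker
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      // A failed trial call, or too many consecutive failures, opens the breaker.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Example wiring with a hypothetical always-failing downstream call.
const demoBreaker = new SimpleBreaker(async () => { throw new Error('down'); }, {
  failureThreshold: 2,
});
demoBreaker.fire().catch((err) => console.log(err.message, '/ state:', demoBreaker.state));
```

A production breaker also needs per-call timeouts and a rolling error window, which is what a library such as opossum provides.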
Answer:
async function withRetry(fn, attempts = 3) {
for (let i = 0; i < attempts; i++) {
try {
return await fn();
} catch (e) {
if (i === attempts - 1) throw e;
const backoff = 2 ** i * 100 + Math.random() * 100;
await new Promise((r) => setTimeout(r, backoff));
}
}
}
Always include jitter to avoid thundering herd. Be careful retrying non-idempotent operations like POST without idempotency keys.
Libraries: cockatiel, p-retry, async-retry.
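One common variation on the backoff formula is "full jitter" with a cap: pick a random delay between zero and the exponential ceiling, and never let the ceiling exceed a maximum. The sketch below is that variant, not the additive-jitter formula used in withRetry; backoffDelay and its option names are illustrative, and the random source is injectable so the output can be checked.

```javascript
// Full-jitter backoff: random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelay(attempt, { baseMs = 100, capMs = 10_000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value of 0.5 the schedule is easy to inspect.
const fixed = { random: () => 0.5 };
const schedule = [0, 1, 2, 3, 20].map((attempt) => backoffDelay(attempt, fixed));
console.log(schedule);
```

The cap matters for long retry loops: without it, attempt 20 would wait on the order of a day.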
Answer:
| Format | Size vs JSON | Speed | Notes |
|---|---|---|---|
| MessagePack | ~30% smaller | 2x faster | @msgpack/msgpack |
| Protobuf | Smallest | Fastest | Schema-based |
| CBOR | Similar to msgpack | Similar | Standard |
| Avro | Schema-based | Fast | Schema evolution |
For client-facing APIs, JSON is usually fine — overhead is rarely the bottleneck.
Answer:
For 50+ MB JSON, JSON.stringify blocks the event loop for hundreds of ms.
// BAD: blocks the loop
res.json(hugeArray);
// GOOD: stream NDJSON
res.setHeader('Content-Type', 'application/x-ndjson');
const { once } = require('node:events');
for (const row of hugeArray) {
if (!res.write(JSON.stringify(row) + '\n')) {
await once(res, 'drain');
}
}
res.end();
Or use Fastify's schema-based serializer, which reduces serialization cost but still runs on the main thread.
Answer:
A service hit OOM after 24 hours. Heap snapshot showed a Map growing forever.
Bad code:
const userCache = new Map();
function getUser(id) {
if (!userCache.has(id)) userCache.set(id, fetchUser(id));
return userCache.get(id);
}
Fix:
const { LRUCache } = require('lru-cache');
const userCache = new LRUCache({
max: 10_000,
ttl: 5 * 60 * 1000,
});
function getUser(id) {
let user = userCache.get(id);
if (!user) {
user = fetchUser(id);
userCache.set(id, user);
}
return user;
}
Memory plateaued at 800 MB.
Answer:
Latency p99 spiked from 50 ms to 2 s under load. Profile showed bcrypt.hashSync on the request path.
// BAD: blocks the loop
const hash = bcrypt.hashSync(password, 12);
// GOOD: uses libuv thread pool
const hash = await bcrypt.hash(password, 10);
Fixes:
- Replace bcrypt.hashSync with the async version
- Set UV_THREADPOOL_SIZE=16 to allow more concurrent hashes
- Lower bcrypt cost from 12 to 10 (about 4x less work per hash; confirm it still meets your security requirements)
- Optionally move hashing to a worker pool for isolation
Answer:
A job processed 100k records, each with a DB call. It used Promise.all. Pool of 10 connections maxed out, queries timed out, downstream services flooded.
Bad:
await Promise.all(records.map((r) => process(r))); // 100k parallel
Fix:
const pLimit = require('p-limit');
const limit = pLimit(20);
await Promise.all(records.map((r) => limit(() => process(r))));
A constant 20 concurrent DB queries, no exhaustion, no flood.
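If you want to see what a limiter like p-limit does under the hood, the idea fits in a few lines: keep a count of active tasks and a queue of waiting ones, and start the next task whenever one finishes. This is a simplified sketch rather than p-limit's actual source, and createLimiter is a made-up name.

```javascript
// Runs at most `concurrency` tasks at a time; the rest wait in a FIFO queue.
function createLimiter(concurrency) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= concurrency || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // a slot freed up: start the next waiting task
      });
  };

  return (task) => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    next();
  });
}

// Demo: 10 simulated jobs, at most 3 in flight, tracking the observed peak.
let running = 0;
let peak = 0;
const limit3 = createLimiter(3);
const jobs = Array.from({ length: 10 }, (_, i) => limit3(async () => {
  running++;
  peak = Math.max(peak, running);
  await new Promise((r) => setTimeout(r, 5)); // stand-in for a DB call
  running--;
  return (i + 1) * 2;
}));
const finished = Promise.all(jobs);
finished.then((out) => console.log(out, 'peak concurrency:', peak));
```

Results still come back in input order because Promise.all preserves the order of the promises it was given.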
Answer:
An endpoint returning 100k records took 4 seconds. Profile showed JSON.stringify taking 3.2 seconds.
Solutions tried in order:
// 1. fast-json-stringify with schema -> 800 ms
const fastJson = require('fast-json-stringify');
const stringify = fastJson({
type: 'array',
items: {
type: 'object',
properties: {
id: { type: 'integer' },
name: { type: 'string' },
},
},
});
res.send(stringify(rows));
// 2. Switch to NDJSON streaming -> 50 ms TTFB, 1.2s full transfer
res.setHeader('Content-Type', 'application/x-ndjson');
for await (const row of dbStream) {
res.write(JSON.stringify(row) + '\n');
}
res.end();
// 3. Eventually added pagination, which was the real fix.
Answer:
A downstream API got slow. The service retried 3 times, tripling the load on the downstream, making it slower, causing more retries. Cascading failure.
Fixes:
const breaker = new CircuitBreaker(downstreamCall, {
timeout: 3000,
errorThresholdPercentage: 50,
resetTimeout: 30_000,
});
// Retry budget: cap at 10% of original requests
const RETRY_BUDGET = 0.1;
let retriesThisSecond = 0;
let requestsThisSecond = 0;
setInterval(() => { retriesThisSecond = 0; requestsThisSecond = 0; }, 1000);
async function callWithBudget(args) {
requestsThisSecond++;
try {
return await breaker.fire(args);
} catch (e) {
if (retriesThisSecond / requestsThisSecond < RETRY_BUDGET) {
retriesThisSecond++;
await new Promise((r) => setTimeout(r, 100 + Math.random() * 200));
return await breaker.fire(args);
}
throw e;
}
}
Answer:
The ALB had idle timeout 60 s. Node's keepAliveTimeout was the default 5 s. Result: ALB sent requests to a closing socket, returning 502s.
Fix:
server.keepAliveTimeout = 65_000; // > ALB idle timeout
server.headersTimeout = 66_000; // > keepAliveTimeout (needed on older Node releases)
Answer:
A service grew to 6 GB old space. Each GC pause was 800 ms. p99 latency was terrible.
Fixes:
- Find the largest retained objects (heap snapshot)
- Move large in-memory caches to Redis
- Lower --max-old-space-size to force earlier GC and accept process restarts at 4 GB
- Use multiple smaller processes (4 x 2 GB) instead of one big one
Smaller heaps mean shorter GC pauses.
Answer:
- Server timeouts configured (request, headers, keep-alive)
- Body size limits set
- Outgoing HTTP has timeouts
- Connection pool sized correctly
- Graceful shutdown (SIGTERM, drain, close DB, exit)
- Health and readiness endpoints
- Cluster mode or container replicas
- PM2 or systemd or orchestrator manages restarts
- Async logging with pino
- Structured logs with correlation IDs
- Metrics exported (Prometheus, OpenTelemetry)
- Tracing for distributed flows
- Memory ceiling with --max-old-space-size set below the container limit (leave room for buffers and native memory)
- No synchronous file or crypto on the request path
- bcrypt or crypto cost tuned
- Caches bounded (LRU + TTL)
- Circuit breaker on flaky downstreams
- Rate limiting (express-rate-limit, @fastify/rate-limit, or Redis-backed)
- Stress-tested with autocannon or wrk before production
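The graceful-shutdown item is the one most often left as a stub, so here is a minimal sketch of the sequence (stop accepting, drain, clean up, then exit). gracefulShutdown and its options are hypothetical names, the optional calls rely on server.closeIdleConnections and server.closeAllConnections from Node 18.2+, and the short demo at the end starts and stops a throwaway server on a random local port only to show the function running.

```javascript
const http = require('node:http');

// Stops accepting connections, waits for in-flight requests, then runs cleanup.
function gracefulShutdown(server, { timeoutMs = 10_000, cleanup = async () => {} } = {}) {
  return new Promise((resolve) => {
    // Safety net: after the deadline, drop whatever is still connected.
    const force = setTimeout(() => {
      server.closeAllConnections?.();
    }, timeoutMs);

    server.close(async () => {
      clearTimeout(force);
      try {
        await cleanup(); // e.g. close DB pools and queue connections
      } finally {
        resolve();
      }
    });

    // Release idle keep-alive sockets so close() can finish promptly.
    server.closeIdleConnections?.();
  });
}

// Typical wiring (left commented out so the sketch does not exit the process):
// process.once('SIGTERM', () => gracefulShutdown(server).then(() => process.exit(0)));

// Demo: start a throwaway server, then shut it down.
const demoServer = http.createServer((req, res) => res.end('ok'));
const stopped = new Promise((resolve) => {
  demoServer.listen(0, '127.0.0.1', () => {
    gracefulShutdown(demoServer, { timeoutMs: 1000 }).then(() => resolve(demoServer.listening));
  });
});
stopped.then((listening) => console.log('still listening:', listening));
```

In an orchestrated environment, fail the readiness endpoint first so the load balancer stops routing traffic before the drain begins.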
Answer:
- Measure — autocannon load test, RUM, find the actual bottleneck
- Quick wins — pool size, keep-alive, missing indexes (downstream), payload limits
- Caching — in-process LRU for hot keys, Redis for shared
- Reduce work — pagination, projection, schema-based serialization
- Async — queue non-critical work
- Horizontal scale — cluster on box, then more boxes
- Pre-fetch and batch — DataLoader for fan-out reads
- Move CPU work — worker threads, dedicated services, language change for hot paths
- Architecture — read replicas, CDN, edge functions
- Iterate — measure again, find the new bottleneck, repeat
Never optimize without a measurement.