04 Backpressure Strategies
Keywords: Backpressure, Flow Control, Reactive Streams, Demand Signaling, Bounded Queues, Rate Limiting, Adaptive Throttling, Queue Saturation, Event Loop Protection, Load Shedding, Circuit Breaker, Retry Storms, Bulkheading, Tail Latency, Throughput Stability, Overload Control, Netty, Java NIO, OP_WRITE, java.util.concurrent.Flow
Backpressure is one of the most important ideas in systems engineering.
It is the mechanism that prevents a fast producer from overwhelming a slow consumer.
In simple terms: when producers outpace consumers, work piles up and the system is in danger.
If a system accepts more work than it can safely process, the result is usually:
- queue growth
- memory growth
- latency spikes
- thread starvation
- timeout cascades
- cascading failures
- OutOfMemoryError
- unstable tail latency
Backpressure is how a system says:
Stop.
Slow down.
Wait.
Drop.
Defer.
Route elsewhere.
A high-performance Java system is not just a system that can process a lot of work.
It is a system that can survive overload without collapsing.
That is the purpose of backpressure.
This page is the final control layer of the 04 series:
- 04-Performance-Overview explains how to measure and reason about performance.
- 04-Event-Loop-Design explains how to coordinate readiness and dispatch.
- 04-Backpressure-Strategies explains how to keep the whole system stable when demand exceeds capacity.
Backpressure doesn't start in Java; it starts in the Operating System.
- TCP Window Size: This is the ultimate low-level backpressure. When the consumer's kernel receive buffer is full, its TCP stack advertises a Window Size of 0 to the producer, pausing the data stream at the network level.
- Mechanical Sympathy: High-performance Java systems must respect this. If the OS says "Stop" via TCP signals, but your Java app ignores this and keeps pulling data from a database, you are creating a memory bomb.
Backpressure is the controlled resistance a system applies when input arrives faster than output can be processed.
It is not a single feature.
It is a family of mechanisms.
Backpressure can be implemented at many layers:
- network layer
- socket layer
- event loop layer
- queue layer
- executor layer
- application layer
- database layer
- API gateway layer
- message broker layer
The core idea is always the same: Do not accept unlimited work.
A system without backpressure is like a pipe with no valve.
When pressure rises, the pipe bursts.
A system with backpressure is like a pipe with a regulator.
It can absorb load, slow input, or reject work deliberately.
Modern systems are full of mismatched speeds.
Examples:
- the network can deliver requests faster than the CPU can process them
- the event loop can accept more messages than the database can store
- the API gateway can receive more traffic than downstream services can serve
- a producer can enqueue tasks faster than worker threads can drain them
- a client can retry faster than the system can recover
Without backpressure, the fastest component dictates the failure mode of the slowest component.
That is dangerous.
Backpressure exists to preserve:
- stability
- fairness
- bounded memory
- predictable latency
- graceful degradation
- service survivability
Most overload failures follow the same chain:
Demand increases ➞ Queue grows ➞ Latency increases ➞ Retries increase ➞ More work arrives ➞
➞ System gets slower ➞ Timeouts increase ➞ More retries happen ➞ Collapse
Visual 1.1: The Overload Spiral — demand increase leading to queue growth, retries, and eventual collapse.
This is the classic overload spiral. Backpressure is how you break the spiral early.
Backpressure only makes sense when you distinguish demand from capacity.
| Term | Meaning |
|---|---|
| Demand | Incoming work the system wants to accept |
| Capacity | Work the system can safely process |
| Excess Demand | Demand beyond safe capacity |
A healthy system matches demand to capacity.
A broken system tries to accept everything.
That is how queues grow uncontrollably.
When demand exceeds capacity, a system can do one of three things:
Tell producers to reduce their rate.
Examples:
- rate limiting
- demand signaling
- window-based flow control
- caller-side throttling
Use a bounded queue to absorb short bursts.
Examples:
- bounded work queues
- ring buffers
- in-memory buffers with limits
Refuse additional work intentionally.
Examples:
- HTTP 429
- queue rejection
- circuit breaker open state
- dropping low-priority events
A good system often uses all three, depending on the situation.
These terms are related, but not identical.
| Term | Meaning |
|---|---|
| Backpressure | Slowing producers when consumers are saturated |
| Flow Control | Managing how much data is allowed to move |
| Load Shedding | Dropping work intentionally to protect the system |
In practice:
- backpressure is the umbrella concept
- flow control is a structured form of backpressure
- load shedding is a last-resort response
A production system often has this shape:
Client ➞ Gateway ➞ Event Loop ➞ Bounded Queue ➞ Worker Pool ➞ Business Logic ➞ Database / Downstream Service
Every arrow is a possible bottleneck.
- If the database slows down, the worker pool slows down.
- If the worker pool slows down, the queue grows.
- If the queue grows, latency grows.
- If latency grows, clients retry.
- If clients retry, the gateway sees more traffic.
- Without backpressure, the whole pipeline becomes unstable.
Backpressure is not just an engineering trick.
It is a feedback control system.
It has three parts:
- measurement
- decision
- action
The system observes:
- queue depth
- response time
- rejection rate
- CPU usage
- memory usage
- downstream latency
- saturation signals
The system decides whether to:
- accept more work
- slow down
- defer
- reject
- shed load
- reroute
The system acts through:
- rate limits
- queue bounds
- task rejection
- connection throttling
- priority policies
- circuit breaker transitions
Backpressure is control, not just storage.
One of the most common mistakes in Java systems is using unbounded queues.
Example:
new LinkedBlockingQueue<>()
At first, this looks safe.
But under overload, it hides the problem.
What happens:
- producers keep submitting
- queue grows silently
- memory usage rises
- latency increases
- old requests become stale
- eventually the JVM collapses
Unbounded queues do not solve overload. They delay the crash. That is worse in many cases because the system appears healthy until it is too late.
A bounded queue is one of the simplest and strongest backpressure tools.
Example:
new ArrayBlockingQueue<>(1000)
This provides a hard cap.
Behavior:
- when the queue is not full, work is accepted
- when the queue is full, the system must decide what to do next
That decision is the backpressure policy.
Benefits:
- bounded memory
- predictable behavior
- overload visibility
- better latency control
A bounded queue says: I will not absorb infinite pain.
That is a good thing.
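The "decide what to do next" step can be sketched with the standard `BlockingQueue` methods. This is a minimal illustration, not a recommended production design; the class and method names (`BoundedQueuePolicies`, `offerAll`, `offerWithDeadline`) are made up for this page.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BoundedQueuePolicies {

    /** Offers `attempts` items with no consumer; returns {accepted, rejected}. */
    static int[] offerAll(int capacity, int attempts) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() never blocks: it returns false once the queue is full.
            if (queue.offer(i)) {
                accepted++;
            } else {
                rejected++;
            }
        }
        return new int[] {accepted, rejected};
    }

    /** Waits up to `millis` for a free slot, then gives up. */
    static boolean offerWithDeadline(BlockingQueue<Integer> queue, int item, long millis) {
        try {
            return queue.offer(item, millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

The three common policies map onto three calls: `offer(e)` rejects immediately, `offer(e, timeout, unit)` waits a bounded time, and `put(e)` blocks the producer until space is available.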
Thread pools are one of the most important places to enforce backpressure.
A thread pool has limited capacity:
- worker threads
- queue slots
- scheduling budget
When all workers are busy and the queue is full, the pool is saturated.
At that point, the system must apply pressure back to the producer.
This can happen through:
- rejection
- caller-runs behavior
- throttling
- delayed submission
- upstream slowdown
A thread pool without backpressure becomes a latency bomb.
One of the most elegant backpressure strategies in Java is CallerRunsPolicy.
Behavior:
- if every worker is busy and the queue is full, the submitting thread runs the task itself
This is powerful because:
- the producer slows down naturally
- overload propagates upstream
- the system avoids infinite queue growth
Example:
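ThreadPoolExecutor executor =
    new ThreadPoolExecutor(
        8,                                         // core pool size
        16,                                        // maximum pool size
        60,                                        // keep-alive for idle non-core threads
        TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(1000),            // bounded work queue
        new ThreadPoolExecutor.CallerRunsPolicy()  // run on the caller when saturated
    );
Why it works: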
- the producer now pays the cost of submission
- the caller is no longer free to flood the system
- throughput becomes self-regulating
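This is one of the best built-in Java backpressure mechanisms, provided the submitting thread can afford to run tasks itself. If the caller is an event loop thread, caller-runs stalls the loop, so prefer rejection there.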
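To make the caller-runs behavior visible, the sketch below saturates a deliberately tiny pool (one worker, one queue slot) and checks which thread runs the overflow task. The class name `CallerRunsDemo` and its helper are hypothetical; the pool sizes are chosen only to make saturation easy to trigger.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class CallerRunsDemo {

    /** Returns true if the task submitted to a saturated pool ran on the submitting thread. */
    static boolean overflowRunsOnCaller() {
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<Thread> overflowThread = new AtomicReference<>();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            pool.execute(() -> awaitQuietly(release)); // occupies the only worker
            pool.execute(() -> { });                   // fills the only queue slot
            // The pool is now saturated, so this task executes synchronously here.
            pool.execute(() -> overflowThread.set(Thread.currentThread()));
            return overflowThread.get() == Thread.currentThread();
        } finally {
            release.countDown(); // unblock the worker so the pool can drain
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

While the caller is busy running the overflow task it cannot submit more work, which is exactly the slowdown the policy is meant to produce.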
Many developers treat rejection as an error.
In reality, rejection is often a correct and necessary response to overload.
A healthy system sometimes must say:
No.
Examples:
- HTTP 429 Too Many Requests
- queue full
- worker saturation
- circuit breaker open
- shed low-priority traffic
Rejection protects:
- availability
- latency
- memory
- downstream dependencies
Without rejection, the system may accept work it cannot safely process.
That is worse than refusing it early.
Rate limiting is a proactive form of backpressure.
Instead of waiting for overload, the system caps input rate before overload occurs.
Common forms:
- requests per second
- tokens per interval
- burst limits
- per-user limits
- per-IP limits
- per-tenant limits
Rate limiting is useful when you want to protect:
- API gateways
- shared infrastructure
- downstream services
- premium tiers
- multi-tenant systems
It is a front-line defense.
Two classic traffic shaping patterns are widely used in backpressure systems.
Token bucket: tokens accumulate at a fixed rate, up to the bucket's capacity.
- a request consumes a token
- if no token exists, the request is delayed or rejected
Good for:
- burst tolerance
- smoothing irregular traffic
- allowing short spikes
Leaky bucket: work is drained at a fixed rate.
- incoming traffic enters a bucket
- output is released steadily
- excess work is dropped or delayed
Good for:
- smoothing output
- enforcing stable throughput
- preventing burst-driven chaos
These models are widely used in networking and rate control.
Visual 1.2: Token Bucket allows bursts with controlled refill; Leaky Bucket enforces steady outflow.
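A token bucket fits in a few lines. The sketch below is one possible shape, not a reference implementation: the `TokenBucket` class, its constructor parameters, and the injected nanosecond clock are all assumptions made so the behavior can be exercised deterministically.

```java
import java.util.function.LongSupplier;

final class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double tokensPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Consumes one token if available; returns false when the caller should delay or reject. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Refill lazily, based on elapsed time, never exceeding the capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In real use the clock would typically be `System::nanoTime`. The capacity controls how large a burst is tolerated, while the refill rate sets the sustained throughput.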
Event loops are especially sensitive to overload.
Why?
Because a single loop often coordinates many connections.
If the loop accepts too much work:
- dispatch slows down
- queues grow
- latency rises
- fairness collapses
- the loop may become unresponsive
Backpressure in event loops typically includes:
- bounded outbound buffers
- limited per-connection writes
- toggling OP_WRITE
- rejecting excessive tasks
- offloading to worker pools
- throttling producers
- limiting read batch size
An event loop must never become a dumping ground for unlimited work.
Netty manages backpressure through write-buffer watermarks configured on the ChannelConfig. This is the bridge between the Event Loop and the socket's outbound buffer.
- High Watermark (e.g., 64KB): If the outbound buffer exceeds this, channel.isWritable() becomes false. The producer must stop sending data.
- Low Watermark (e.g., 32KB): Once the buffer drains below this, channel.isWritable() becomes true again, signaling it's safe to resume.
Netty does not block writes while isWritable() is false; it keeps buffering them. Ignoring isWritable() is therefore a common cause of Netty-based OutOfMemoryError.
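The watermark rule is a hysteresis: writability turns off above the high mark and turns back on only below the low mark. The sketch below models that rule in plain Java so it can be run without Netty on the classpath. It is not Netty's implementation, and `WatermarkGate` with its methods is a hypothetical name. In Netty itself the marks are set with `WriteBufferWaterMark` via `ChannelOption.WRITE_BUFFER_WATER_MARK`, and handlers are notified through `channelWritabilityChanged`.

```java
final class WatermarkGate {
    private final long lowBytes;
    private final long highBytes;
    private long pendingBytes;
    private boolean writable = true;

    WatermarkGate(long lowBytes, long highBytes) {
        if (lowBytes < 0 || lowBytes > highBytes) {
            throw new IllegalArgumentException("require 0 <= low <= high");
        }
        this.lowBytes = lowBytes;
        this.highBytes = highBytes;
    }

    /** Called when bytes are queued for an outbound write. */
    void onQueued(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes > highBytes) {
            writable = false; // producer should stop until the buffer drains
        }
    }

    /** Called when bytes have been flushed to the socket. */
    void onFlushed(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (pendingBytes < lowBytes) {
            writable = true; // safe to resume producing
        }
    }

    boolean isWritable() {
        return writable;
    }
}
```

The gap between the two marks prevents the flag from flapping on every small write or flush.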
OP_WRITE is one of the most important places where backpressure appears in NIO.
A socket is writable most of the time, because its send buffer usually has free space.
If write readiness stays enabled all the time:
- the selector wakes up repeatedly
- the loop burns CPU
- unnecessary write checks happen
- throughput may degrade
Correct strategy:
- enable write interest only when outbound data exists
- disable it after flushing the buffer
This is backpressure at the transport level. It prevents the system from doing pointless work.
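Toggling write interest comes down to setting and clearing one bit on the `SelectionKey`. The helper below is a small sketch with hypothetical names (`WriteInterest`, `demo`); the self-check uses a `Pipe` sink as a stand-in for a socket channel so it runs without any network setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class WriteInterest {

    /** Call when outbound data is pending and the last write could not drain it. */
    static void enable(SelectionKey key) {
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }

    /** Call once the outbound buffer has been fully flushed. */
    static void disable(SelectionKey key) {
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    }

    static boolean isEnabled(SelectionKey key) {
        return (key.interestOps() & SelectionKey.OP_WRITE) != 0;
    }

    /** Registers a pipe sink with no interest, then toggles OP_WRITE on and off. */
    static boolean demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.sink().configureBlocking(false);
                SelectionKey key = pipe.sink().register(selector, 0);
                enable(key);
                boolean wasOn = isEnabled(key);
                disable(key);
                return wasOn && !isEnabled(key);
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Using bitwise operations keeps other interests, such as `OP_READ` on a socket channel, untouched while write interest changes.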
Backpressure is not only about queues and sockets.
Downstream systems matter too:
- databases
- message brokers
- external APIs
- file systems
- caches
- search engines
If the downstream becomes slow, the upstream must respond.
Otherwise, the system will keep piling up work that cannot be completed.
Common downstream backpressure tools include:
- circuit breakers
- timeouts
- bounded retries
- bulkheads
- concurrency caps
- async queue limits
One of the most dangerous overload patterns is the retry storm.
What happens:
- a downstream service slows down
- clients time out
- clients retry immediately
- load increases further
- the service slows down more
- more retries happen
This creates a positive feedback loop.
Backpressure must be paired with:
- retry limits
- exponential backoff
- jitter
- circuit breakers
- idempotency controls
Retries without backpressure are a self-inflicted denial of service.
Visual 1.3: Retry Storm — Retries amplify overload, creating a feedback loop that accelerates system failure.
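Exponential backoff with jitter can be sketched as a single function. The version below uses the "full jitter" variant, where the delay is drawn uniformly between zero and the capped exponential ceiling. The `Backoff` class, its parameter names, and the cap on the shift are illustrative assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

final class Backoff {

    /**
     * Returns a random delay in [0, min(capMillis, baseMillis * 2^attempt)].
     * `attempt` is zero-based: 0 for the first retry.
     */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 0 || baseMillis <= 0 || capMillis <= 0) {
            throw new IllegalArgumentException("attempt >= 0, base > 0, cap > 0 required");
        }
        int shift = Math.min(attempt, 20); // bound the exponent; assumes modest base values
        long ceiling = Math.min(capMillis, baseMillis << shift);
        // Randomizing the whole interval spreads retries out so clients do not retry in lockstep.
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

A caller would pair this with a retry limit, for example giving up after a fixed number of attempts rather than retrying indefinitely.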
A circuit breaker is a higher-level overload protection strategy.
It stops requests from hitting a failing dependency when the failure rate is too high.
State model:
- Closed: requests flow normally
- Open: requests are rejected fast
- Half-Open: test traffic is allowed
Circuit breakers protect the system from:
- retry storms
- resource exhaustion
- cascading failures
- long waiting chains
They are not a replacement for backpressure. They are a complementary layer.
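The three-state model can be written as a small state machine. The sketch below makes simplifying assumptions that are not from the text above: it trips on consecutive failures rather than a failure rate, it does not limit how many probes pass while half-open, and the `SimpleCircuitBreaker` name, parameters, and injected clock are hypothetical.

```java
import java.util.function.LongSupplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clockMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns true if a call may proceed; false means reject fast. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clockMillis.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // cool-down elapsed: allow test traffic
        }
        return state != State.OPEN;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        // A failed probe reopens immediately; otherwise trip at the threshold.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clockMillis.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Callers check `allowRequest()` before contacting the dependency and report the outcome afterwards, so the breaker sees every result it allowed.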
Bulkheads isolate failure domains. If one subsystem overloads, it should not consume all resources.
Examples:
- separate thread pools
- separate queues
- separate connection pools
- separate rate limits
- separate worker groups
This is especially useful in large systems where:
- one tenant is noisy
- one endpoint is hot
- one downstream is slow
- one batch job is expensive
Bulkheads turn a system-wide failure into a localized failure. That is a major stability win.
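One lightweight way to build a bulkhead is a semaphore per dependency that caps concurrent calls. The sketch below is an illustration under that assumption; the `Bulkhead` class and its `call` method are hypothetical names, not a library API.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the action if a permit is free; otherwise returns the fallback without waiting. */
    <T> T call(Supplier<T> action, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // this compartment is full; do not queue behind it
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }
}
```

Creating one instance per downstream (or per tenant) means a slow dependency can exhaust only its own permits, not the threads that serve everything else.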
If you cannot measure overload, you cannot control it.
Important signals include:
| Metric | Meaning |
|---|---|
| Queue Depth | How much work is waiting |
| Rejection Count | How often overload is occurring |
| Latency Percentiles | Whether tail latency is rising |
| Active Threads | Whether the pool is saturated |
| CPU Usage | Whether the system is compute-bound |
| Memory Usage | Whether queues or buffers are growing |
| Timeout Rate | Whether downstream systems are failing |
| Retry Rate | Whether clients are amplifying load |
| Drop Rate | Whether load shedding is occurring |
Backpressure is only useful if it is observable.
Don't just watch "CPU". Monitor these saturation signals to detect backpressure issues:
- executor_queued_tasks: Indicates if the worker pool is becoming a bottleneck.
- netty_eventloop_pending_tasks: Shows when the Event Loop is struggling to keep up.
- http_server_requests_active: Reveals request pile-ups at the gateway level.
- jvm_memory_direct_bytes: Signals Netty's off-heap buffers growing due to write stalls.
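For a plain `ThreadPoolExecutor`, saturation can be read directly from the pool before any metrics library is involved. The helper below is a sketch; `PoolSaturation` and its methods are hypothetical names, and `getActiveCount()` is documented by the JDK as an approximate value.

```java
import java.util.concurrent.ThreadPoolExecutor;

final class PoolSaturation {

    /** Fraction of queue slots in use: 0.0 when empty, 1.0 when full. */
    static double queueFill(ThreadPoolExecutor pool) {
        long used = pool.getQueue().size();
        long free = pool.getQueue().remainingCapacity();
        long total = used + free;
        // A zero-capacity queue (such as SynchronousQueue) is treated as always full.
        return total == 0 ? 1.0 : (double) used / total;
    }

    /** Approximate check for whether every permitted worker is currently running a task. */
    static boolean allWorkersBusy(ThreadPoolExecutor pool) {
        return pool.getActiveCount() >= pool.getMaximumPoolSize();
    }
}
```

Sampling these values periodically and exporting them gives an early warning well before rejections or timeouts start.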
Static limits are useful, but adaptive limits are often better. Adaptive backpressure changes behavior based on runtime conditions.
Examples:
- increase throttling when queue depth rises
- reduce concurrency when downstream latency worsens
- loosen limits when the system recovers
- switch policies during overload
A conceptual snippet of how a system might reactively reduce its concurrency limit when downstream latency spikes:
public void adjustConcurrencyLimit(long currentLatencyMs) {
    if (currentLatencyMs > LATENCY_THRESHOLD_MS) {
        // Downstream is struggling, back off immediately
        currentLimit = Math.max(MIN_LIMIT, (int) (currentLimit * 0.8));
        logger.warn("Latency spike detected ({}ms). Scaling back concurrency to {}", currentLatencyMs, currentLimit);
    } else {
        // System is healthy, gradually increase capacity
        currentLimit = Math.min(MAX_LIMIT, currentLimit + 1);
    }
}
A simple adaptive backpressure mechanism can dynamically reduce throughput when latency rises:
double latencyMs = metrics.getP99Latency();
if (latencyMs > 500) {
    subscription.request(5); // reduce demand
} else {
    subscription.request(20); // normal flow
}
Adaptive systems can be more stable because they react to real conditions rather than fixed assumptions. But they must be designed carefully. A bad adaptive system can oscillate or become unstable.
Sometimes the right answer is to drop work. This sounds harsh, but it is often the correct engineering choice.
Load shedding can mean:
- dropping low-priority requests
- refusing new work when capacity is exhausted
- discarding stale events
- skipping non-critical updates
- compressing work into summaries
Why?
Because a partially degraded service is often better than a fully collapsed one. A system that can shed load gracefully is more resilient than a system that tries to do everything and fails completely.
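Discarding stale events can be as simple as a bounded buffer that evicts its oldest entry when full. The sketch below shows that drop-oldest policy; `DropOldestBuffer` is a hypothetical class, and whether oldest-first is the right victim depends on the workload.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DropOldestBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;
    private long dropped;

    DropOldestBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Always accepts the new item, shedding the stalest one if the buffer is full. */
    synchronized void add(T item) {
        if (items.size() == capacity) {
            items.pollFirst();
            dropped++; // count drops so shedding stays observable
        }
        items.addLast(item);
    }

    /** Returns the oldest retained item, or null if the buffer is empty. */
    synchronized T poll() {
        return items.pollFirst();
    }

    synchronized long droppedCount() {
        return dropped;
    }
}
```

Exposing the drop counter matters as much as the policy itself, because silent shedding looks identical to data loss.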
The primary goal of graceful degradation is user experience continuity. Instead of a hard failure, the system provides a functional but limited alternative, so the user is never left with a blank error page.
A good system does not fail all at once. It degrades gracefully.
Examples:
- reduced feature set
- lower update frequency
- delayed processing
- partial responses
- fallback paths
- cached results
- reduced precision
- prioritized traffic
Graceful degradation is backpressure applied at the product and service layer. It keeps the system useful under stress.
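Common backpressure anti-patterns and why they hurt:
- Unbounded queues: they hide overload until memory fails.
- Immediate, unlimited retries: they amplify the original overload.
- Blocking calls on the event loop: they freeze unrelated work.
- Ever-larger buffers: they push the problem to a later stage.
- No prioritization: important work gets buried under low-value work.
- No admission control: the system accepts work it cannot safely process.
- No overload metrics: the system fails before operators know what happened.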
Reactive Streams formalize backpressure through demand signaling. The consumer tells the producer how much it can handle. This is the purest form of backpressure in application design.
Conceptually:
Consumer requests N items
Producer sends up to N items
Consumer asks again when ready
This model prevents:
- flooding
- unbounded buffering
- unnecessary pressure on consumers
Reactive Streams are built around the idea that demand should be explicit, not assumed.
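The request-N cycle can be tried with the JDK's own `java.util.concurrent.Flow` interfaces (Java 9 and later) and `SubmissionPublisher`. The subscriber below asks for a small batch, then asks again once that batch is consumed. It is a sketch: `BatchingSubscriber`, its `demo` helper, and the batch sizes are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class BatchingSubscriber implements Flow.Subscriber<Integer> {
    private final int batchSize;
    private final List<Integer> received = new CopyOnWriteArrayList<>();
    private final CountDownLatch done = new CountDownLatch(1);
    private Flow.Subscription subscription;
    private int remainingInBatch;

    BatchingSubscriber(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        this.remainingInBatch = batchSize;
        subscription.request(batchSize); // initial demand
    }

    @Override
    public void onNext(Integer item) {
        received.add(item);
        if (--remainingInBatch == 0) {
            remainingInBatch = batchSize;
            subscription.request(batchSize); // batch consumed: signal more demand
        }
    }

    @Override
    public void onError(Throwable error) {
        done.countDown();
    }

    @Override
    public void onComplete() {
        done.countDown();
    }

    /** Publishes 0..items-1 and returns what the subscriber received. */
    static List<Integer> demo(int items, int batchSize) {
        BatchingSubscriber subscriber = new BatchingSubscriber(batchSize);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < items; i++) {
                publisher.submit(i); // blocks if the subscriber's buffer is full
            }
        } // close() signals onComplete after buffered items are delivered
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }
}
```

The publisher never delivers more than the outstanding demand, so the subscriber's processing speed, not the producer's, sets the pace.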
For developers using reactive libraries, these operators provide out-of-the-box strategies to handle overflowing producers:
| Operator | Strategy | Best For |
|---|---|---|
| .onBackpressureBuffer() | BUFFER | Small spikes where data loss is unacceptable. |
| .onBackpressureDrop() | DROP | Real-time telemetry where "old" is "useless". |
| .onBackpressureLatest() | LATEST | UI Updates / Price Tickers (only newest value matters). |
| .onBackpressureError() | FAIL | Situations where overload should be treated as a fatal state. |
Java 9 standardizes backpressure via java.util.concurrent.Flow. The heart of this is the request(n) call.
// The "Pull-Push" Hybrid in Action
public void onSubscribe(Subscription subscription) {
    this.subscription = subscription;
    // Consumer tells Producer: "I have capacity for 10 items right now"
    subscription.request(10);
}
In event-driven systems, backpressure is essential because events can arrive faster than they can be processed.
Examples:
- websocket bursts
- message broker spikes
- IoT telemetry floods
- API surges
- log ingestion bursts
Event-driven systems must decide:
- which events to keep
- which to delay
- which to drop
- which to prioritize
Without that decision, they become unstable under load.
Different systems need different strategies.
| Situation | Best Strategy |
|---|---|
| Short bursts | Bounded queue |
| Sustained overload | Rate limiting |
| Critical dependency failure | Circuit breaker |
| Noisy tenant isolation | Bulkheading |
| Producer too fast | Caller-runs or throttling |
| Non-critical telemetry | Load shedding |
| User-facing API | Fast rejection with clear error |
| Streaming pipeline | Demand signaling / Reactive backpressure |
The right strategy depends on the workload and failure mode.
A robust architecture often looks like this:
Client ➞ Rate Limiter ➞ Event Loop ➞ Bounded Queue ➞ Worker Pool
➞ Circuit Breaker / Timeout Layer ➞ Downstream Service
Visual 1.4: Production Backpressure Pipeline — Client → Gateway → Event Loop → Queue → Worker Pool → Database.
Note: Each stage in this pipeline applies a different form of resistance — from rate limiting at the edge to circuit breaking at the core — ensuring systemic stability even when individual components struggle under load.
Each layer has a role. If any layer is missing, overload can leak through the system.
Backpressure is foundational in:
- API gateways
- stream processors
- message brokers
- websocket servers
- trading systems
- distributed databases
- ingress controllers
- reactive systems
- data pipelines
- observability platforms
Any system that receives work faster than it can safely complete that work needs backpressure. That means almost every serious production system.
Continue exploring:
- 04-Event-Loop-Design
- 04-Performance-Overview
- 02-Thread-Pool-Mechanics
- 02-ExecutorService-Internals
- 01-NIO-Selector-Architecture
- 01-NIO-Blocking-vs-NonBlocking
A robust production backpressure chain:
Client (Rate Limited) → Gateway (Throttled) → Event Loop (Watermarks) → Queue (Bounded) → Worker (Caller-Runs) → DB (Circuit Breaker)
When designing a system, ask yourself these three questions:
- Capacity: If downstream slows down, how fast does my internal queue fill?
- Signaling: When the queue is full, can I send a "STOP" signal to the Producer?
- Strategy: If the Producer does not stop, which data can I sacrifice? (Drop, Buffer, or Fail?)
Backpressure is the difference between a system that merely receives traffic and a system that survives traffic.
It protects:
- memory
- latency
- fairness
- downstream dependencies
- user experience
- service availability