feat: Request Rejection Frontend metrics by kthui · Pull Request #7644 · ai-dynamo/dynamo

kthui · 2026-03-26T00:23:48Z

Overview:

Add request rejection counts to the Frontend metrics, along with E2E tests.

Details:

Frontend Rejection Metrics:

When a request is rejected at the router, a DynamoError with a new type ResourceExhausted is returned.
Each frontend endpoint will check if an error chain contains the ResourceExhausted type.
- If so, the rejection metrics counter is incremented on the endpoint and model.
Note: Rejected requests are non-migratable.

# HELP dynamo_frontend_model_rejection_total Total number of requests rejected due to resource exhaustion
# TYPE dynamo_frontend_model_rejection_total counter
dynamo_frontend_model_rejection_total{endpoint="chat_completions",model="Qwen/Qwen3-0.6B"} 32
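
The error-chain check described above can be modelled roughly as follows. This is only an illustrative Python sketch, not the PR's Rust code: the `ErrorType` and `DynamoError` classes here are simplified stand-ins, `request_was_rejected` merely borrows the name of the Rust helper, and `record_if_rejected`, `chained_rejection`, and the in-memory `rejections` counter are hypothetical names introduced for the example.

```python
from collections import Counter
from enum import Enum, auto


class ErrorType(Enum):
    # Simplified stand-in for the Rust ErrorType enum (assumption).
    UNKNOWN = auto()
    RESOURCE_EXHAUSTED = auto()


class DynamoError(Exception):
    # Simplified stand-in for the Rust DynamoError (assumption).
    def __init__(self, message, error_type=ErrorType.UNKNOWN):
        super().__init__(message)
        self.error_type = error_type


def request_was_rejected(err):
    """Return True if any error in the cause chain is ResourceExhausted."""
    seen = set()
    while err is not None and id(err) not in seen:
        seen.add(id(err))  # guard against cyclic cause chains
        if (isinstance(err, DynamoError)
                and err.error_type is ErrorType.RESOURCE_EXHAUSTED):
            return True
        err = err.__cause__
    return False


# Hypothetical in-memory counter keyed by (endpoint, model) labels.
rejections = Counter()


def record_if_rejected(err, endpoint, model):
    """Increment the rejection counter only when the chain shows a rejection."""
    if request_was_rejected(err):
        rejections[(endpoint, model)] += 1
        return True
    return False


def chained_rejection():
    """Build an outer error whose cause is a ResourceExhausted rejection."""
    try:
        try:
            raise DynamoError("all workers busy", ErrorType.RESOURCE_EXHAUSTED)
        except DynamoError as inner:
            raise RuntimeError("generate failed") from inner
    except RuntimeError as outer:
        return outer


# A wrapped rejection is counted; an unrelated error is not.
record_if_rejected(chained_rejection(), "chat_completions", "Qwen/Qwen3-0.6B")
record_if_rejected(ValueError("bad input"), "chat_completions", "Qwen/Qwen3-0.6B")
```

Checking the whole chain rather than only the outermost error matters because the rejection may be wrapped by intermediate layers before it reaches the endpoint handler.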

E2E test:

Enhanced the existing 503 router test to count the successes and rejections among the requests sent.
The test also asserts that the number of 503 responses matches the rejection metric.

$ pytest router/test_router_e2e_with_mockers.py::test_mocker_kv_router_overload_503 -v
=========================================== test session starts ============================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /opt/dynamo/venv/bin/python3
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace
configfile: pyproject.toml
plugins: ai-dynamo-1.0.0, cov-7.0.0, pytest_httpserver-1.1.3, order-1.3.0, asyncio-1.3.0, md-report-0.7.0, benchmark-5.2.3, xdist-3.8.0, forked-1.6.0, pytest_codeblocks-0.17.0, mock-3.15.1, dash-3.1.1, timeout-2.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                           

router/test_router_e2e_with_mockers.py::test_mocker_kv_router_overload_503[nondurable] PASSED        [100%]

============================================ 1 passed in 13.10s ============================================
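
The metric check in the E2E test could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the helpers in tests/router/common.py: `parse_rejection_metric` and `verify_rejections` are hypothetical names, and the sample metrics text and status codes are made up for the example. Only the metric name and its `endpoint` and `model` labels come from the PR description.

```python
import re

METRIC_NAME = "dynamo_frontend_model_rejection_total"


def parse_rejection_metric(metrics_text, endpoint, model):
    """Sum the counter samples matching the given endpoint and model labels."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines, HELP/TYPE comments, and other metrics.
        if not line.startswith(METRIC_NAME + "{"):
            continue
        labels_part, _, value_part = line.rpartition("} ")
        labels = dict(re.findall(r'(\w+)="([^"]*)"', labels_part))
        if labels.get("endpoint") == endpoint and labels.get("model") == model:
            # The first token after the labels is the sample value.
            total += float(value_part.split()[0])
    return int(total)


def verify_rejections(status_codes, metrics_text, endpoint, model):
    """Assert that the number of 503 responses equals the rejection counter."""
    rejected = sum(1 for code in status_codes if code == 503)
    counted = parse_rejection_metric(metrics_text, endpoint, model)
    assert rejected == counted, f"{rejected} 503s but metric reports {counted}"
    return rejected


# Made-up sample data for illustration only.
SAMPLE_METRICS = """\
# HELP dynamo_frontend_model_rejection_total Total number of requests rejected due to resource exhaustion
# TYPE dynamo_frontend_model_rejection_total counter
dynamo_frontend_model_rejection_total{endpoint="chat_completions",model="Qwen/Qwen3-0.6B"} 32
"""
SAMPLE_STATUSES = [200] * 8 + [503] * 32

matched = verify_rejections(
    SAMPLE_STATUSES, SAMPLE_METRICS, "chat_completions", "Qwen/Qwen3-0.6B"
)
```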

Where should the reviewer start?

Start with lib/runtime/src/metrics/prometheus_names.rs for the new metric name.
See lib/runtime/src/error.rs for the new ResourceExhausted error type.
See lib/runtime/src/pipeline/network/egress/push_router.rs for where the new error type is returned upon rejection.
See lib/llm/src/migration.rs, where a rejected request is treated as non-migratable.
See lib/llm/src/[grpc/http]/service/*, where each endpoint looks for the ResourceExhausted error type in the error chain and increments the rejection metric if found.
See tests/router/common.py, where the existing E2E test is updated to assert that the number of rejected requests matches the count reported by the metric.
See tests/router/test_router_e2e_with_mockers.py, where the test timeout is updated based on the new average test duration.
Regenerated: lib/bindings/python/src/dynamo/prometheus_names.py.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Closes DIS-757

Summary by CodeRabbit

Release Notes

New Features
- Added model rejection metrics tracking across all service endpoints for improved visibility into request failures.
Improvements
- Enhanced detection and handling of resource exhaustion errors with improved error categorization.
- Improved system observability with per-endpoint rejection metrics for better monitoring and diagnostics.

coderabbitai · 2026-03-26T00:54:18Z

Walkthrough

This pull request introduces infrastructure and implementation for tracking model request rejections across gRPC and HTTP services. It adds a new ResourceExhausted error type, rejection metric counters, detection helpers, and updates multiple service handlers to record rejection metrics when requests encounter resource exhaustion.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Error Type Infrastructure: `lib/runtime/src/error.rs`, `lib/llm/src/migration.rs` | Added `ResourceExhausted` variant to `ErrorType` enum with Display formatting. Updated `is_migratable` to explicitly classify `ResourceExhausted` errors as non-migratable, preventing migration on that error type. |
| Metrics Infrastructure: `lib/runtime/src/metrics/prometheus_names.rs`, `lib/bindings/python/src/dynamo/prometheus_names.py` | Added `MODEL_REJECTION_TOTAL` constant to frontend service metrics. Added cancellation metrics to Python bindings and removed unused transport TCP/NATS nested classes. |
| HTTP Metrics Service: `lib/llm/src/http/service/metrics.rs` | Introduced `request_was_rejected()` helper function, a new `model_rejection_total` counter field on the `Metrics` struct, registration logic in `Metrics::new()` and `Metrics::register()`, and public methods `inc_rejection()` and `get_rejection_count()`. |
| gRPC Service Rejection Tracking: `lib/llm/src/grpc/service/openai.rs`, `lib/llm/src/grpc/service/tensor.rs` | Captured `model_name` from the request and added conditional rejection metric increments using a `request_was_rejected()` check before returning errors. Updated `CancellationLabels` to use the captured model name. |
| HTTP Service Rejection Tracking: `lib/llm/src/http/service/openai.rs`, `lib/llm/src/http/service/anthropic.rs` | Added rejection detection and metric increments across multiple endpoints (completions, embeddings, chat_completions, etc.). Refactored error mapping to check for rejections before constructing error responses. Updated test case to construct `DynamoError` with `ErrorType::ResourceExhausted`. |
| Error Construction: `lib/runtime/src/pipeline/network/egress/push_router.rs` | Changed the "all workers busy" error path to wrap `PipelineError::ServiceOverloaded` inside a `DynamoError` with `ErrorType::ResourceExhausted`. |
| Test Infrastructure: `tests/router/common.py` | Added `_parse_frontend_rejection_metric()` and `_verify_frontend_rejection_metrics()` helpers to scrape and validate rejection metrics. Refactored `_test_router_overload_503()` with parameterized concurrent request counts, status aggregation, and frontend metrics verification. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Request Rejection Frontend metrics' clearly and concisely summarizes the main change: adding frontend metrics for request rejections. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description comprehensively covers overview, implementation details, review guidance, and references a related issue. All required template sections are present and well-populated. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/router/common.py (1)
601-605: Use the generated Prometheus name here.

This PR already regenerates lib/bindings/python/src/dynamo/prometheus_names.py, so hardcoding dynamo_frontend_model_rejection_total leaves this helper with a second source of truth for the same contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/router/common.py` around lines 601 - 605, Replace the hardcoded metric
name "dynamo_frontend_model_rejection_total" in tests/router/common.py with the
generated constant from the regenerated binding (import the module
lib.bindings.python.src.dynamo.prometheus_names as prometheus_names) and use
prometheus_names.DYNAMO_FRONTEND_MODEL_REJECTION_TOTAL (or the exact exported
constant for that metric) when matching lines in metrics_text so the test uses
the single source of truth for the Prometheus name; keep the existing checks
using model_name and endpoint intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/http/service/openai.rs`:
- Around line 1949-1954: The mapped errors from engine.generate(...) in the
media handlers (e.g., the images path shown and the other occurrences in
videos() and video_stream()) increment model_rejection_total but do not notify
the InflightGuard, causing requests_total to record the default "internal"
error; fix by calling the inflight guard's mark_error(...) with the same mapped
ErrorMessage before returning the Err (i.e., ensure the mapping closure or the
error-path calls inflight.mark_error(&err) or inflight.mark_error(err.clone())
as appropriate, then return ErrorMessage::from_anyhow(...) so the inflight sees
the propagated 503/overload error), applying this pattern for the generate()
calls at the referenced sites.

In `@tests/router/common.py`:
- Around line 629-635: The except block that catches requests.RequestException
when calling requests.get(metrics_url) loses the original traceback by
re-raising an AssertionError without chaining; update the exception raise to
preserve chaining by raising the AssertionError from the caught exception (use
"from e") so the original RequestException and its traceback (from the
requests.get/metrics_response.raise_for_status path) are preserved for
debugging.
- Around line 722-724: Replace the blanket "except Exception" that logs and
returns (req_id, -1) so it only catches the specific transport-related
exceptions you expect (e.g., ConnectionError, TimeoutError, or your project's
transport-specific exception class) in the request-handling block where logger
and req_id are used; leave AssertionError and other unexpected exceptions
unhandled so they propagate and fail the test, and keep the logging of the
transport exception message as before.
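
The two tests/router/common.py findings above can be illustrated with a small sketch. It is a hedged example using only the standard library rather than the PR's actual code: `fetch_metrics`, `send_request`, and the injected `fetch` and `send` callables are hypothetical names, and built-in `OSError` subclasses stand in for `requests.RequestException`.

```python
def fetch_metrics(fetch):
    """Call fetch() and convert transport failures into a chained AssertionError."""
    try:
        return fetch()
    except OSError as e:
        # "from e" keeps the original exception and traceback as __cause__.
        raise AssertionError(f"Failed to fetch metrics: {e}") from e


def send_request(req_id, send):
    """Return (req_id, status), mapping only transport errors to -1."""
    try:
        return req_id, send()
    except (ConnectionError, TimeoutError) as e:
        # Only expected transport failures are swallowed here; an
        # AssertionError or any other unexpected error still propagates.
        print(f"Request {req_id} failed: {e}")
        return req_id, -1


def refused():
    """Simulate a transport failure (hypothetical helper)."""
    raise ConnectionError("connection refused")


# A transport failure becomes a sentinel status instead of crashing the run.
failed_result = send_request(1, refused)
ok_result = send_request(2, lambda: 200)
```

Narrowing the `except` clause keeps real test failures visible, while chaining with `from e` preserves the root cause when the metrics fetch itself fails.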

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a211183f-695c-477d-af49-40a18e7d08d2

📥 Commits

Reviewing files that changed from the base of the PR and between db14d63 and 1d9c9c3.

📒 Files selected for processing (11)

lib/bindings/python/src/dynamo/prometheus_names.py
lib/llm/src/grpc/service/openai.rs
lib/llm/src/grpc/service/tensor.rs
lib/llm/src/http/service/anthropic.rs
lib/llm/src/http/service/metrics.rs
lib/llm/src/http/service/openai.rs
lib/llm/src/migration.rs
lib/runtime/src/error.rs
lib/runtime/src/metrics/prometheus_names.rs
lib/runtime/src/pipeline/network/egress/push_router.rs
tests/router/common.py

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

kthui self-assigned this Mar 26, 2026

pull-request-size Bot added the size/L label Mar 26, 2026

github-actions Bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Mar 26, 2026

kthui marked this pull request as ready for review March 26, 2026 00:42

kthui requested a review from a team as a code owner March 26, 2026 00:42

kthui requested a review from a team March 26, 2026 00:42

kthui requested a review from a team as a code owner March 26, 2026 00:42

coderabbitai Bot reviewed Mar 26, 2026

Comment thread lib/llm/src/http/service/openai.rs

Comment thread tests/router/common.py Outdated

Comment thread tests/router/common.py Outdated

copy-pr-bot Bot temporarily deployed to GITLAB March 26, 2026 01:05 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 26, 2026 01:09 Inactive

coderabbitai Bot mentioned this pull request Mar 26, 2026

fix: Call inflight.mark_error() in media handler generate() error paths (images, videos, video_stream) #7645

Closed

kthui added 2 commits March 30, 2026 14:18

feat: Frontend Request Rejection Metrics

1d86b0b

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

chore: Regenerate prometheus_names.py

1415039

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

kthui force-pushed the jacky-ft-reject-metrics branch from 4906a20 to a17aa05 Compare March 30, 2026 21:20

copy-pr-bot Bot temporarily deployed to GITLAB March 30, 2026 21:20 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 30, 2026 21:21 Inactive

test: Router overload 503 test to check for frontend metrics

4462bea

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

kthui force-pushed the jacky-ft-reject-metrics branch from a17aa05 to 4462bea Compare March 30, 2026 21:48

copy-pr-bot Bot temporarily deployed to GITLAB March 30, 2026 21:48 Inactive

jh-nv reviewed Mar 30, 2026

Comment thread lib/llm/src/http/service/metrics.rs Outdated

jh-nv reviewed Mar 30, 2026

Comment thread lib/llm/src/grpc/service/openai.rs

kthui added 3 commits March 31, 2026 12:13

refactor: gRPC frontend to return resource exhausted error on rejection

2b4abbc

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

refactor: Rejection metrics endpoint to use pre-defined enum

19fdd01

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

chore: Restore manual edits at prometheus_names.py

07bde32

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 19:33 Inactive

Merge branch 'main' into jacky-ft-reject-metrics

6b4640a

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 20:07 Inactive

kthui requested a review from jh-nv March 31, 2026 20:07

kthui enabled auto-merge (squash) March 31, 2026 20:07

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 20:50 Inactive

Merge branch 'main' into jacky-ft-reject-metrics

077ad6f

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 21:51 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 1, 2026 00:08 Inactive

kthui disabled auto-merge April 6, 2026 18:08

jh-nv reviewed Apr 6, 2026

Comment thread tests/router/common.py Outdated

jh-nv approved these changes Apr 6, 2026

kthui added 2 commits April 6, 2026 18:44

Merge branch 'main' into jacky-ft-reject-metrics

8242144

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

test: Use constant metrics name on router 503 test

37d932d

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to GITLAB April 7, 2026 01:50 Inactive

Merge branch 'main' into jacky-ft-reject-metrics

ff866d2

Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>

copy-pr-bot Bot temporarily deployed to GITLAB April 7, 2026 01:54 Inactive

kthui enabled auto-merge (squash) April 7, 2026 01:55

kthui merged commit 3205e7d into main Apr 7, 2026
93 checks passed

kthui deleted the jacky-ft-reject-metrics branch April 7, 2026 02:39

copy-pr-bot Bot had a problem deploying to GITLAB April 7, 2026 02:43 Failure

kthui mentioned this pull request Apr 13, 2026

docs: Request rejection metrics #8139

Merged

kthui merged 11 commits into main from jacky-ft-reject-metrics
