
feat: Request Rejection Frontend metrics#7644

Merged
kthui merged 11 commits into main from jacky-ft-reject-metrics on Apr 7, 2026

Conversation

Contributor

@kthui kthui commented Mar 26, 2026

Overview:

Add request rejection counts to the frontend metrics, along with E2E test coverage.

Details:

Frontend Rejection Metrics:

  • When a request is rejected at the router, a DynamoError with a new type ResourceExhausted is returned.
  • Each frontend endpoint checks whether the error chain contains the ResourceExhausted type.
    • If so, the rejection counter is incremented for that endpoint and model.
  • Note: Rejected requests are non-migratable.
# HELP dynamo_frontend_model_rejection_total Total number of requests rejected due to resource exhaustion
# TYPE dynamo_frontend_model_rejection_total counter
dynamo_frontend_model_rejection_total{endpoint="chat_completions",model="Qwen/Qwen3-0.6B"} 32
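
For readers who want to check this counter from a script, here is a minimal sketch of reading one series out of the Prometheus text format shown above. The function name `read_rejection_count` and the choice to return 0.0 when the series is absent are assumptions made for illustration; this is not the helper added in this PR.

```python
import re

# Hypothetical helper for illustration only; not the helper added in this PR.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

SAMPLE = """\
# HELP dynamo_frontend_model_rejection_total Total number of requests rejected due to resource exhaustion
# TYPE dynamo_frontend_model_rejection_total counter
dynamo_frontend_model_rejection_total{endpoint="chat_completions",model="Qwen/Qwen3-0.6B"} 32
"""


def read_rejection_count(
    metrics_text,
    endpoint,
    model,
    metric_name="dynamo_frontend_model_rejection_total",
):
    """Return the counter value for one endpoint/model pair.

    Assumption: a series that has never been incremented may be missing
    from the scrape, so a missing series is treated as 0.0.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        if labels.get("endpoint") == endpoint and labels.get("model") == model:
            return float(match.group("value"))
    return 0.0


if __name__ == "__main__":
    print(read_rejection_count(SAMPLE, "chat_completions", "Qwen/Qwen3-0.6B"))
```

Matching on parsed labels rather than on a raw substring keeps the lookup independent of label order in the scrape.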

E2E test:

  • Enhanced the existing 503 router test to count how many of the requests sent succeeded and how many were rejected.
  • The test also asserts that the number of 503 responses matches the rejection metric.
$ pytest router/test_router_e2e_with_mockers.py::test_mocker_kv_router_overload_503 -v
=========================================== test session starts ============================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /opt/dynamo/venv/bin/python3
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /workspace
configfile: pyproject.toml
plugins: ai-dynamo-1.0.0, cov-7.0.0, pytest_httpserver-1.1.3, order-1.3.0, asyncio-1.3.0, md-report-0.7.0, benchmark-5.2.3, xdist-3.8.0, forked-1.6.0, pytest_codeblocks-0.17.0, mock-3.15.1, dash-3.1.1, timeout-2.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                           

router/test_router_e2e_with_mockers.py::test_mocker_kv_router_overload_503[nondurable] PASSED        [100%]

============================================ 1 passed in 13.10s ============================================
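
The check described above can be sketched roughly as follows. This is an illustrative outline, not the code in tests/router/common.py: the function names are hypothetical, and comparing against the change in the counter (before and after the burst of requests) is an assumption made here because Prometheus counters are cumulative.

```python
from collections import Counter


def summarize_statuses(status_codes):
    """Group HTTP status codes into success, rejected (503), and other."""
    counts = Counter(status_codes)
    other = sum(n for code, n in counts.items() if code not in (200, 503))
    return {"success": counts[200], "rejected": counts[503], "other": other}


def check_rejections_match(status_codes, metric_before, metric_after):
    """Assert that the number of 503s equals the counter increase.

    Assumption: the caller scraped the rejection counter once before and
    once after sending the requests.
    """
    summary = summarize_statuses(status_codes)
    delta = metric_after - metric_before
    assert summary["rejected"] == delta, (
        f"saw {summary['rejected']} 503 responses, "
        f"but the rejection counter increased by {delta}"
    )
    return summary


if __name__ == "__main__":
    codes = [200, 503, 503, 200, 503]
    print(check_rejections_match(codes, metric_before=10.0, metric_after=13.0))
```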

Where should the reviewer start?

  1. Start with lib/runtime/src/metrics/prometheus_names.rs for the new metric name.
  2. See lib/runtime/src/error.rs for the new ResourceExhausted error type.
  3. See lib/runtime/src/pipeline/network/egress/push_router.rs for where the new error type is returned upon rejection.
  4. See lib/llm/src/migration.rs, where a rejected request is treated as non-migratable.
  5. See lib/llm/src/[grpc/http]/service/*, where each endpoint looks for the ResourceExhausted error type in the error chain and increments the rejection metric if found.
  6. See tests/router/common.py, where the existing E2E test now asserts that the number of rejected requests matches the metric value.
  7. See tests/router/test_router_e2e_with_mockers.py, where the test timeout is updated based on the new average test duration.
  8. Regenerated: lib/bindings/python/src/dynamo/prometheus_names.py.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Closes DIS-757

Summary by CodeRabbit

Release Notes

  • New Features

    • Added model rejection metrics tracking across all service endpoints for improved visibility into request failures.
  • Improvements

    • Enhanced detection and handling of resource exhaustion errors with improved error categorization.
    • Improved system observability with per-endpoint rejection metrics for better monitoring and diagnostics.

@kthui kthui self-assigned this Mar 26, 2026
@github-actions github-actions Bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Mar 26, 2026
@kthui kthui marked this pull request as ready for review March 26, 2026 00:42
@kthui kthui requested a review from a team as a code owner March 26, 2026 00:42
@kthui kthui requested a review from a team March 26, 2026 00:42
@kthui kthui requested a review from a team as a code owner March 26, 2026 00:42
Contributor

coderabbitai Bot commented Mar 26, 2026

Walkthrough

This pull request introduces infrastructure and implementation for tracking model request rejections across gRPC and HTTP services. It adds a new ResourceExhausted error type, rejection metric counters, detection helpers, and updates multiple service handlers to record rejection metrics when requests encounter resource exhaustion.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Error Type Infrastructure**<br>`lib/runtime/src/error.rs`, `lib/llm/src/migration.rs` | Added `ResourceExhausted` variant to `ErrorType` enum with `Display` formatting. Updated `is_migratable` to explicitly classify `ResourceExhausted` errors as non-migratable, preventing migration on that error type. |
| **Metrics Infrastructure**<br>`lib/runtime/src/metrics/prometheus_names.rs`, `lib/bindings/python/src/dynamo/prometheus_names.py` | Added `MODEL_REJECTION_TOTAL` constant to frontend service metrics. Added cancellation metrics to Python bindings and removed unused transport TCP/NATS nested classes. |
| **HTTP Metrics Service**<br>`lib/llm/src/http/service/metrics.rs` | Introduced `request_was_rejected()` helper function, new `model_rejection_total` counter field on the `Metrics` struct, registration logic in `Metrics::new()` and `Metrics::register()`, and public methods `inc_rejection()` and `get_rejection_count()`. |
| **gRPC Service Rejection Tracking**<br>`lib/llm/src/grpc/service/openai.rs`, `lib/llm/src/grpc/service/tensor.rs` | Captured `model_name` from the request and added conditional rejection metric increments using a `request_was_rejected()` check before returning errors. Updated `CancellationLabels` to use the captured model name. |
| **HTTP Service Rejection Tracking**<br>`lib/llm/src/http/service/openai.rs`, `lib/llm/src/http/service/anthropic.rs` | Added rejection detection and metric increments across multiple endpoints (completions, embeddings, chat_completions, etc.). Refactored error mapping to check for rejections before constructing error responses. Updated test case to construct `DynamoError` with `ErrorType::ResourceExhausted`. |
| **Error Construction**<br>`lib/runtime/src/pipeline/network/egress/push_router.rs` | Changed the "all workers busy" error path to wrap `PipelineError::ServiceOverloaded` inside `DynamoError` with `ErrorType::ResourceExhausted`. |
| **Test Infrastructure**<br>`tests/router/common.py` | Added `_parse_frontend_rejection_metric()` and `_verify_frontend_rejection_metrics()` helpers to scrape and validate rejection metrics. Refactored `_test_router_overload_503()` with parameterized concurrent request counts, status aggregation, and frontend metrics verification. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: Request Rejection Frontend metrics' clearly and concisely summarizes the main change: adding frontend metrics for request rejections. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description comprehensively covers overview, implementation details, review guidance, and references a related issue. All required template sections are present and well-populated. |



Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/router/common.py (1)

601-605: Use the generated Prometheus name here.

This PR already regenerates lib/bindings/python/src/dynamo/prometheus_names.py, so hardcoding dynamo_frontend_model_rejection_total leaves this helper with a second source of truth for the same contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/router/common.py` around lines 601 - 605, Replace the hardcoded metric
name "dynamo_frontend_model_rejection_total" in tests/router/common.py with the
generated constant from the regenerated binding (import the module
lib.bindings.python.src.dynamo.prometheus_names as prometheus_names) and use
prometheus_names.DYNAMO_FRONTEND_MODEL_REJECTION_TOTAL (or the exact exported
constant for that metric) when matching lines in metrics_text so the test uses
the single source of truth for the Prometheus name; keep the existing checks
using model_name and endpoint intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/http/service/openai.rs`:
- Around line 1949-1954: The mapped errors from engine.generate(...) in the
media handlers (e.g., the images path shown and the other occurrences in
videos() and video_stream()) increment model_rejection_total but do not notify
the InflightGuard, causing requests_total to record the default "internal"
error; fix by calling the inflight guard's mark_error(...) with the same mapped
ErrorMessage before returning the Err (i.e., ensure the mapping closure or the
error-path calls inflight.mark_error(&err) or inflight.mark_error(err.clone())
as appropriate, then return ErrorMessage::from_anyhow(...) so the inflight sees
the propagated 503/overload error), applying this pattern for the generate()
calls at the referenced sites.

In `@tests/router/common.py`:
- Around line 629-635: The except block that catches requests.RequestException
when calling requests.get(metrics_url) loses the original traceback by
re-raising an AssertionError without chaining; update the exception raise to
preserve chaining by raising the AssertionError from the caught exception (use
"from e") so the original RequestException and its traceback (from the
requests.get/metrics_response.raise_for_status path) are preserved for
debugging.
- Around line 722-724: Replace the blanket "except Exception" that logs and
returns (req_id, -1) so it only catches the specific transport-related
exceptions you expect (e.g., ConnectionError, TimeoutError, or your project's
transport-specific exception class) in the request-handling block where logger
and req_id are used; leave AssertionError and other unexpected exceptions
unhandled so they propagate and fail the test, and keep the logging of the
transport exception message as before.
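
The two tests/router/common.py suggestions above follow common Python patterns: chain the original exception with `raise ... from e`, and catch only the transport failures you expect. The sketch below is a hedged illustration rather than the PR's code. It uses the standard library's `urllib` in place of `requests` so that it runs without extra packages, and the names `fetch_metrics`, `send_one`, `TRANSPORT_ERRORS`, and the `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: these are the transport failures the test expects to see.
# With the requests library, requests.RequestException would play this role.
TRANSPORT_ERRORS = (ConnectionError, TimeoutError, urllib.error.URLError)


def fetch_metrics(metrics_url, opener=urllib.request.urlopen, timeout=5.0):
    """Fetch the metrics text, preserving the original error as __cause__."""
    try:
        with opener(metrics_url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as e:
        # "from e" keeps the transport error and its traceback attached.
        raise AssertionError(
            f"Failed to fetch metrics from {metrics_url}: {e}"
        ) from e


def send_one(req_id, send):
    """Run one request callable and map transport failures to status -1.

    AssertionError and other unexpected exceptions are deliberately not
    caught, so they propagate and fail the test.
    """
    try:
        return req_id, send()
    except TRANSPORT_ERRORS as e:
        print(f"Request {req_id} failed in transport: {e}")
        return req_id, -1
```

Passing the opener as a parameter is only a convenience for exercising the error path without a live server.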


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a211183f-695c-477d-af49-40a18e7d08d2

📥 Commits

Reviewing files that changed from the base of the PR and between db14d63 and 1d9c9c3.

📒 Files selected for processing (11)
  • lib/bindings/python/src/dynamo/prometheus_names.py
  • lib/llm/src/grpc/service/openai.rs
  • lib/llm/src/grpc/service/tensor.rs
  • lib/llm/src/http/service/anthropic.rs
  • lib/llm/src/http/service/metrics.rs
  • lib/llm/src/http/service/openai.rs
  • lib/llm/src/migration.rs
  • lib/runtime/src/error.rs
  • lib/runtime/src/metrics/prometheus_names.rs
  • lib/runtime/src/pipeline/network/egress/push_router.rs
  • tests/router/common.py

Comment thread lib/llm/src/http/service/openai.rs
Comment thread tests/router/common.py Outdated
Comment thread tests/router/common.py Outdated
kthui added 2 commits March 30, 2026 14:18
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Comment thread lib/llm/src/http/service/metrics.rs Outdated
Comment thread lib/llm/src/grpc/service/openai.rs
kthui added 3 commits March 31, 2026 12:13
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
@kthui kthui requested a review from jh-nv March 31, 2026 20:07
@kthui kthui enabled auto-merge (squash) March 31, 2026 20:07
Comment thread tests/router/common.py Outdated
kthui added 2 commits April 6, 2026 18:44
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
@kthui kthui enabled auto-merge (squash) April 7, 2026 01:55
@kthui kthui merged commit 3205e7d into main Apr 7, 2026
93 checks passed
@kthui kthui deleted the jacky-ft-reject-metrics branch April 7, 2026 02:39

Labels

feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

2 participants