Conversation
WalkthroughThis pull request introduces infrastructure and implementation for tracking model request rejections across gRPC and HTTP services. It adds a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/router/common.py (1)
601-605: Use the generated Prometheus name here.This PR already regenerates
lib/bindings/python/src/dynamo/prometheus_names.py, so hardcodingdynamo_frontend_model_rejection_totalleaves this helper with a second source of truth for the same contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/router/common.py` around lines 601 - 605, Replace the hardcoded metric name "dynamo_frontend_model_rejection_total" in tests/router/common.py with the generated constant from the regenerated binding (import the module lib.bindings.python.src.dynamo.prometheus_names as prometheus_names) and use prometheus_names.DYNAMO_FRONTEND_MODEL_REJECTION_TOTAL (or the exact exported constant for that metric) when matching lines in metrics_text so the test uses the single source of truth for the Prometheus name; keep the existing checks using model_name and endpoint intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/llm/src/http/service/openai.rs`:
- Around line 1949-1954: The mapped errors from engine.generate(...) in the
media handlers (e.g., the images path shown and the other occurrences in
videos() and video_stream()) increment model_rejection_total but do not notify
the InflightGuard, causing requests_total to record the default "internal"
error; fix by calling the inflight guard's mark_error(...) with the same mapped
ErrorMessage before returning the Err (i.e., ensure the mapping closure or the
error-path calls inflight.mark_error(&err) or inflight.mark_error(err.clone())
as appropriate, then return ErrorMessage::from_anyhow(...) so the inflight sees
the propagated 503/overload error), applying this pattern for the generate()
calls at the referenced sites.
In `@tests/router/common.py`:
- Around line 629-635: The except block that catches requests.RequestException
when calling requests.get(metrics_url) loses the original traceback by
re-raising an AssertionError without chaining; update the exception raise to
preserve chaining by raising the AssertionError from the caught exception (use
"from e") so the original RequestException and its traceback (from the
requests.get/metrics_response.raise_for_status path) are preserved for
debugging.
- Around line 722-724: Replace the blanket "except Exception" that logs and
returns (req_id, -1) so it only catches the specific transport-related
exceptions you expect (e.g., ConnectionError, TimeoutError, or your project's
transport-specific exception class) in the request-handling block where logger
and req_id are used; leave AssertionError and other unexpected exceptions
unhandled so they propagate and fail the test, and keep the logging of the
transport exception message as before.
---
Nitpick comments:
In `@tests/router/common.py`:
- Around line 601-605: Replace the hardcoded metric name
"dynamo_frontend_model_rejection_total" in tests/router/common.py with the
generated constant from the regenerated binding (import the module
lib.bindings.python.src.dynamo.prometheus_names as prometheus_names) and use
prometheus_names.DYNAMO_FRONTEND_MODEL_REJECTION_TOTAL (or the exact exported
constant for that metric) when matching lines in metrics_text so the test uses
the single source of truth for the Prometheus name; keep the existing checks
using model_name and endpoint intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a211183f-695c-477d-af49-40a18e7d08d2
📒 Files selected for processing (11)
lib/bindings/python/src/dynamo/prometheus_names.pylib/llm/src/grpc/service/openai.rslib/llm/src/grpc/service/tensor.rslib/llm/src/http/service/anthropic.rslib/llm/src/http/service/metrics.rslib/llm/src/http/service/openai.rslib/llm/src/migration.rslib/runtime/src/error.rslib/runtime/src/metrics/prometheus_names.rslib/runtime/src/pipeline/network/egress/push_router.rstests/router/common.py
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
4906a20 to
a17aa05
Compare
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
a17aa05 to
4462bea
Compare
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
Overview:
Add Request Rejection to Frontend metrics, and E2E tests.
Details:
Frontend Rejection Metrics:
DynamoErrorwith a new typeResourceExhaustedis returned.ResourceExhaustedtype.E2E test:
Where should the reviewer start?
lib/runtime/src/metrics/prometheus_names.rson the new metrics.lib/runtime/src/error.rson the newResourceExhaustederror type.lib/runtime/src/pipeline/network/egress/push_router.rsfor where the new error type is reported on upon rejection.lib/llm/src/migration.rswhere a rejected request is non-migratable.lib/llm/src/[grpc/http]/service/*where each endpoint looks for theResourceExhaustederror type in the error chain, and increment the rejection metrics if found.tests/router/common.pyexisting E2E test updated, and asserts the number of rejected requests matches the number on the metrics.tests/router/test_router_e2e_with_mockers.pywhere the test timeout is updated based on the new average test duration.lib/bindings/python/src/dynamo/prometheus_names.py.Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Closes DIS-757
Summary by CodeRabbit
Release Notes
New Features
Improvements