Conversation
Note — Reviews paused: It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough: Updated deployment and cluster settings in `ray-service.yaml`.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Summary of Changes: Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request focuses on optimizing the deployment configuration for the prostate classifier service within the Ray cluster. The changes aim to improve resource utilization, throughput, and flexibility by updating the model service source, fine-tuning Ray deployment parameters, increasing allocated resources, and adapting to NVIDIA MIG for GPU management. These adjustments are intended to enhance the classifier's performance and scalability.
Code Review
This pull request updates the configuration for the Ray service, primarily to optimize the prostate classifier deployment. The changes include adjusting resource allocations (CPU, memory), tuning request handling and batching parameters, and updating the worker configuration to use MIG-enabled GPUs. While these changes seem aimed at improving performance, I've found a few issues that need to be addressed: a critical bug that will cause the service to fail due to a missing configuration key, the use of a :latest image tag which is bad practice for production, and a working_dir URL pointing to a potentially temporary feature branch. Please see my detailed comments for suggestions on how to fix these issues.
Pull request overview
This PR updates the RayService configuration for the prostate-classifier-1 Ray Serve application, aiming to improve throughput/latency characteristics and adjust GPU worker scheduling (including MIG usage) while also switching the runtime working_dir source.
Changes:
- Tune `BinaryClassifier` deployment parameters (request limits, autoscaling target, CPU/memory, batching settings).
- Switch `prostate-classifier-1` `runtime_env.working_dir` to a GitHub archive URL.
- Update `gpu-workers` to use a MIG GPU resource, add `rayStartParams.num-gpus`, and change the GPU worker image tag.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ray-service.yaml`:
- Line 12: The working_dir currently points to a feature branch zip URL (the
value of working_dir) which will break after the branch is deleted; update
working_dir to reference a stable source such as the main branch URL, a release
tag, or a commit SHA (e.g., replace the feature/main URL with the main branch
archive URL or a tagged-release/commit SHA URL) so deployments remain
reproducible and won't break after the PR merge.
- Line 222: The deployment is using an unpinned image tag ("image:
cerit.io/rationai/model-service:latest-gpu") which makes releases
non-reproducible; change this to a specific versioned tag (for example match the
cpu-workers pinned tag "2.53.0" or another explicit semver) so the ray-service
group uses a fixed image; update the image line to
"cerit.io/rationai/model-service:<version>-gpu" and ensure the chosen version is
consistent with the cpu-workers versioning strategy.
- Line 228: The current resource line nvidia.com/mig-2g.20gb: 1 assigns a 20GB
MIG partition but the container memory request/limit is set to 24Gi which
exceeds that partition; update the memory request and limit in the same pod spec
(where memory: 24Gi is declared) to a value ≤20Gi (e.g., 20Gi) or revert the GPU
resource to the full device by changing nvidia.com/mig-2g.20gb: 1 back to
nvidia.com/gpu: 1; additionally confirm cluster nodes use A100/H100 with MIG
mode enabled and the matching mig-2g.20gb partition configured before deploying.
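The three `ray-service.yaml` findings above can be collected into one hedged sketch. The field nesting, the container name, and the `<commit-sha>` placeholder are illustrative assumptions; the `2.53.0` tag mirrors the cpu-workers pin mentioned in the comment:

```yaml
# Sketch only — <commit-sha> is a placeholder, and the surrounding
# structure is abbreviated; adapt to the actual RayService manifest.
runtime_env:
  working_dir: "https://github.com/RationAI/model-service/archive/<commit-sha>.zip"  # immutable ref

workerGroupSpecs:
  - groupName: gpu-workers
    rayStartParams:
      num-gpus: "1"
    template:
      spec:
        containers:
          - name: ray-worker
            image: cerit.io/rationai/model-service:2.53.0-gpu  # pinned, matching the cpu-workers tag
            resources:
              limits:
                memory: 20Gi                # kept at or below the 20GB MIG slice
                nvidia.com/mig-2g.20gb: 1   # requires A100/H100 nodes with MIG mode enabled
```

This is not the authoritative fix — it only shows all three suggested changes (immutable `working_dir`, pinned image tag, memory sized to the MIG partition) in one place.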
🧹 Nitpick comments (1)
models/semantic_segmentation.py (1)
122-134: Consider consistency with `binary_classifier.py` for LZ4 decompression. While not part of this PR's changes, `binary_classifier.py` wraps `lz4.decompress` in `asyncio.to_thread()` (line 128) to avoid blocking the event loop, but this file calls it synchronously. For larger payloads, this could block the async event loop. Consider aligning with the other model for consistency:

```diff
- data = self.lz4.decompress(await request.body())
+ data = await asyncio.to_thread(self.lz4.decompress, await request.body())
```

This would require adding `import asyncio` at the top of the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@models/semantic_segmentation.py` around lines 122 - 134, The endpoint root currently calls self.lz4.decompress synchronously which can block the event loop for large payloads; change the call to run in a thread (use asyncio.to_thread) e.g. await asyncio.to_thread(self.lz4.decompress, await request.body()) and ensure asyncio is imported at the top; keep the rest of the flow (creating image from bytes, calling self.predict, and returning the compressed prediction) unchanged and reference the methods/attributes root, self.lz4.decompress, self.predict, and self.tile_size when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@models/semantic_segmentation.py`:
- Around line 122-134: The endpoint root currently calls self.lz4.decompress
synchronously which can block the event loop for large payloads; change the call
to run in a thread (use asyncio.to_thread) e.g. await
asyncio.to_thread(self.lz4.decompress, await request.body()) and ensure asyncio
is imported at the top; keep the rest of the flow (creating image from bytes,
calling self.predict, and returning the compressed prediction) unchanged and
reference the methods/attributes root, self.lz4.decompress, self.predict, and
self.tile_size when making the change.
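The pattern recommended above can be sketched independently of Ray and FastAPI. Here `zlib` stands in for `lz4.frame` (an assumption; both expose a blocking `decompress(data)`), and `handle_body` is a hypothetical handler, not the repository's actual endpoint:

```python
import asyncio
import zlib  # stand-in for lz4.frame; both expose a synchronous decompress(data)

async def handle_body(body: bytes) -> bytes:
    # Offload the CPU-bound decompression to a worker thread so the
    # event loop stays free to serve other requests meanwhile.
    return await asyncio.to_thread(zlib.decompress, body)

payload = zlib.compress(b"tile-bytes" * 1000)
result = asyncio.run(handle_body(payload))
assert result == b"tile-bytes" * 1000
```

`asyncio.to_thread` (Python 3.9+) wraps the call in the default thread-pool executor; for a handful of megabytes per request the hand-off overhead is negligible compared to the decompression itself.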
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f76a249-a486-4ffa-b655-a4c25645b8f4
📒 Files selected for processing (2)
models/binary_classifier.py, models/semantic_segmentation.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@models/semantic_segmentation.py`:
- Line 124: The code wrongly awaits the synchronous lz4.frame.decompress call;
change the line to first await request.body() into a variable (e.g., raw = await
request.body()) and then call self.lz4.decompress(raw) without await, assigning
the result to data so data is the decompressed bytes; update references to data
accordingly (look for self.lz4.decompress and request.body() in the method).
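The bug flagged above — awaiting the result of a synchronous call — can be reproduced in isolation. `zlib` again stands in for `lz4.frame` (assumption), and `fake_body` mimics an async `request.body()`:

```python
import asyncio
import zlib  # stand-in for lz4.frame (assumption)

async def fake_body() -> bytes:
    # Mimics Starlette's async request.body()
    return zlib.compress(b"tile")

async def root_buggy() -> bytes:
    # decompress() runs fine, but awaiting its bytes result raises:
    # TypeError: object bytes can't be used in 'await' expression
    return await zlib.decompress(await fake_body())

async def root_fixed() -> bytes:
    raw = await fake_body()       # await only the async body read
    return zlib.decompress(raw)   # plain synchronous call, no await

raised = False
try:
    asyncio.run(root_buggy())
except TypeError:
    raised = True
assert raised
assert asyncio.run(root_fixed()) == b"tile"
```

Splitting the body read and the decompression into two statements, as the prompt suggests, makes the async/sync boundary explicit and avoids the runtime TypeError.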
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 74295de0-9ed3-46ec-a859-13a0a0fc3b64
📒 Files selected for processing (1)
models/semantic_segmentation.py
Force-pushed 2134a4a to 7bbd50c
Actionable comments posted: 1
♻️ Duplicate comments (1)
ray-service.yaml (1)
12-12: ⚠️ Potential issue | 🟠 Major — Pin `working_dir` to the revision that actually contains this change set. These URLs now fetch `refs/heads/main.zip`, while this PR also changes `models/binary_classifier.py` and `models/semantic_segmentation.py`. A rollout from this branch would therefore boot replicas from current `main`, not the code under review, and later restarts/autoscaling will keep pulling a moving target. Please switch these to an immutable commit or release artifact that includes this PR.

Also applies to: 44-44, 78-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ray-service.yaml` at line 12, The working_dir in the Ray service manifest is pointing to a moving ref (https://github.com/RationAI/model-service/archive/refs/heads/main.zip) which will not include the PR's changes; update the working_dir entries (the working_dir YAML key) to point to an immutable artifact that contains this branch—e.g., a commit tar/zip URL or a release/tag that includes the changes to models/binary_classifier.py and models/semantic_segmentation.py—so replicas and future restarts always fetch the exact code for this PR.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@models/binary_classifier.py`:
- Line 20: The TypedDict Config is missing the TensorRT key
trt_max_workspace_size used by reconfigure(), causing the type contract to be
out of sync; update the Config TypedDict to either add trt_max_workspace_size:
int (and keep trt_builder_optimization_level: int) or mark both TensorRT entries
(trt_builder_optimization_level and trt_max_workspace_size) as NotRequired[int]
to reflect that reconfigure() uses .get() fallbacks—modify the Config definition
accordingly and keep reconfigure()’s use of config.get(...) unchanged.
---
Duplicate comments:
In `@ray-service.yaml`:
- Line 12: The working_dir in the Ray service manifest is pointing to a moving
ref (https://github.com/RationAI/model-service/archive/refs/heads/main.zip)
which will not include the PR's changes; update the working_dir entries (the
working_dir YAML key) to point to an immutable artifact that contains this
branch—e.g., a commit tar/zip URL or a release/tag that includes the changes to
models/binary_classifier.py and models/semantic_segmentation.py—so replicas and
future restarts always fetch the exact code for this PR.
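The `Config` TypedDict fix described above can be sketched with the version-portable `total=False` subclass pattern (the equivalent `NotRequired[...]` annotation needs Python 3.11 or `typing_extensions`). Only `intra_op_num_threads` and `tile_size` are taken from this PR; the exact set of required keys is an assumption:

```python
from typing import TypedDict

class _ConfigRequired(TypedDict):
    intra_op_num_threads: int  # appears in the PR; the full required set is assumed
    tile_size: int

class Config(_ConfigRequired, total=False):
    # Optional TensorRT knobs — reconfigure() reads them via config.get(...),
    # so marking them non-required keeps the type contract honest.
    trt_builder_optimization_level: int
    trt_max_workspace_size: int

cfg: Config = {"intra_op_num_threads": 4, "tile_size": 1024}
workspace = cfg.get("trt_max_workspace_size", 8 * 1024**3)
assert workspace == 8 * 1024**3
```

With this shape, a type checker accepts configs that omit the TensorRT keys, matching the `.get()` fallbacks already used in `reconfigure()`.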
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1cf0e22f-2b83-4ea5-ab92-e196a1400001
📒 Files selected for processing (3)
models/binary_classifier.py, models/semantic_segmentation.py, ray-service.yaml
```diff
@@ -61,8 +62,10 @@ spec:
  mpp: 0.468
  max_batch_size: 8
```
This seems wrong w.r.t. the other settings.
```diff
  - name: HeatmapBuilder
-   max_ongoing_requests: 16
    max_queued_requests: 64
+   max_ongoing_requests: 24
```
I doubt that 12 GB RAM is enough for 24 ongoing requests. In each request you are asynchronously loading multiple 1024x1024 tiles.
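A back-of-the-envelope check of that concern, under stated assumptions (RGB uint8 tiles; 8 tiles in flight per request, chosen here to mirror `max_batch_size: 8` — the real per-request tile count may differ):

```python
tile_bytes = 1024 * 1024 * 3   # one 1024x1024 RGB uint8 tile: 3 MiB
tiles_per_request = 8          # assumption: mirrors max_batch_size: 8
ongoing = 24                   # max_ongoing_requests: 24

raw_mib = ongoing * tiles_per_request * tile_bytes / 2**20
assert raw_mib == 576.0        # ~0.56 GiB of raw tiles alone

# A float32 preprocessing copy quadruples that before any model overhead:
assert raw_mib * 4 == 2304.0   # ~2.25 GiB
```

So raw tiles alone do not exhaust 12 GB, but headroom shrinks quickly once decompression buffers, float copies, and runtime overhead are counted — the limit deserves a load test rather than a guess.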
```python
import lz4.frame

self.lz4 = lz4.frame
self.tile_size = 1024  # default, will be overridden by reconfigure
```
There shouldn't be a default for this.
```python
self.tile_size = 1024  # default, will be overridden by reconfigure
```
```python
@serve.deployment(num_replicas="auto")
@serve.ingress(fastapi)
class SemanticSegmentation:
    """Semantic segmentation for tissue tiles using ONNX Runtime with GPU and TensorRT support."""

    async def predict(
        self, images: list[NDArray[np.uint8]]
    ) -> list[NDArray[np.float16]]:
        """Run inference on a batch of images."""
```
AI slop
```python
"""Run inference on a batch of images."""
```
```python
import lz4.frame

self.lz4 = lz4.frame
self.tile_size = 512  # default, will be overridden by reconfigure
```
```python
self.tile_size = 512  # default, will be overridden by reconfigure
```
```diff
@@ -69,8 +72,10 @@ def reconfigure(self, config: Config) -> None:
    "trt_engine_cache_path": cache_path,
    "trt_max_workspace_size": config.get(
        "trt_max_workspace_size", 8 * 1024 * 1024 * 1024
```
```diff
- "trt_max_workspace_size", 8 * 1024 * 1024 * 1024
+ "trt_max_workspace_size", 8 * 1024**3
```
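The suggested rewrite is purely cosmetic; both spellings denote the same 8 GiB workspace:

```python
# 1024**3 and 1024 * 1024 * 1024 are the same value; the power form is terser.
assert 8 * 1024**3 == 8 * 1024 * 1024 * 1024 == 8_589_934_592  # 8 GiB
```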
```python
# Configure ONNX Runtime session
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = config["intra_op_num_threads"]
sess_options.inter_op_num_threads = 1
```
Summary by CodeRabbit
- Chores
- New Features