
Add CPU and GPU deployment modes#88

Merged
wilke0818 merged 2 commits into main from codex/gpu-or-cpu-main
Apr 15, 2026
Conversation

@satra
Collaborator

@satra satra commented Mar 11, 2026

Summary

  • add explicit CPU vs GPU deployment controls for the server build and runtime
  • document the GPU-sensitive functions and deployment workflow
  • keep CPU deployments working when ONNX pose export dependencies are missing
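The third point, keeping CPU deployments working without the ONNX export stack, can be sketched as a try/fall-back loader. This is a minimal illustration, not the actual `_load_pose_inferencer` from `processor.py`: the `load_onnx` and `load_pytorch` callables are hypothetical stand-ins for the real model-loading code.

```python
import logging

logger = logging.getLogger(__name__)


def load_pose_inferencer(load_onnx, load_pytorch):
    """Try the ONNX-exported pose model first; fall back to the PyTorch
    YOLO model when export dependencies (e.g. onnx/onnxruntime) are missing.

    `load_onnx` and `load_pytorch` are hypothetical loader callables that
    stand in for the real model-loading logic in processor.py.
    """
    try:
        return load_onnx()
    except (ImportError, RuntimeError) as exc:
        # CPU-only images omit requirements.gpu.txt, so the ONNX path may
        # raise ImportError; degrade gracefully instead of crashing.
        logger.warning("ONNX pose export unavailable (%s); using PyTorch YOLO", exc)
        return load_pytorch()
```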

Testing

  • python3 -m unittest discover -s tests

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the deployment flexibility of the Riverst server by introducing explicit support for both CPU and GPU environments. It provides clear mechanisms for configuring the build and runtime for each target, ensuring that users can leverage hardware acceleration when available while maintaining full functionality on CPU-only systems. The changes include updated documentation, Docker integration, and a robust fallback strategy for GPU-dependent components, making the system more adaptable to various deployment scenarios.

Highlights

  • Flexible Deployment Modes: Introduced explicit controls for CPU and GPU deployment targets, allowing the server to be built and run optimally for either environment.
  • Enhanced Documentation: Added comprehensive documentation detailing GPU-sensitive functions, the deployment workflow, and configuration instructions for both CPU and GPU setups across README.md, src/server/README.md, notes/first_steps_to_deploy.md, and a new docs/gpu-cpu-deployment-plan.md.
  • Robust CPU Fallback: Ensured that CPU deployments remain fully functional even when ONNX pose export dependencies are unavailable, with the system gracefully falling back to PyTorch YOLO models.
  • Docker Integration: Implemented Docker Compose overrides (docker-compose.gpu.yaml) and Dockerfile build arguments (RIVERST_DEPLOYMENT_TARGET) to streamline GPU-accelerated container builds and runtime.
  • Centralized Device Management: Refactored device_utils.py to centralize runtime device selection using the RIVERST_COMPUTE_DEVICE environment variable, supporting 'auto' (prefer accelerators) and 'cpu' (force CPU) policies.
  • New GPU Dependency Management: Created a dedicated requirements.gpu.txt file to manage GPU-specific Python dependencies, separating them from standard CPU requirements.
  • Device Utility Testing: Added a new test suite (test_device_utils.py) to validate the logic for compute device policy and selection.
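The centralized device-selection policy described above can be sketched as follows. This is an illustrative reconstruction from the changelog, not the actual `device_utils.py`: the real `get_best_device` presumably probes `torch.cuda`/`torch.backends.mps`, which are replaced here by plain boolean parameters.

```python
import os

# Environment variable name from the PR (see the device_utils.py changelog).
COMPUTE_DEVICE_ENV_VAR = "RIVERST_COMPUTE_DEVICE"


def get_compute_device_policy() -> str:
    """Read the compute-device policy: 'auto' prefers accelerators, 'cpu' forces CPU."""
    policy = os.environ.get(COMPUTE_DEVICE_ENV_VAR, "auto").strip().lower()
    if policy not in ("auto", "cpu"):
        raise ValueError(f"Invalid {COMPUTE_DEVICE_ENV_VAR} value: {policy!r}")
    return policy


def get_best_device(cuda_available: bool = False, mps_available: bool = False) -> str:
    """Pick a device string, honoring the policy before probing accelerators.

    The availability flags stand in for the torch runtime checks used in
    the real implementation.
    """
    if get_compute_device_policy() == "cpu":
        return "cpu"
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```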


Changelog
  • README.md
    • Added instructions for GPU-enabled Docker Compose setup.
    • Included guidance for manual GPU-oriented Python package installation.
    • Documented the RIVERST_COMPUTE_DEVICE environment variable for forcing CPU inference.
  • docker-compose.gpu.yaml
    • Added a new Docker Compose override file for GPU deployments, configuring build arguments and GPU access.
  • docker-compose.yaml
    • Modified the server build section to accept RIVERST_DEPLOYMENT_TARGET as a build argument.
    • Added RIVERST_COMPUTE_DEVICE environment variable to the server service configuration.
  • docs/gpu-cpu-deployment-plan.md
    • Added a new document outlining the GPU and non-GPU deployment plan, including goals, findings on GPU-sensitive functions, decisions, and implementation notes.
  • notes/first_steps_to_deploy.md
    • Updated EC2 instance type recommendations to differentiate between GPU and non-GPU deployments.
    • Clarified that NVIDIA driver installation is only required for GPU deployments.
    • Added a note to configure RIVERST_COMPUTE_DEVICE=cpu for CPU-only deployments in the .env file.
  • src/server/Dockerfile
    • Added RIVERST_DEPLOYMENT_TARGET as a build argument with a default of 'cpu'.
    • Modified the dependency installation step to conditionally install requirements.txt or requirements.gpu.txt based on RIVERST_DEPLOYMENT_TARGET.
  • src/server/README.md
    • Updated virtual environment setup instructions to use python -m venv.
    • Added instructions for installing GPU-oriented Python dependencies.
    • Included guidance on setting RIVERST_COMPUTE_DEVICE=cpu to disable GPU/MPS usage.
    • Added Docker commands for building and running GPU-oriented images.
  • src/server/bot/processors/video/processor.py
    • Refactored YOLO pose model initialization into a new _load_pose_inferencer method.
    • Implemented a fallback mechanism to use the PyTorch YOLO model if ONNX export fails or is unavailable.
  • src/server/bot/utils/device_utils.py
    • Introduced COMPUTE_DEVICE_ENV_VAR and DEPLOYMENT_TARGET_ENV_VAR constants.
    • Added get_compute_device_policy and get_deployment_target functions to retrieve environment variables.
    • Rewrote get_best_device to respect the RIVERST_COMPUTE_DEVICE policy, allowing explicit CPU-only operation or automatic detection.
  • src/server/env.example
    • Added RIVERST_COMPUTE_DEVICE environment variable with a default value of 'auto' and a description.
  • src/server/requirements.gpu.txt
    • Added a new file listing GPU-specific Python dependencies, including onnx, onnxslim, and onnxruntime-gpu.
  • src/server/tests/test_device_utils.py
    • Added a new test file containing unit tests for the device_utils module, covering default policy, CPU policy enforcement, and invalid policy handling.
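The build-time switch described in the `src/server/Dockerfile` entry can be sketched as a conditional install. The PR does not show the file's contents, so this is a minimal illustration assuming only the `RIVERST_DEPLOYMENT_TARGET` build argument and the two requirements files named in the changelog.

```dockerfile
# Build-time deployment target; defaults to a CPU-only image.
ARG RIVERST_DEPLOYMENT_TARGET=cpu

COPY requirements.txt requirements.gpu.txt ./

# Install GPU-specific dependencies (onnx, onnxslim, onnxruntime-gpu, ...)
# only when the target is "gpu"; otherwise use the standard CPU requirements.
RUN if [ "$RIVERST_DEPLOYMENT_TARGET" = "gpu" ]; then \
        pip install --no-cache-dir -r requirements.gpu.txt; \
    else \
        pip install --no-cache-dir -r requirements.txt; \
    fi
```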
Activity
  • The author has indicated that the changes were tested using python3 -m unittest discover -s tests.
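The Docker Compose override mentioned in the changelog could look roughly like the fragment below. This is a hypothetical sketch of `docker-compose.gpu.yaml` (the PR does not show the actual file), assuming a `server` service name and the standard Compose syntax for NVIDIA GPU reservations.

```yaml
# Hypothetical docker-compose.gpu.yaml override; exact upstream contents may differ.
services:
  server:
    build:
      args:
        RIVERST_DEPLOYMENT_TARGET: gpu
    environment:
      RIVERST_COMPUTE_DEVICE: auto
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Such an override would typically be layered on top of the base file, e.g. `docker compose -f docker-compose.yaml -f docker-compose.gpu.yaml up --build`.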

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-structured mechanism for selecting between CPU and GPU deployments, both at build time (via Docker build args and different requirements files) and at runtime (via an environment variable). The changes are consistently applied across Docker configurations, documentation, and application code. The fallback for ONNX-dependent features on CPU builds is a nice touch for robustness. The addition of a new test suite for the device selection logic is also a great improvement. I have one suggestion to improve the reliability of the new tests.

Comment thread on src/server/tests/test_device_utils.py (outdated).
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Contributor

@wilke0818 wilke0818 left a comment


looks reasonable

@wilke0818 wilke0818 merged commit 69a9279 into main Apr 15, 2026
