fix: Inject S3 credentials for SAIA v1 and Ray on AWS s3 storage #92

Merged: kupratyu-splunk merged 3 commits into main from s3-storage on May 21, 2026

Conversation

@kbhos-splunk
Collaborator

k0s and other non-EKS clusters use static keys in minio-credentials rather than
IRSA/instance profiles. SAIA v1 calls boto3 directly but only received
S3COMPAT_* env vars, causing NoCredentialsError at startup while v2 worked.
Ray Serve replicas could misclassify regional AWS S3 URLs as s3compat and omit
AWS_* credentials, breaking model/artifact access.
Changes:

  • SAIA: add appendSAIABoto3Env() in buildSAIABaseEnv (AWS_ACCESS_KEY_ID,
    AWS_SECRET_ACCESS_KEY, AWS_REGION, AWS_ENDPOINT_URL when configured);
    reconcile v1 deployment via buildSAIABaseEnv instead of duplicating env.
  • Ray builder: classify s3:// + *.amazonaws.com as aws; inject credentials
    when secretRef is set; add object_storage_test.go.
  • applications.yaml: template AWS_* vars for all Serve runtime env blocks.
  • k0s_cluster_with_stack.sh: type=aws omits endpoint on AIPlatform CR, uses
    cluster region, rejects STS keys, type-specific preflight for endpoints.

Tested on k0s (ap-southeast-2 S3): SAIA v1/v2 Running, Ray workers healthy
after IAM policy + operator image with these changes.
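
The SAIA change described above can be sketched roughly as follows. This is a minimal illustration, not the operator's code: `EnvVar` and `ObjectStorage` are simplified stand-ins (the real builder presumably works with Kubernetes env var types), `appendBoto3Env` is a hypothetical name, and the `accessKey` / `secretKey` key names inside the secret are assumptions.

```go
package main

import "fmt"

// EnvVar is a simplified stand-in for a container environment variable.
// Either Value (literal) or SecretRef ("secret/key") is set.
type EnvVar struct {
	Name      string
	Value     string
	SecretRef string
}

// ObjectStorage is a hypothetical subset of the object-storage settings.
type ObjectStorage struct {
	SecretName string // e.g. "minio-credentials"; empty when IRSA/instance profiles are used
	Region     string
	Endpoint   string // empty for plain AWS S3
}

// appendBoto3Env appends the AWS_* variables named in the PR description.
// Each variable is added only when the corresponding setting is configured.
func appendBoto3Env(env []EnvVar, s ObjectStorage) []EnvVar {
	if s.SecretName != "" {
		// Key names inside the secret are assumed for illustration.
		env = append(env,
			EnvVar{Name: "AWS_ACCESS_KEY_ID", SecretRef: s.SecretName + "/accessKey"},
			EnvVar{Name: "AWS_SECRET_ACCESS_KEY", SecretRef: s.SecretName + "/secretKey"},
		)
	}
	if s.Region != "" {
		env = append(env, EnvVar{Name: "AWS_REGION", Value: s.Region})
	}
	if s.Endpoint != "" {
		env = append(env, EnvVar{Name: "AWS_ENDPOINT_URL", Value: s.Endpoint})
	}
	return env
}

func main() {
	env := appendBoto3Env(nil, ObjectStorage{
		SecretName: "minio-credentials",
		Region:     "ap-southeast-2",
	})
	for _, e := range env {
		fmt.Println(e.Name)
	}
}
```

Building the variables in one shared helper means v1 and v2 cannot drift apart, which is the failure mode the description reports (v2 working while v1 hit NoCredentialsError).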

Description

Related Issues

  • Related to #

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test improvement
  • CI/CD improvement
  • Chore (dependency updates, etc.)

Changes Made

Testing Performed

  • Unit tests pass (make test)
  • Linting passes (make lint)
  • Integration tests pass (if applicable)
  • E2E tests pass (if applicable)
  • Manual testing performed

Test Environment

  • Kubernetes Version:
  • Cloud Provider:
  • Deployment Method:

Test Steps

Documentation

  • Updated inline code comments
  • Updated README.md (if adding features)
  • Updated API documentation
  • Updated deployment guides
  • Updated CHANGELOG.md
  • No documentation needed

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have updated the Helm chart version (if applicable)
  • I have updated CRD schemas (if applicable)

Breaking Changes

Impact:

Migration Path:

Screenshots/Recordings

Additional Notes

Reviewer Notes

Please pay special attention to:


Commit Message Convention: This PR follows Conventional Commits

@kbhos-splunk changed the title from "fix: inject static S3 credentials for SAIA v1 and Ray on AWS s3 storage" to "fix: Inject S3 credentials for SAIA v1 and Ray on AWS s3 storage" on May 21, 2026
@kbhos-splunk requested review from kupratyu-splunk and spl-arif and removed request for kupratyu-splunk on May 21, 2026 12:29
@kupratyu-splunk requested a review from Copilot on May 21, 2026 13:06
Contributor

Copilot AI left a comment

Pull request overview

Fixes AWS S3 credential injection and object-storage classification so SAIA v1 and Ray Serve replicas can authenticate correctly on non-EKS clusters (e.g., k0s) that rely on static keys in minio-credentials, and avoids misclassifying AWS regional S3 endpoints as s3compat.

Changes:

  • SAIA: centralize and inject boto3-standard AWS_* env vars via buildSAIABaseEnv() for both v1 and v2; update SAIA v1 deployment reconcile to use the shared base env.
  • Ray builder: introduce object-storage classification helpers (including AWS regional endpoint detection), inject credentials for AWS as well as s3compat, and add unit tests.
  • Installer/templates: update k0s installer object-storage preflights/CR rendering (type-specific endpoint behavior, region propagation, STS-key rejection) and template AWS_* variables into all Serve runtime_env blocks.
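
The STS-key rejection mentioned for the installer is implemented in a shell script that is not shown here. One plausible way to express such a check, written in Go for consistency with the rest of the repository, relies on the documented AWS convention that temporary STS access key IDs begin with `ASIA` while long-term keys begin with `AKIA`. The function name and error messages below are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateStaticAccessKey rejects empty IDs and IDs that look like temporary
// STS credentials. Temporary keys also need a session token and expire, so
// they are a poor fit for a long-lived Kubernetes secret.
func validateStaticAccessKey(id string) error {
	if id == "" {
		return errors.New("access key ID is empty")
	}
	if strings.HasPrefix(id, "ASIA") {
		return errors.New("access key ID looks like a temporary STS credential (ASIA prefix)")
	}
	return nil
}

func main() {
	// Placeholder IDs, not real credentials.
	for _, id := range []string{"AKIAEXAMPLEKEY000000", "ASIAEXAMPLEKEY000000"} {
		fmt.Printf("%s: %v\n", id, validateStaticAccessKey(id))
	}
}
```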

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tools/cluster_setup/k0s_cluster_with_stack.sh | Adds YAML parse validation, improves object-store preflights, adjusts endpoint/region handling for type=aws, and refines node disk preflight SSH logic. |
| pkg/ai/raybuilder/builder.go | Adds classifyObjectStorage() / isAWSRegionalEndpoint() helpers, injects credentials for AWS paths when secretRef is set, and passes region through to app templating. |
| pkg/ai/raybuilder/object_storage_test.go | Adds unit tests for object-storage classification and AWS endpoint detection. |
| pkg/ai/features/saia/impl.go | Adds shared boto3 AWS_* env injection for SAIA v1/v2 and removes duplicated v2-only AWS env logic. |
| pkg/ai/features/saia/impl_test.go | Updates/relocates tests to validate AWS env injection via buildSAIABaseEnv(). |
| config/configs/applications.yaml | Templates boto3-standard AWS_* env vars into all Ray Serve runtime env blocks. |
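
For readers unfamiliar with the classification problem, the helpers listed for builder.go might look something like the sketch below. The signatures and return values are assumptions, not the merged code. The sketch treats an `s3://` path as `aws` when no endpoint is set or when the endpoint host is under `amazonaws.com`, and as `s3compat` otherwise.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isAWSRegionalEndpoint reports whether endpoint points at an amazonaws.com host.
// It accepts endpoints with or without a scheme.
func isAWSRegionalEndpoint(endpoint string) bool {
	if endpoint == "" {
		return false
	}
	raw := endpoint
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // url.Parse needs a scheme to find the host
	}
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	// The leading dot avoids matching look-alike hosts such as "notamazonaws.com".
	return host == "amazonaws.com" || strings.HasSuffix(host, ".amazonaws.com")
}

// classifyObjectStorage returns "aws", "s3compat", or "other" (assumed labels).
func classifyObjectStorage(path, endpoint string) string {
	if !strings.HasPrefix(path, "s3://") {
		return "other"
	}
	if endpoint == "" || isAWSRegionalEndpoint(endpoint) {
		return "aws"
	}
	return "s3compat"
}

func main() {
	fmt.Println(classifyObjectStorage("s3://models/llm", "https://s3.ap-southeast-2.amazonaws.com"))
	fmt.Println(classifyObjectStorage("s3://models/llm", "http://minio.default.svc:9000"))
}
```

Matching on the parsed hostname rather than on the raw string keeps ports, paths, and look-alike domains from affecting the result.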

Comment thread tools/cluster_setup/k0s_cluster_with_stack.sh Outdated
Comment thread tools/cluster_setup/k0s_cluster_with_stack.sh Outdated
Comment thread pkg/ai/raybuilder/builder.go Outdated
Comment thread pkg/ai/raybuilder/builder.go Outdated
kbhos-splunk and others added 2 commits May 21, 2026 19:29
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@kupratyu-splunk merged commit f9a3c06 into main on May 21, 2026
10 checks passed
@kupratyu-splunk deleted the s3-storage branch on May 21, 2026 17:16