feat(ai-platform): SAIA service exposure to external requests #86

Merged
kupratyu-splunk merged 7 commits into main from saia-gateway-changes on Apr 29, 2026

@kbhos-splunk
Collaborator

Description

Adds a single, environment-agnostic contract for exposing the SAIA
public Service to external clients. The operator already renders a
public Service named <aiPlatform.name>-saia-service whose endpoints
are the in-cluster nginx pods (which terminate path-based v1/v2
routing). This change wires up the install scripts and cluster-config
template so the same Service can be reached three different ways
without any operator code change.

Related Issues

  • Related to #

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test improvement
  • CI/CD improvement
  • Chore (dependency updates, etc.)

Changes Made

Testing Performed

  • Unit tests pass (make test)
  • Linting passes (make lint)
  • Integration tests pass (if applicable)
  • E2E tests pass (if applicable)
  • Manual testing performed

Test Environment

  • Kubernetes Version:
  • Cloud Provider:
  • Deployment Method:

Test Steps

Documentation

  • Updated inline code comments
  • Updated README.md (if adding features)
  • Updated API documentation
  • Updated deployment guides
  • Updated CHANGELOG.md
  • No documentation needed

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have updated the Helm chart version (if applicable)
  • I have updated CRD schemas (if applicable)

Breaking Changes

Impact:

Migration Path:

Screenshots/Recordings

Additional Notes

Reviewer Notes

Please pay special attention to:


Commit Message Convention: This PR follows Conventional Commits

@kbhos-splunk changed the base branch from main to ai-tier-v2-k0s on April 27, 2026 04:07
- cluster-config.yaml: rewrite SAIA exposure as 3 NodePort-free modes,
  drop redundant nginx image entry; add byoTargetGroup config block
- eks_cluster_with_stack.sh: read BYO_TG_*; add validate_byo_target_group_config,
  apply_byo_target_group_binding, patch_saia_service_disable_nodeport;
  update patch_saia_public_service_workaround for NodePort-free mode
- k0s-cluster-config.yaml: switch SAIA exposure to LoadBalancer + MetalLB;
  add metallb config block; revert object storage to type=minio with AWS S3
  endpoint (the only working path on k0s — type=aws is silently swapped to
  in-cluster MinIO by the install script)
- k0s_cluster_with_stack.sh: add install_metallb function (chart pin 0.14.8,
  L2 / BGP advertisements); patch_k0s_saia_service_disable_nodeport; fix
  describe_pod node-count whitespace bug
- artifacts.yaml: minor diff (will be overwritten by upcoming merge)

Pre-merge of origin/ai-tier-v2-k0s; will be subsumed by the merge commit.

Made-with: Cursor
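
The NodePort-free patching described in the commit above could look roughly like the sketch below. It assumes the helper relies on the standard Kubernetes Service field spec.allocateLoadBalancerNodePorts. The PR text does not show the real body of patch_saia_service_disable_nodeport, so the function names here, the KUBECTL override, and the ai-platform namespace are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names are illustrative, not the PR's actual functions.

# Emit a merge patch that turns off NodePort allocation for a
# type=LoadBalancer Service (standard Kubernetes Service field).
disable_nodeport_patch_json() {
  printf '%s' '{"spec":{"allocateLoadBalancerNodePorts":false}}'
}

# Apply the patch to one Service.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the command.
patch_service_disable_nodeport() {
  local namespace="$1" service="$2"
  "${KUBECTL:-kubectl}" patch service "$service" \
    --namespace "$namespace" \
    --type merge \
    --patch "$(disable_nodeport_patch_json)"
}
```

A dry run such as `KUBECTL=echo patch_service_disable_nodeport ai-platform my-ai-cluster-saia-service` prints the kubectl invocation without touching a cluster. Note that Kubernetes does not release node ports that were already allocated when this field is flipped to false on an existing Service; the nodePort entries have to be removed explicitly.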
Brings in:
- 4146385 refactor: remove in-cluster MinIO install (k0s installer now
  requires customer-managed object storage; aligns with the bug we hit
  where type=aws was silently swapped to in-cluster MinIO)
- 86cf822 fix: removal of aws specific usages
- 880f68b feature: including saia deployments helm configs
- 3d1104d feat: added initContainer for saia-vector-db-setup posthook
- 922cb4f fix: add safety gate to prevent install_k0s_cluster from wiping a live cluster
- 0ccde9f refactor: remove ecr credential refresher
- d74d9c5 fix: github copilot review comments
- plus go.mod/go.sum CVE patches and minor cleanups

Conflict resolutions:
- tools/cluster_setup/artifacts.yaml: take theirs (generated CRD bundle from new operator)
- tools/cluster_setup/cluster-config.yaml:
    block 1: keep our LIFECYCLE WORKFLOW comment, take theirs' placeholder
             values (useExisting: false, name: my-ai-cluster)
    block 2: keep ours (objectStore.type=aws — EKS supports AWS S3 end-to-end)
- tools/cluster_setup/k0s-cluster-config.yaml:
    sshKeyPath: take theirs (~/.ssh/id_rsa placeholder)
    existingIPs: take theirs (10.0.0.x placeholder IPs)
    storage header: take theirs (new preflight checks comment)
    objectStore: keep ours (type=minio + AWS S3 endpoint — the only working
                 path on k0s, per the post-merge testing)
    operator image: take theirs (splunk-ai-operator:latest placeholder)
    files paths: take theirs (with CHANGE THIS comments)
- tools/cluster_setup/k0s_cluster_with_stack.sh: auto-merged cleanly
    (their MinIO-removal refactor + our install_metallb / disable-NodePort
    function landed in different sections, no manual fixup needed)

Validation post-merge:
- bash -n passes on both eks/k0s install scripts
- yq parses all three cluster configs
- zero conflict markers anywhere
- key sections preserved on both sides:
    EKS: objectStore.type=aws, serviceTemplate.type=LoadBalancer,
         byoTargetGroup.enabled=false (new field)
    k0s: objectStore.type=minio (CRD workaround), metallb.install=true
         (our MetalLB block), install_metallb() function intact

Made-with: Cursor
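
The post-merge validation listed above (bash -n, yq parsing, no conflict markers) can be scripted along these lines. This is a minimal sketch, not the commands the author actually ran: the function names are hypothetical, and the yq call assumes the mikefarah yq v4 CLI.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the post-merge checks; function names are illustrative.

# Return 0 only if every script passes a bash syntax check (nothing is executed).
check_shell_syntax() {
  local script
  for script in "$@"; do
    bash -n "$script" || return 1
  done
}

# Return 0 only if every YAML file parses.
# Assumption: mikefarah yq v4 is installed as `yq`.
check_yaml_parses() {
  local file
  for file in "$@"; do
    yq eval '.' "$file" >/dev/null || return 1
  done
}

# Return 0 if the file contains a Git conflict marker at the start of a line.
has_conflict_markers() {
  grep -Eq '^(<<<<<<<|=======|>>>>>>>)( |$)' "$1"
}
```

For example, `check_shell_syntax tools/cluster_setup/eks_cluster_with_stack.sh tools/cluster_setup/k0s_cluster_with_stack.sh` covers the two install scripts named in this PR. The conflict-marker pattern can also match a legitimate line made only of seven equals signs, so a hit is worth inspecting by hand.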
Base automatically changed from ai-tier-v2-k0s to main April 29, 2026 13:41
Contributor

Copilot AI left a comment

Pull request overview

Adds an environment-agnostic contract for exposing the SAIA public Service to external clients, wiring installer scripts + config templates to support NodePort-free external access (MetalLB on k0s; AWS Load Balancer Controller / TargetGroupBinding on EKS) without requiring operator changes.

Changes:

  • Add guardrails against applying placeholder object-store credentials and improve preflight validation.
  • Implement NodePort-free SAIA exposure flows: MetalLB install/patching for k0s; AWS LBC install + optional BYO target group binding for EKS.
  • Update cluster config templates and installer robustness tweaks (macOS sed handling, EIP/NAT guidance, health check behavior).
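
For the MetalLB side of the k0s flow, a LoadBalancer Service only receives an external IP once MetalLB has an address pool and an advertisement. The sketch below renders the two standard MetalLB resources (IPAddressPool and L2Advertisement, metallb.io/v1beta1) for L2 mode. It is an illustration under assumptions: the render_metallb_l2_pool name, the saia-pool pool name, the metallb-system namespace, and the address range are hypothetical and are not taken from the PR's install_metallb function.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: renders MetalLB L2 resources to stdout.
# Pool name and address range are placeholders chosen by the caller.
render_metallb_l2_pool() {
  local pool_name="$1" address_range="$2"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ${pool_name}
  namespace: metallb-system
spec:
  addresses:
    - ${address_range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ${pool_name}
  namespace: metallb-system
spec:
  ipAddressPools:
    - ${pool_name}
EOF
}
```

The output can be piped to `kubectl apply -f -`, for instance `render_metallb_l2_pool saia-pool 10.0.0.240-10.0.0.250 | kubectl apply -f -`, where the range must be free addresses on the node subnet.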

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tools/cluster_setup/k0s_cluster_with_stack.sh | Adds placeholder credential detection, MetalLB install + SAIA Service patching flow, and adjusts health check logic. |
| tools/cluster_setup/k0s-cluster-config.yaml | Updates k0s template to default to LoadBalancer exposure + adds MetalLB configuration + placeholder creds. |
| tools/cluster_setup/eks_cluster_with_stack.sh | Adds SAIA serviceTemplate handling, AWS Load Balancer Controller install/IRSA + BYO TargetGroupBinding mode, and image wiring updates. |
| tools/cluster_setup/cluster-config.yaml | Updates EKS config template to document/configure NodePort-free SAIA exposure modes and related AWS settings. |

Comment thread tools/cluster_setup/cluster-config.yaml Outdated
Comment thread tools/cluster_setup/k0s_cluster_with_stack.sh Outdated
kbhos-splunk and others added 2 commits April 30, 2026 02:26
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@kupratyu-splunk merged commit 1153ca3 into main on Apr 29, 2026
10 checks passed
@kupratyu-splunk deleted the saia-gateway-changes branch on April 29, 2026 21:05
3 participants