Merged
Commits
55 commits
f17b874
changes for supporting minio in operator and script
kupratyu-splunk Feb 20, 2026
17252f4
generi object storage changes
kupratyu-splunk Feb 25, 2026
c9b8de3
changes for s3 compatable storage in operator
kupratyu-splunk Feb 25, 2026
f529ea8
vulnerability issue: version upgrade for opentelemetry-go from v1.33.…
kupratyu-splunk Feb 26, 2026
718a31c
s3object storage changes
kupratyu-splunk Feb 26, 2026
6bb3a60
fix: bump splunk-operator helm dependency from 3.0.0 to 3.1.0
kupratyu-splunk Mar 20, 2026
fd6727c
fix: update Ray serve import paths to remove splunkai_models_apps prefix
kupratyu-splunk Mar 21, 2026
8d4f721
fix: add working_dir to Ray serve apps and wire WorkingDirBase/ModelV…
kupratyu-splunk Mar 21, 2026
be67105
fix: path-style addressing for MinIO and rename object_storage to blo…
kupratyu-splunk Mar 22, 2026
b9e10d9
fix: use MinIO HTTP endpoint for working_dir instead of broken s3:// …
kupratyu-splunk Mar 22, 2026
858c148
fix: bundle app code into image via file:// working_dir instead of Mi…
kupratyu-splunk Mar 22, 2026
12a8e29
fix: use file:// working_dir for bundled prompt injection models, rem…
kupratyu-splunk Mar 22, 2026
969a737
fix: SAIA resource defaults and preserve AIService resources on recon…
kupratyu-splunk Mar 22, 2026
2da6120
fix: point file:// working_dir to .zip file not directory
kupratyu-splunk Mar 23, 2026
4477cb9
fix: use file:// working_dir for all apps; use s3:// for minio workin…
kupratyu-splunk Mar 23, 2026
11798a1
fix: use entrypoint.zip for Entrypoint app working_dir
kupratyu-splunk Mar 24, 2026
f0b9785
fix: rename blob_storage prefix to blob_prefix to match SDK field name
kupratyu-splunk Mar 24, 2026
2d44244
fix: remove task: classify from engine_args (not supported in vllm 0.…
kupratyu-splunk Mar 24, 2026
977934e
fix: remove model_definition from MbartTranslator app config
kupratyu-splunk Mar 25, 2026
2d7c3b2
feat: replace llama models with gpt-oss-20b and gpt-oss-120b
kupratyu-splunk Mar 25, 2026
0b83c1e
fix: move VLLM_ATTENTION_BACKEND to top-level runtime_env to prevent …
kupratyu-splunk Mar 25, 2026
a1a13fd
fix: reduce GptOss120b to 1 GPU / tensor_parallel_size 1 (quantized m…
kupratyu-splunk Mar 25, 2026
d056a69
fix: set IdleTimeoutSeconds=600 on worker groups to prevent autoscale…
kupratyu-splunk Mar 25, 2026
b122df7
fix: use AutoscalerOptions.IdleTimeoutSeconds instead of WorkerGroupS…
kupratyu-splunk Mar 26, 2026
9945411
fix: increase l40s-1-gpu ephemeral storage to 200Gi and memory to 64Gi
kupratyu-splunk Mar 26, 2026
c856da2
fix: use 2 GPUs and tensor_parallel_size=2 for gpt-oss-120b
kupratyu-splunk Mar 27, 2026
03ef451
fix: increase l40s-2-gpu memory to 128Gi and ephemeral-storage to 200…
kupratyu-splunk Mar 27, 2026
61bec1e
Revert "fix: increase l40s-2-gpu memory to 128Gi and ephemeral-storag…
kupratyu-splunk Mar 27, 2026
e9bb76a
feat: add H100 support with configurable gpu_types via defaultAcceler…
kupratyu-splunk Mar 31, 2026
ab17c85
feat: add H100/L40S cluster setup support in eks and k0s scripts
kupratyu-splunk Mar 31, 2026
802e52f
fix: upgrade grpc and cert-manager to patch CVE-2026-33186 and CVE-20…
kupratyu-splunk Mar 31, 2026
8f56527
all k0s changes + fixes
spl-arif Apr 15, 2026
e5ee1eb
make k0s script run fast + revisited model configs for all models
spl-arif Apr 15, 2026
f3a75d5
feat(saia): add SAIA v2 deployment + nginx path-based v1/v2 router
spl-arif Apr 20, 2026
6c15036
feat: add configurable aiPlatformScheme to AIServiceSpec
kupratyu-splunk Apr 20, 2026
6870cb8
feat(saia): expose public SAIA service via NodePort for Pattern-B v2 …
spl-arif Apr 20, 2026
7db8e01
fix(saia): wire FIELD_DESCRIPTION S3 backend on v2 API and v2 worker
spl-arif Apr 20, 2026
cb76b29
fix(saia): wire AWS_ACCESS_KEY_ID/SECRET on v2 pods for S3FieldDescri…
spl-arif Apr 20, 2026
4656c4c
fix(saia): set v2 worker RUN_TASKS_DELAY_S=10 to keep heartbeat fresh
spl-arif Apr 20, 2026
824af70
feat: update images
spl-arif Apr 20, 2026
8fa59a5
fix(saia): unblock airgap v2 query path via CORS preflight, authz re-…
spl-arif Apr 21, 2026
e51baad
fix: WEAVIATE_PLATFORM_URL + support for rhel 10 (untested)
spl-arif Apr 24, 2026
1ee8a6b
fix: reverted support for rhel 10 (untested)
spl-arif Apr 24, 2026
9cf5cc2
fix: vulnerability issues CVE-2026-29181 and CVE-2026-39883
spl-arif Apr 27, 2026
7a24a4c
fix: downgrade go version for fixing unit cases
spl-arif Apr 27, 2026
cc874b4
fix: upgrade go version due to vuln issue
spl-arif Apr 28, 2026
880f68b
feature: including saia deployments helm configs
spl-arif Apr 28, 2026
86cf822
fix: removal of aws specific usages
spl-arif Apr 28, 2026
4146385
refactor: replace NVMe auto-format with preflight storage checks, rem…
spl-arif Apr 28, 2026
0ccde9f
refactor: remove ecr credential refresher
spl-arif Apr 28, 2026
922cb4f
fix: add safety gate to prevent install_k0s_cluster from wiping a liv…
spl-arif Apr 28, 2026
b137271
fix: added logging to a file
spl-arif Apr 28, 2026
3d1104d
feat: added initContainer for saia-vector-db-setup posthook
spl-arif Apr 28, 2026
d74d9c5
fix: github copilot review comments
spl-arif Apr 29, 2026
5797328
fix: code review comments
spl-arif Apr 29, 2026
2 changes: 1 addition & 1 deletion .env
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
OPERATOR_SDK_VERSION=v1.31.0
REVIEWERS=vivekr-splunk,rlieberman-splunk,patrykw-splunk,Igor-splunk,kasiakoziol
-GO_VERSION=1.24.0
+GO_VERSION=1.25.0
AWSCLI_URL=https://awscli.amazonaws.com/awscli-exe-linux-x86_64-2.8.6.zip
KUBECTL_VERSION=v1.29.1
AZ_CLI_VERSION=2.30.0
11 changes: 11 additions & 0 deletions .gitignore
@@ -9,6 +9,7 @@ bin
testbin/*
examplecodebase/*
Dockerfile.cross
tmp/*

# Test binary, build with `go test -c`
*.test
@@ -30,7 +31,17 @@ Dockerfile.cross
skaffold.env.local
.skaffold/

# Logs
tools/cluster_setup/logs/

# Helm build artifacts
*.tgz
helm-chart/**/charts/
!helm-chart/**/charts/.gitkeep

# Cluster-setup script byproducts (*.original): pristine-snapshot backups
# written by tools/cluster_setup/k0s_cluster_with_stack.sh on first run and
# reused as a reset point on subsequent runs (see configure_images()
# → "Restoring from clean originals"). Needed locally for idempotent
# re-installs; never committed.
tools/cluster_setup/*.original
9 changes: 7 additions & 2 deletions Dockerfile
@@ -1,5 +1,6 @@
# Build the manager binary
-FROM docker.io/golang:1.24 AS builder
+ARG GO_VERSION=1.25.0
+FROM docker.io/golang:${GO_VERSION} AS builder
ARG TARGETOS
ARG TARGETARCH

@@ -43,7 +44,11 @@ COPY LICENSE LICENSE-2.0.txt
COPY --from=builder /certs/tls.crt /certs/tls.crt
COPY --from=builder /certs/tls.key /certs/tls.key

-USER 65532:65532
+# Run as non-root UID with GID 0 (root group). GID 0 is required on
+# RHEL / OpenShift / k0s nodes: the container runtime assigns a random
+# UID at launch and only grants group-read/write to GID 0. Without it
+# the process cannot read /manager or the config files copied above.
+USER 1001:0
ENV INSTANCE_FILE=/instance.yaml
ENV APPLICATION_FILE=/applications.yaml
ENTRYPOINT ["/manager"]
3 changes: 2 additions & 1 deletion Dockerfile.debug
@@ -1,5 +1,6 @@
# Build the manager binary with debug symbols
-FROM docker.io/golang:1.24 AS builder
+ARG GO_VERSION=1.25.0
+FROM docker.io/golang:${GO_VERSION} AS builder
ARG TARGETOS
ARG TARGETARCH

19 changes: 19 additions & 0 deletions Dockerfile.k0s-runner
@@ -0,0 +1,19 @@
FROM registry.access.redhat.com/ubi9/ubi:latest

RUN dnf install -y --allowerasing openssh-clients git jq && dnf clean all

ARG TARGETARCH

# kubectl
RUN curl -fsSL "https://dl.k8s.io/release/$(curl -fsSL https://dl.k8s.io/release/stable.txt)/bin/linux/${TARGETARCH}/kubectl" \
-o /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl

# helm
RUN curl -fsSL "https://get.helm.sh/helm-v3.17.1-linux-${TARGETARCH}.tar.gz" | tar xz -C /tmp \
&& mv /tmp/linux-${TARGETARCH}/helm /usr/local/bin/helm && rm -rf /tmp/linux-${TARGETARCH}

# yq
RUN curl -fsSL "https://github.com/mikefarah/yq/releases/latest/download/yq_linux_${TARGETARCH}" \
-o /usr/local/bin/yq && chmod +x /usr/local/bin/yq

WORKDIR /workspace
11 changes: 9 additions & 2 deletions Makefile
@@ -65,6 +65,9 @@ endif
# tools. (i.e. podman)
CONTAINER_TOOL ?= docker

+# GO_VERSION is read from .env if not already set, and passed as a build-arg to docker builds.
+GO_VERSION ?= $(shell grep '^GO_VERSION=' .env | cut -d= -f2)
+
# Setting SHELL to bash allows bash commands to be executed by recipes.
# Options are set to exit when a recipe line exits non-zero or a piped command fails.
SHELL = /usr/bin/env bash -o pipefail
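
The `GO_VERSION ?=` line in this hunk reads the version out of `.env` with `grep '^GO_VERSION=' .env | cut -d= -f2`. The same lookup can be sketched in Go for illustration. `goVersionFromEnv` is a hypothetical helper that is not part of this PR, and unlike the shell pipeline it returns only the first matching line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goVersionFromEnv is a hypothetical helper (not part of the PR). It scans
// .env-style content for the first line starting with "GO_VERSION=" and
// returns the second "="-separated field, like `cut -d= -f2` would.
func goVersionFromEnv(content string) string {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		if v, ok := strings.CutPrefix(sc.Text(), "GO_VERSION="); ok {
			// Keep only the text up to the next "=", matching field 2 of cut.
			return strings.SplitN(v, "=", 2)[0]
		}
	}
	return "" // no GO_VERSION line found
}

func main() {
	// Sample content taken from the .env hunk in this PR.
	env := "OPERATOR_SDK_VERSION=v1.31.0\nGO_VERSION=1.25.0\nKUBECTL_VERSION=v1.29.1\n"
	fmt.Println(goVersionFromEnv(env))
}
```

Because the Makefile uses `?=`, a `GO_VERSION` already set in the environment or on the `make` command line takes precedence over the value read from `.env`.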
@@ -215,7 +218,11 @@ run: manifests generate fmt vet ## Run a controller from your host.
# More info: https://docs.docker.com/develop/develop-images/build_enhancements/
.PHONY: docker-build
docker-build: ## Build docker image with the manager.
-	$(CONTAINER_TOOL) build -t ${IMG} .
+	$(CONTAINER_TOOL) build --build-arg GO_VERSION=$(GO_VERSION) -t ${IMG} .
+
+.PHONY: docker-build-amd64
+docker-build-amd64: ## Build docker image for linux/amd64 (e.g. for x86_64 servers/EC2).
+	$(CONTAINER_TOOL) build --platform=linux/amd64 --build-arg GO_VERSION=$(GO_VERSION) -t ${IMG} .

.PHONY: docker-push
docker-push: ## Push docker image with the manager.
@@ -234,7 +241,7 @@ docker-buildx: ## Build and push docker image for the manager for cross-platform
sed -e '1 s/\(^FROM\)/FROM --platform=\$$\{BUILDPLATFORM\}/; t' -e ' 1,// s//FROM --platform=\$$\{BUILDPLATFORM\}/' Dockerfile > Dockerfile.cross
- $(CONTAINER_TOOL) buildx create --name splunk-ai-operator-builder
$(CONTAINER_TOOL) buildx use splunk-ai-operator-builder
-	- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --tag ${IMG} -f Dockerfile.cross .
+	- $(CONTAINER_TOOL) buildx build --push --platform=$(PLATFORMS) --build-arg GO_VERSION=$(GO_VERSION) --tag ${IMG} -f Dockerfile.cross .
- $(CONTAINER_TOOL) buildx rm splunk-ai-operator-builder
rm Dockerfile.cross

16 changes: 11 additions & 5 deletions api/v1/aiplatform_types.go
@@ -364,13 +364,13 @@ type SidecarSpec struct {
// ObjectStorageSpec defines object storage configuration for AI artifacts, tasks, and models
type ObjectStorageSpec struct {
// Remote volume URI in the format s3://bucketname/<path prefix>, gs://bucketname/<path prefix>,
-// azure://containername/<path prefix>, or minio://bucketname/<path prefix>
+// azure://containername/<path prefix>, s3compat://bucketname/<path prefix> (generic S3-compatible), minio://, or seaweedfs://
// +kubebuilder:validation:Required
-// +kubebuilder:validation:Pattern=`^(s3|gs|azure|minio)://[a-zA-Z0-9.\-_]+(/.*)?$`
+// +kubebuilder:validation:Pattern=`^(s3|gs|azure|minio|seaweedfs|s3compat)://[a-zA-Z0-9.\-_]+(/.*)?$`
Path string `json:"path"`

-// Optional override endpoint (only needed for S3-compatible services like MinIO)
-// Must be a valid HTTP/HTTPS URL
+// Optional override endpoint (only needed for S3-compatible services like MinIO, SeaweedFS)
+// Must be a valid HTTP/HTTPS URL. When set with s3:// path, backend is treated as S3-compatible (MinIO, SeaweedFS, etc.)
// +kubebuilder:validation:Optional
// +kubebuilder:validation:Pattern=`^https?://.*$`
Endpoint string `json:"endpoint,omitempty"`
@@ -380,11 +380,17 @@ type ObjectStorageSpec struct {
// +kubebuilder:validation:MinLength=1
Region string `json:"region"`

-// Secret name containing storage credentials
+// Secret name containing storage credentials (e.g. s3_access_key, s3_secret_key for S3-compatible backends)
// +kubebuilder:validation:Optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=253
SecretRef string `json:"secretRef,omitempty"`
+
+// Provider is an optional hint for documentation and tooling. Operator derives behavior from path scheme and endpoint.
+// Values: aws, minio, seaweedfs, s3compat, gcs, azure
+// +kubebuilder:validation:Optional
+// +kubebuilder:validation:Enum=aws;minio;seaweedfs;s3compat;gcs;azure
+Provider string `json:"provider,omitempty"`
}
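
The widened `Path` pattern and the endpoint rule ("when set with s3:// path, backend is treated as S3-compatible") can be exercised outside the API server. The following is a minimal sketch, not the operator's actual code: the regular expression is copied from the kubebuilder marker above, while `isS3Compatible` is a hypothetical helper that only restates the rule described in the field comments.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// objectStoragePathPattern is copied from the kubebuilder Pattern marker on
// ObjectStorageSpec.Path in this diff.
var objectStoragePathPattern = regexp.MustCompile(
	`^(s3|gs|azure|minio|seaweedfs|s3compat)://[a-zA-Z0-9.\-_]+(/.*)?$`)

// isS3Compatible is a hypothetical helper (an assumption, not the operator's
// implementation). It restates the comments in the diff: minio://,
// seaweedfs:// and s3compat:// are S3-compatible schemes, and an s3:// path
// counts as S3-compatible only when an endpoint override is set.
func isS3Compatible(path, endpoint string) bool {
	switch {
	case strings.HasPrefix(path, "minio://"),
		strings.HasPrefix(path, "seaweedfs://"),
		strings.HasPrefix(path, "s3compat://"):
		return true
	case strings.HasPrefix(path, "s3://"):
		return endpoint != ""
	}
	return false
}

func main() {
	// Bucket names and the endpoint below are made up for illustration.
	paths := []string{"s3://models/prod", "s3compat://models", "ftp://models"}
	for _, p := range paths {
		fmt.Printf("%-22s valid=%-5v s3compat(with endpoint)=%v\n",
			p, objectStoragePathPattern.MatchString(p),
			isS3Compatible(p, "http://minio.example:9000"))
	}
}
```

Per the field comment, `Provider` is only a hint for documentation and tooling, so the sketch deliberately ignores it and looks at the path scheme and endpoint alone.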

// IngressSpec defines Ingress configuration for external access to platform services
45 changes: 45 additions & 0 deletions api/v1/aiservice_types.go
@@ -53,6 +53,12 @@ type AIServiceSpec struct {
// +kubebuilder:validation:Optional
AIPlatformUrl string `json:"aiPlatformUrl,omitempty"`

// AIPlatformScheme specifies the URL scheme for the AI Platform service ("http" or "https")
// +kubebuilder:validation:Optional
// +kubebuilder:default="http"
// +kubebuilder:validation:Enum=http;https
AIPlatformScheme string `json:"aiPlatformScheme,omitempty"`

// AIPlatformRef is a reference to the AIPlatform resource
// +kubebuilder:validation:Required
AIPlatformRef corev1.ObjectReference `json:"aiPlatformRef"`
@@ -117,6 +123,45 @@ type AIServiceSpec struct {
// +kubebuilder:default="cluster.local"
// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
ClusterDomain string `json:"clusterDomain,omitempty"`

// V2 configures the SAIA v2 deployment. v2 is always deployed alongside v1 behind nginx.
// Users toggle Agent Mode (v1 vs v2) from the Splunk Settings UI.
// +kubebuilder:validation:Optional
V2 SAIAv2Config `json:"v2,omitempty"`

// V2Worker configures the v2 SAIA worker deployment (same v2 image, command=run-worker.sh).
// +kubebuilder:validation:Optional
V2Worker SAIAWorkerConfig `json:"v2Worker,omitempty"`
}
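
The new `AIPlatformScheme` field is constrained to `http` or `https` and defaults to `http`. How a consumer might combine it with a host can be sketched as follows. This is a sketch under stated assumptions: `platformURL` is a hypothetical helper and the host and port are invented for illustration, since the diff shown here does not include the code that builds the URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// platformURL is a hypothetical helper (not from the PR). It mirrors the
// kubebuilder markers on AIPlatformScheme: an empty scheme falls back to
// "http", and anything other than "http" or "https" is rejected.
func platformURL(scheme, host string) (string, error) {
	if scheme == "" {
		scheme = "http" // same value as +kubebuilder:default="http"
	}
	if scheme != "http" && scheme != "https" {
		return "", fmt.Errorf("unsupported aiPlatformScheme %q", scheme)
	}
	u := url.URL{Scheme: scheme, Host: host}
	return u.String(), nil
}

func main() {
	// Host and port are assumptions chosen only for this example.
	host := "ai-platform.default.svc.cluster.local:8000"
	for _, scheme := range []string{"", "https", "ftp"} {
		u, err := platformURL(scheme, host)
		fmt.Printf("scheme=%q url=%q err=%v\n", scheme, u, err)
	}
}
```

The empty-scheme branch matters mainly for objects built in Go without API-server defaulting, for example in unit tests, because the CRD default is applied only when the object passes through the API server.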

// SAIAv2Config defines the configuration for the SAIA v2 API deployment.
type SAIAv2Config struct {
// Image is the container image for the v2 API pod
// +kubebuilder:validation:Optional
Image string `json:"image,omitempty"`

// Replicas is the number of v2 API replicas
// +kubebuilder:validation:Optional
// +kubebuilder:default=1
// +kubebuilder:validation:Minimum=0
Replicas int32 `json:"replicas,omitempty"`

// Resources defines the compute resources for the v2 API pods
// +kubebuilder:validation:Optional
Resources corev1.ResourceRequirements `json:"resources,omitempty"`
}

// SAIAWorkerConfig defines the configuration for a SAIA worker deployment.
type SAIAWorkerConfig struct {
// Replicas is the number of worker replicas
// +kubebuilder:validation:Optional
// +kubebuilder:default=1
// +kubebuilder:validation:Minimum=0
Replicas int32 `json:"replicas,omitempty"`

// Resources defines the compute resources for the worker pods
// +kubebuilder:validation:Optional
Resources corev1.ResourceRequirements `json:"resources,omitempty"`
}

// MetricsConfig defines the metrics configuration for monitoring
34 changes: 34 additions & 0 deletions api/v1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
