Split base/ into individual deployable modules#426
Open
yangw-dev wants to merge 12 commits into
Open
Conversation
Every component previously bundled in the eks module is now its own deployable module, discovered and deployed through the unified deploy-module flow. eks itself is just the first module in the list. New modules: storageclass-eks, nvidia-device-plugin, node-performance-tuning, registry-mirror, harbor, git-cache, node-compactor — each with kubernetes manifests, deploy scripts (where needed), and smoke tests. Key changes: - Rename modules/eks/terraform/ → infra/ (skip deploy-module Phase 1) - Simplify modules/eks/deploy.sh to terraform + kubeconfig + mirror - Extract harbor deploy.sh (~240 lines) with EKS terraform output reads - Kubeconfig soft-fail in deploy-module for first-deploy bootstrap - Remove deploy-eks recipe; deploy loops all modules uniformly - Split test_base_kubernetes.py across per-module smoke tests - Update clusters.yaml with full module ordering
yangw-dev
commented
Apr 7, 2026
| fi | ||
| fi | ||
|
|
||
| # Deploy base infrastructure (tofu + harbor + base k8s) |
Contributor
Author
There was a problem hiding this comment.
this moved to eks module
…module # Conflicts: # osdc/base/kubernetes/kustomization.yaml # osdc/justfile # osdc/pyproject.toml
Harbor is tightly coupled to EKS terraform (OIDC, S3 bucket, IAM) — keeping it as a separate module added indirection without benefit. - Move harbor deploy logic back into modules/eks/deploy.sh (Phase 4) - Move helm values, namespace, smoke tests back under modules/eks/ - Remove modules/harbor/ directory - Remove harbor from clusters.yaml module lists - Fix _deploy-harbor justfile recipe paths (terraform → infra)
mirror-images.sh and karpenter/deploy.sh still referenced the old modules/eks/terraform/ path after the rename to infra/. Updated all remaining references in scripts and documentation.
Contributor
|
@claude review this and focus what actually constitutes a functional changes and what just moves code around |
Contributor
|
before proceeding with manual review, claude flagged some big stuff: |
jeanschmidt
requested changes
Apr 8, 2026
Contributor
jeanschmidt
left a comment
There was a problem hiding this comment.
there are issues from claude
- Delete legacy Harbor recipes (_deploy-harbor, _configure-harbor-projects) from justfile - Add harbor and image-cache-janitor to clusters.yaml module lists - Kubeconfig soft-fail only for eks module (not all modules) - Fix port-forward process leak in eks deploy.sh (track both PIDs) - Fix drift test path after image-cache-janitor move to kubernetes/ subdir - Update docs/operations.md and docs/modules.md references
- pyproject.toml: base/kubernetes/image-cache-janitor → modules/image-cache-janitor - justfile test-janitor: base/kubernetes/image-cache-janitor → modules/image-cache-janitor - image-cache-janitor deploy.sh: fix kustomize path to kubernetes/ subdir - image-cache-janitor deploy.sh: update comment (deploy-module, not deploy-base)
- Remove harbor from clusters.yaml modules (deployed by eks deploy.sh, no standalone module dir) - Add storageclass-gp3.yaml to eks kubernetes/ so gp3 exists before Harbor PVCs - Remove kustomization.yaml from image-cache-janitor and node-compactor to prevent Phase 2 applying unsubstituted placeholder manifests before deploy.sh
Replace kubectl kustomize with cat for image-cache-janitor and node-compactor since kustomization.yaml was removed to prevent Phase 2/3 placeholder conflicts.
image-cache-janitor and node-compactor use placeholder substitution in deploy.sh. Rename kubernetes/ to manifests/ so deploy-module Phase 2 doesn't auto-apply unsubstituted manifests. Restore kustomization.yaml for kubectl kustomize in deploy.sh.
storageclass-gp3.yaml already in eks/kubernetes/ (applied in Phase 2 before Harbor needs it). Separate module was redundant.
yangw-dev
commented
Apr 8, 2026
| # --- Apply Kubernetes manifests with image substitution --- | ||
| echo "Applying image-cache-janitor manifests..." | ||
| kubectl kustomize "$JANITOR_DIR/" \ | ||
| kubectl kustomize "$JANITOR_DIR/manifests/" \ |
Contributor
Author
There was a problem hiding this comment.
for janitor, we do not want it direclty apply kustomize
change kubernetes/ folder as manifest so it does not trigger phase2 for this module
we let the deploy.sh handle the changes since it has some placeholder
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Every component previously bundled in
base/is now its own deployable module, deployed through the unifieddeploy-moduleflow.eksis just the first module in the list — no specialproviderfield, no separatedeploy-eksordeploy-baserecipe.Harbor stays inside the eks module (tightly coupled to EKS terraform for OIDC/S3/IAM). gp3 StorageClass is also in eks kubernetes/ (Harbor PVCs need it before Harbor installs).
Before:
After:
Key changes
modules/eks/terraform/→modules/eks/infra/— prevents deploy-module Phase 1 from auto-running terraform with wrong state key. eksdeploy.shhandles terraform internally.modules/eks/deploy.sh— 4 phases: terraform, kubeconfig, mirror-images, harbor (helm + secrets + proxy cache)modules/eks/kubernetes/— gp3 StorageClass + harbor-system namespace (applied in Phase 2, before deploy.sh installs Harbor)manifests/convention — modules with placeholder substitution (image-cache-janitor, node-compactor) usemanifests/instead ofkubernetes/to prevent deploy-module Phase 2 from applying unsubstituted manifestsdeploy-eksanddeploy-baserecipes removed — everything goes throughdeploy-moduletest_base_kubernetes.pysplit across per-moduletests/smoke/dirsclusters.yaml— eks as first module, new modules in dependency orderFollow-up work
modules/harbor/with its own deploy.sh that reads terraform outputs from eks infra/ (same pattern as karpenter). Blocked on: no urgency, eks module works.ciforge migration required
After merging this and updating the submodule, ciforge needs:
1.
clusters.yaml— add base modules to EKS clustersThe GCP cluster (
arc-gcp-experiment) already hasgkefirst — no change needed.2.
clusters.yaml— update header commentsReplace the two-phase deploy description:
3.
justfile—deploy-gcpis now redundant (optional cleanup)The upstream
deployrecipe loops all modules uniformly.gkeas the first module handles GCP infra the same wayekshandles AWS. Thekubeconfigoverride for GCP is still needed.4.
pyproject.toml— update testpaths (if overridden)base/node-compactor/scripts/python→modules/node-compactor/scripts/pythonbase/kubernetes/git-cache/scripts/python→modules/git-cache/scripts/pythonbase/kubernetes/image-cache-janitor/scripts/python→modules/image-cache-janitor/scripts/pythonTest plan
just lint— 13/13 linters passjust test— all tests pass, 99.16% coveragejust deploy arc-stagingworks end-to-end with new module orderingjust deploy arc-cbr-production