Feat/syva session 4b#47
Conversation
…-client - Removes dependency on local syva-core gRPC surface - New CLI: --cp-endpoint, --team-id required - Reconciles TOML directory against syva-cp every --reconcile-secs - Container watcher deferred to a later session - verify subcommand preserved
…client - CRD remains source of truth (kubectl apply wins over API edits) - Removes dependency on local syva-core gRPC surface - New CLI: --cp-endpoint, --team-id required, --namespace optional - Pod annotation/container watcher deferred to a later session
…p ZoneService
- Removes dependency on local syva-core gRPC surface
- New CLI: --cp-endpoint, --team-id required
- Endpoints: POST/GET/PUT/DELETE /v1/zones[, /{name}]
…ndatory syva-core now only ingests zones via syva-cp's NodeAssignmentUpdate stream. The local gRPC server is deleted. Adapters connect to syva-cp directly (see session 4b).
There was a problem hiding this comment.
Pull request overview
This PR completes the “session 4b” architecture shift: adapters now perform zone CRUD directly against syva-cp via syva-cp-client, and syva-core becomes CP-only (no local adapter-facing gRPC surface).
Changes:
- Add zone CRUD APIs to
syva-cp-clientand migrate adapters (file/k8s/api) to use them. - Remove
syva-core’s local gRPC server and rework it to ingest desired state exclusively via theNodeAssignmentUpdatestream. - Add/update deployment manifests and documentation for CP-mode-only operation.
Reviewed changes
Copilot reviewed 47 out of 48 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| syva/src/zone.rs | Suppress unused API warnings; test tweak |
| syva/src/types.rs | Derive Default for ZonePolicy |
| syva/src/mapper.rs | Dead-code allowances for label helpers |
| syva/src/main.rs | Minor cleanups in status/verify commands |
| syva/src/health.rs | Refactor hook metric tuple typing |
| syva/src/ebpf.rs | Update Aya types (Ebpf/EbpfLoader) and cleanup |
| syva-cp-client/src/lib.rs | Re-export zone CRUD types |
| syva-cp-client/src/error.rs | Box tonic/serde errors; add From impls |
| syva-cp-client/src/client.rs | Implement zone CRUD + snapshot parsing helpers |
| syva-core/src/zone.rs | Mark some APIs dead-code outside tests |
| syva-core/src/rpc/mod.rs | Remove local gRPC service implementation |
| syva-core/src/main.rs | Make --cp-endpoint mandatory; run CP-only reconcile loop |
| syva-core/src/ingest.rs | Extract former gRPC mutation helpers into shared ingest module |
| syva-core/src/ebpf.rs | Dead-code allowances for formerly-RPC-only methods |
| syva-core/src/cp_reconcile/mod.rs | Point reconcile logic at new ingest module; comm sync timing tweak |
| syva-core/README.md | Document CP-only ingestion path |
| syva-core/Cargo.toml | Remove tonic/prost/tokio-stream deps |
| syva-adapter-k8s/src/watcher.rs | Rewrite: reconcile CRDs into syva-cp zones (no pod membership) |
| syva-adapter-k8s/src/mapper.rs | Map CRD spec → cp-client create/update args |
| syva-adapter-k8s/src/main.rs | New CLI: namespace/cp-endpoint/team-id |
| syva-adapter-k8s/src/crd.rs | Add selector spec; derive defaults/eq where needed |
| syva-adapter-k8s/src/connect.rs | Remove syva-core unix-socket connector |
| syva-adapter-k8s/README.md | New usage docs for CP-mode adapter |
| syva-adapter-k8s/Cargo.toml | Swap deps from syva-core gRPC → cp-client |
| syva-adapter-file/src/watcher.rs | Remove container watcher logic |
| syva-adapter-file/src/verify.rs | New verify command implementation |
| syva-adapter-file/src/types.rs | Derive Default for ZonePolicy; minor test tweak |
| syva-adapter-file/src/translate.rs | Map file policies → cp-client create/update args |
| syva-adapter-file/src/run.rs | New polling reconciler against syva-cp |
| syva-adapter-file/src/reload.rs | Remove hot-reload diff engine |
| syva-adapter-file/src/policy.rs | Restructure policy loading into FilePolicy (with selector/display_name) |
| syva-adapter-file/src/mapper.rs | Remove label→zone mapper (no membership ingestion) |
| syva-adapter-file/src/main.rs | New CLI: policy-dir/cp-endpoint/team-id/reconcile interval |
| syva-adapter-file/src/lib.rs | Add library entrypoint for reuse from bin |
| syva-adapter-file/src/connect.rs | Remove syva-core unix-socket connector |
| syva-adapter-file/README.md | New usage docs for CP-mode adapter |
| syva-adapter-file/Cargo.toml | Swap deps from syva-core gRPC/containerd → cp-client |
| syva-adapter-api/src/routes.rs | Rewrite as REST proxy to syva-cp ZoneService |
| syva-adapter-api/src/main.rs | New CLI: listen/cp-endpoint/team-id |
| syva-adapter-api/src/connect.rs | Remove syva-core unix-socket connector |
| syva-adapter-api/README.md | New usage/docs for REST proxy endpoints |
| syva-adapter-api/Cargo.toml | Swap deps from syva-core gRPC → cp-client |
| deploy/v0.3/daemonset-cp-mode.yaml | Add CP-mode manifests (core + adapters) |
| deploy/v0.2/daemonset-k8s.yaml | Mark v0.2 deployment as legacy reference |
| deploy/v0.2/daemonset-file.yaml | Mark v0.2 deployment as legacy reference |
| README.md | Update architecture docs to include syva-cp |
| Cargo.lock | Dependency graph updates for new adapter/core deps |
| AGENT.md | Update operator docs to CP-only ingestion model |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let display_name_matches = snapshot.display_name == policy.display_name; | ||
|
|
||
| if policy_matches && selector_matches && display_name_matches { | ||
| return Ok(None); | ||
| } | ||
|
|
||
| Ok(Some(UpdateZoneArgs { | ||
| zone_id: snapshot.zone_id, | ||
| if_version: snapshot.version, | ||
| policy_json: Some(desired_policy_json), |
There was a problem hiding this comment.
policy_to_update_args treats display_name drift as a reason to issue UpdateZone, but UpdateZoneArgs/UpdateZoneRequest have no way to update a zone’s display_name. This will cause reconcile loops to continuously call update_zone (and create new, identical policy versions) whenever the display name differs. Consider either removing display_name_matches from the diff logic or encoding display name in a field that can actually be updated (e.g., metadata_json) and only sending policy_json when it changed.
| let display_name_matches = snapshot.display_name == policy.display_name; | |
| if policy_matches && selector_matches && display_name_matches { | |
| return Ok(None); | |
| } | |
| Ok(Some(UpdateZoneArgs { | |
| zone_id: snapshot.zone_id, | |
| if_version: snapshot.version, | |
| policy_json: Some(desired_policy_json), | |
| if policy_matches && selector_matches { | |
| return Ok(None); | |
| } | |
| Ok(Some(UpdateZoneArgs { | |
| zone_id: snapshot.zone_id, | |
| if_version: snapshot.version, | |
| policy_json: if policy_matches { | |
| None | |
| } else { | |
| Some(desired_policy_json) | |
| }, |
| Ok(Some(UpdateZoneArgs { | ||
| zone_id: snapshot.zone_id, | ||
| if_version: snapshot.version, | ||
| policy_json: Some(desired_policy_json), |
There was a problem hiding this comment.
policy_to_update_args always sets policy_json: Some(...) when anything differs, even if only the selector changed. That forces a new policy version on every selector-only edit (and can create noisy history). Consider setting policy_json to None when policy_matches is true, and only populating the specific fields that actually changed.
| policy_json: Some(desired_policy_json), | |
| policy_json: if policy_matches { | |
| None | |
| } else { | |
| Some(desired_policy_json) | |
| }, |
| async fn reconcile_once(cp: &CpClient, config: &Config) -> Result<ReconcileStats> { | ||
| let on_disk = load_policies_from_dir(&config.policy_dir) | ||
| .with_context(|| format!("load policies from {}", config.policy_dir.display()))?; | ||
|
|
||
| let in_cp = cp.list_zones(config.team_id, None, 500).await?; | ||
| let in_cp_by_name: HashMap<String, ZoneSnapshot> = | ||
| in_cp.into_iter().map(|zone| (zone.name.clone(), zone)).collect(); | ||
|
|
||
| let mut stats = ReconcileStats::default(); | ||
|
|
||
| for (name, policy) in &on_disk { | ||
| match cp.get_zone_by_name(config.team_id, name).await? { | ||
| None => match cp.create_zone(policy_to_create_args(config.team_id, name, policy)?).await | ||
| { | ||
| Ok(output) => { | ||
| stats.created += 1; | ||
| stats.changed += 1; | ||
| info!(zone = %name, zone_id = %output.zone_id, "zone created"); | ||
| } | ||
| Err(error) => warn!(zone = %name, error = %error, "create_zone failed"), | ||
| }, | ||
| Some(snapshot) => match policy_to_update_args(&snapshot, policy)? { | ||
| Some(args) => match cp.update_zone(args).await { | ||
| Ok(output) => { | ||
| stats.updated += 1; | ||
| stats.changed += 1; | ||
| info!( | ||
| zone = %name, | ||
| zone_id = %output.zone_id, | ||
| version = output.version, | ||
| "zone updated" | ||
| ); | ||
| } | ||
| Err(error) => warn!(zone = %name, error = %error, "update_zone failed"), | ||
| }, | ||
| None => debug!(zone = %name, "zone unchanged"), | ||
| }, | ||
| } | ||
| } | ||
|
|
||
| for (name, snapshot) in &in_cp_by_name { | ||
| if on_disk.contains_key(name) || snapshot.status == "deleted" { | ||
| continue; | ||
| } | ||
|
|
||
| match cp | ||
| .delete_zone(DeleteZoneArgs { | ||
| zone_id: snapshot.zone_id, | ||
| if_version: snapshot.version, | ||
| drain: true, | ||
| }) |
There was a problem hiding this comment.
reconcile_once will request deletion for every zone present in syva-cp but missing on disk. Since load_policies_from_dir returns an empty map when the directory is missing/empty, a transient issue (missing mount, ConfigMap rotation, permission error) can trigger a mass drain-delete of all zones for the team. Consider treating a missing/empty policy directory as an error (skip the prune phase), or gate deletions behind an explicit --prune/--allow-delete flag and/or a "last known good" snapshot.
| for entry in entries { | ||
| let entry = match entry { | ||
| Ok(e) => e, | ||
| Err(e) => { | ||
| tracing::warn!(%e, "skipping unreadable directory entry"); | ||
| continue; | ||
| } | ||
| }; | ||
|
|
||
| let entry = entry?; | ||
| let path = entry.path(); | ||
| if path.extension().map(|e| e != "toml").unwrap_or(true) { | ||
| if path.extension().and_then(|ext| ext.to_str()) != Some("toml") { | ||
| continue; | ||
| } | ||
|
|
||
| let zone_name = match path.file_stem().and_then(|s| s.to_str()) { | ||
| Some(name) => name.to_string(), | ||
| None => continue, | ||
| }; | ||
|
|
||
| let content = match std::fs::read_to_string(&path) { | ||
| Ok(c) => c, | ||
| Err(e) => { | ||
| tracing::warn!(file = %path.display(), %e, "skipping unreadable policy file"); | ||
| continue; | ||
| } | ||
| let Some(stem) = path.file_stem().and_then(|stem| stem.to_str()) else { | ||
| continue; | ||
| }; | ||
|
|
||
| match toml::from_str::<ZonePolicy>(&content) { | ||
| Ok(policy) => match policy.validate(&zone_name) { | ||
| Ok(()) => { | ||
| tracing::info!(zone = zone_name, file = %path.display(), "loaded zone policy"); | ||
| policies.insert(zone_name, policy); | ||
| } | ||
| Err(e) => { | ||
| tracing::warn!(file = %path.display(), %e, "invalid policy — zone will not be enforced"); | ||
| } | ||
| } | ||
| Err(e) => { | ||
| tracing::warn!( | ||
| file = %path.display(), %e, | ||
| "failed to parse policy file — zone will not be enforced" | ||
| ); | ||
| } | ||
| if stem.is_empty() { | ||
| continue; | ||
| } | ||
|
|
||
| let text = std::fs::read_to_string(&path) | ||
| .with_context(|| format!("read {}", path.display()))?; | ||
| let policy: FilePolicy = toml::from_str(&text) | ||
| .with_context(|| format!("parse {}", path.display()))?; | ||
| policy.policy.validate(stem)?; | ||
| policies.insert(stem.to_string(), policy); |
There was a problem hiding this comment.
Errors while iterating/reading/parsing a single policy file currently abort the entire directory load (entry?, read_to_string?, toml::from_str?). That means one bad or temporarily unreadable file prevents reconciliation of all other zones (and can interact badly with the prune phase). Consider handling per-file failures by logging a warning and skipping that file, while still continuing to load the rest.
| let crd_list = crds.list(&Default::default()).await?; | ||
| let in_cp = cp.list_zones(team_id, None, 500).await?; | ||
| let in_cp_by_name: HashMap<String, ZoneSnapshot> = | ||
| in_cp.into_iter().map(|zone| (zone.name.clone(), zone)).collect(); | ||
|
|
||
| let mut crd_names = HashSet::new(); | ||
| for crd in &crd_list { | ||
| let Some(name) = crd.metadata.name.clone() else { | ||
| continue; | ||
| }; | ||
| crd_names.insert(name.clone()); | ||
|
|
||
| match cp.get_zone_by_name(team_id, &name).await? { | ||
| None => { | ||
| let args = spec_to_create_args(team_id, &name, crd)?; | ||
| match cp.create_zone(args).await { | ||
| Ok(_) => info!(zone = %name, "zone created from CRD (initial)"), |
There was a problem hiding this comment.
initial_reconcile does a list_zones and then still calls get_zone_by_name for every CRD (N+1 RPCs). You can use in_cp_by_name.contains_key(&name) to decide whether to create vs fetch the full snapshot, and only call get_zone_by_name for zones that already exist (where you need current_policy_json to diff). This reduces load on syva-cp for clusters with many CRDs.
| if !dir.exists() { | ||
| tracing::warn!(dir = %dir.display(), "policy directory does not exist — no zones will be enforced"); | ||
| tracing::warn!( | ||
| dir = %dir.display(), | ||
| "policy directory does not exist; no zones will be reconciled" | ||
| ); | ||
| return Ok(policies); | ||
| } |
There was a problem hiding this comment.
load_policies_from_dir returns Ok(empty) when the policy directory doesn’t exist. In the new polling reconciler, an empty on-disk set is interpreted as “delete everything in syva-cp”, so a missing mount/path can accidentally drain-delete all zones. Consider returning an error here (so reconcile fails fast) or returning a distinct outcome that the caller can use to skip deletions.
| Ok(Some(UpdateZoneArgs { | ||
| zone_id: snapshot.zone_id, | ||
| if_version: snapshot.version, | ||
| policy_json: Some(desired_policy_json), |
There was a problem hiding this comment.
spec_to_update_args always sets policy_json: Some(...) whenever anything differs, even when only selector_json changed. Since UpdateZoneRequest treats policy_json as “create a new policy version”, this will generate unnecessary policy versions for selector-only edits. Consider setting policy_json to None when policy_matches is true and only updating the changed fields.
| policy_json: Some(desired_policy_json), | |
| policy_json: if policy_matches { | |
| None | |
| } else { | |
| Some(desired_policy_json) | |
| }, |
| pub async fn create_zone( | ||
| State(state): State<AppState>, | ||
| Json(body): Json<CreateZoneBody>, | ||
| ) -> Result<(StatusCode, Json<CreateZoneOut>), ApiError> { | ||
| let output = state | ||
| .cp | ||
| .create_zone(CreateZoneArgs { | ||
| team_id: state.team_id, | ||
| name: body.name, | ||
| display_name: body.display_name, | ||
| policy_json: body.policy_json, | ||
| summary_json: None, | ||
| selector_json: body.selector_json, | ||
| metadata_json: None, | ||
| }) |
There was a problem hiding this comment.
create_zone forwards requests to syva-cp without validating basic invariants like a non-empty name. As written, an empty name will likely surface as a gRPC error and get reported as 502 (via from_cp), which is misleading for clients. Consider validating body.name (and optionally policy_json) and returning HTTP 400 for invalid input before calling syva-cp.
| pub const LABEL_ZONE: &str = "syva.dev/zone"; | ||
| #[allow(dead_code)] | ||
| pub const LABEL_POLICY: &str = "syva.dev/policy"; | ||
|
|
||
| /// Determine the zone name from container labels. | ||
| /// | ||
| /// Returns None if the container has no `syva.dev/zone` label (global/unzoned). | ||
| #[allow(dead_code)] | ||
| pub fn zone_from_labels(labels: &std::collections::HashMap<String, String>) -> Option<String> { |
There was a problem hiding this comment.
These #[allow(dead_code)] annotations are unconditional, which can hide genuine unused-code regressions in normal builds. Elsewhere in the repo (e.g., syva-core/src/zone.rs) similar suppression uses #[cfg_attr(not(test), allow(dead_code))] to keep test builds strict. Consider switching to cfg_attr here (or removing the allows if these items should stay part of the public API).
| #[allow(dead_code)] | ||
| pub fn all_zones(&self) -> impl Iterator<Item = (&str, u32)> { |
There was a problem hiding this comment.
#[allow(dead_code)] on a public API method can mask real unused-code issues in regular builds. If the goal is only to silence warnings for non-test builds, consider using #[cfg_attr(not(test), allow(dead_code))] (as used in syva-core/src/zone.rs) or removing the allow if this method is expected to remain part of the public API surface.
|
Addressed the review feedback in fdc6ea2. What changed:
Re-verified:
|
New commits:
What landed:
Verified:
Caveat: