Skip to content

Feat/syva session 4b#47

Merged
yairfalse merged 9 commits into
mainfrom
feat/syva-session-4b
Apr 25, 2026
Merged

Feat/syva session 4b#47
yairfalse merged 9 commits into
mainfrom
feat/syva-session-4b

Conversation

@yairfalse
Copy link
Copy Markdown
Contributor

New commits:

  • 4cd0725 feat(syva-cp-client): zone CRUD methods (create, update, delete, get, list)
  • ec5ef39 feat(syva-adapter-file): rewrite to push zones to syva-cp via syva-cp-client
  • 84c8191 feat(syva-adapter-k8s): rewrite to push zones to syva-cp via syva-cp-client
  • f06f4fc feat(syva-adapter-api): rewrite as thin REST proxy in front of syva-cp ZoneService
  • 571b44f refactor(syva-core): remove local gRPC surface, make --cp-endpoint mandatory
  • cbb8461 feat(deploy): v0.3 manifests for cp-mode (syva-core DaemonSet, adapter Deployments)
  • 81dbaa3 docs: reflect single-ingestion-path architecture after 4b
  • 0fc40dd chore(workspace): fix legacy warnings for strict workspace verification
    What landed:
  • syva-cp-client now supports zone CRUD for adapters
  • syva-adapter-file, syva-adapter-k8s, and syva-adapter-api now talk to syva-cp directly
  • syva-core no longer exposes the local gRPC surface; --cp-endpoint is mandatory
  • new deploy/v0.3/daemonset-cp-mode.yaml
  • root/core/adapter docs and AGENT.md updated for the single-ingestion-path architecture
    Verified:
  • cargo check --workspace
  • cargo clippy --workspace -- -D warnings
  • DATABASE_URL=postgres://postgres:dev@localhost:5432/syva_cp_test cargo test --workspace
  • DATABASE_URL=postgres://postgres:dev@localhost:5432/syva_cp_test cargo test -p syva-cp
  • cargo run -p xtask -- check-write-discipline
  • CLI shape checks for syva-core, syva-file, syva-k8s, syva-api
    Caveat:

…-client

- Removes dependency on local syva-core gRPC surface
- New CLI: --cp-endpoint, --team-id required
- Reconciles TOML directory against syva-cp every --reconcile-secs
- Container watcher deferred to a later session
- verify subcommand preserved
…client

- CRD remains source of truth (kubectl apply wins over API edits)
- Removes dependency on local syva-core gRPC surface
- New CLI: --cp-endpoint, --team-id required, --namespace optional
- Pod annotation/container watcher deferred to a later session
…p ZoneService

- Removes dependency on local syva-core gRPC surface
- New CLI: --cp-endpoint, --team-id required
- Endpoints: POST/GET/PUT/DELETE /v1/zones[, /{name}]
…ndatory

syva-core now only ingests zones via syva-cp's NodeAssignmentUpdate stream.
The local gRPC server is deleted. Adapters connect to syva-cp directly
(see session 4b).
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes the “session 4b” architecture shift: adapters now perform zone CRUD directly against syva-cp via syva-cp-client, and syva-core becomes CP-only (no local adapter-facing gRPC surface).

Changes:

  • Add zone CRUD APIs to syva-cp-client and migrate adapters (file/k8s/api) to use them.
  • Remove syva-core’s local gRPC server and rework it to ingest desired state exclusively via the NodeAssignmentUpdate stream.
  • Add/update deployment manifests and documentation for CP-mode-only operation.

Reviewed changes

Copilot reviewed 47 out of 48 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
syva/src/zone.rs Suppress unused API warnings; test tweak
syva/src/types.rs Derive Default for ZonePolicy
syva/src/mapper.rs Dead-code allowances for label helpers
syva/src/main.rs Minor cleanups in status/verify commands
syva/src/health.rs Refactor hook metric tuple typing
syva/src/ebpf.rs Update Aya types (Ebpf/EbpfLoader) and cleanup
syva-cp-client/src/lib.rs Re-export zone CRUD types
syva-cp-client/src/error.rs Box tonic/serde errors; add From impls
syva-cp-client/src/client.rs Implement zone CRUD + snapshot parsing helpers
syva-core/src/zone.rs Mark some APIs dead-code outside tests
syva-core/src/rpc/mod.rs Remove local gRPC service implementation
syva-core/src/main.rs Make --cp-endpoint mandatory; run CP-only reconcile loop
syva-core/src/ingest.rs Extract former gRPC mutation helpers into shared ingest module
syva-core/src/ebpf.rs Dead-code allowances for formerly-RPC-only methods
syva-core/src/cp_reconcile/mod.rs Point reconcile logic at new ingest module; comm sync timing tweak
syva-core/README.md Document CP-only ingestion path
syva-core/Cargo.toml Remove tonic/prost/tokio-stream deps
syva-adapter-k8s/src/watcher.rs Rewrite: reconcile CRDs into syva-cp zones (no pod membership)
syva-adapter-k8s/src/mapper.rs Map CRD spec → cp-client create/update args
syva-adapter-k8s/src/main.rs New CLI: namespace/cp-endpoint/team-id
syva-adapter-k8s/src/crd.rs Add selector spec; derive defaults/eq where needed
syva-adapter-k8s/src/connect.rs Remove syva-core unix-socket connector
syva-adapter-k8s/README.md New usage docs for CP-mode adapter
syva-adapter-k8s/Cargo.toml Swap deps from syva-core gRPC → cp-client
syva-adapter-file/src/watcher.rs Remove container watcher logic
syva-adapter-file/src/verify.rs New verify command implementation
syva-adapter-file/src/types.rs Derive Default for ZonePolicy; minor test tweak
syva-adapter-file/src/translate.rs Map file policies → cp-client create/update args
syva-adapter-file/src/run.rs New polling reconciler against syva-cp
syva-adapter-file/src/reload.rs Remove hot-reload diff engine
syva-adapter-file/src/policy.rs Restructure policy loading into FilePolicy (with selector/display_name)
syva-adapter-file/src/mapper.rs Remove label→zone mapper (no membership ingestion)
syva-adapter-file/src/main.rs New CLI: policy-dir/cp-endpoint/team-id/reconcile interval
syva-adapter-file/src/lib.rs Add library entrypoint for reuse from bin
syva-adapter-file/src/connect.rs Remove syva-core unix-socket connector
syva-adapter-file/README.md New usage docs for CP-mode adapter
syva-adapter-file/Cargo.toml Swap deps from syva-core gRPC/containerd → cp-client
syva-adapter-api/src/routes.rs Rewrite as REST proxy to syva-cp ZoneService
syva-adapter-api/src/main.rs New CLI: listen/cp-endpoint/team-id
syva-adapter-api/src/connect.rs Remove syva-core unix-socket connector
syva-adapter-api/README.md New usage/docs for REST proxy endpoints
syva-adapter-api/Cargo.toml Swap deps from syva-core gRPC → cp-client
deploy/v0.3/daemonset-cp-mode.yaml Add CP-mode manifests (core + adapters)
deploy/v0.2/daemonset-k8s.yaml Mark v0.2 deployment as legacy reference
deploy/v0.2/daemonset-file.yaml Mark v0.2 deployment as legacy reference
README.md Update architecture docs to include syva-cp
Cargo.lock Dependency graph updates for new adapter/core deps
AGENT.md Update operator docs to CP-only ingestion model

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread syva-adapter-file/src/translate.rs Outdated
Comment on lines +35 to +44
let display_name_matches = snapshot.display_name == policy.display_name;

if policy_matches && selector_matches && display_name_matches {
return Ok(None);
}

Ok(Some(UpdateZoneArgs {
zone_id: snapshot.zone_id,
if_version: snapshot.version,
policy_json: Some(desired_policy_json),
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

policy_to_update_args treats display_name drift as a reason to issue UpdateZone, but UpdateZoneArgs/UpdateZoneRequest have no way to update a zone’s display_name. This will cause reconcile loops to continuously call update_zone (and create new, identical policy versions) whenever the display name differs. Consider either removing display_name_matches from the diff logic or encoding display name in a field that can actually be updated (e.g., metadata_json) and only sending policy_json when it changed.

Suggested change
let display_name_matches = snapshot.display_name == policy.display_name;
if policy_matches && selector_matches && display_name_matches {
return Ok(None);
}
Ok(Some(UpdateZoneArgs {
zone_id: snapshot.zone_id,
if_version: snapshot.version,
policy_json: Some(desired_policy_json),
if policy_matches && selector_matches {
return Ok(None);
}
Ok(Some(UpdateZoneArgs {
zone_id: snapshot.zone_id,
if_version: snapshot.version,
policy_json: if policy_matches {
None
} else {
Some(desired_policy_json)
},

Copilot uses AI. Check for mistakes.
Ok(Some(UpdateZoneArgs {
zone_id: snapshot.zone_id,
if_version: snapshot.version,
policy_json: Some(desired_policy_json),
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

policy_to_update_args always sets policy_json: Some(...) when anything differs, even if only the selector changed. That forces a new policy version on every selector-only edit (and can create noisy history). Consider setting policy_json to None when policy_matches is true, and only populating the specific fields that actually changed.

Suggested change
policy_json: Some(desired_policy_json),
policy_json: if policy_matches {
None
} else {
Some(desired_policy_json)
},

Copilot uses AI. Check for mistakes.
Comment on lines +92 to +142
async fn reconcile_once(cp: &CpClient, config: &Config) -> Result<ReconcileStats> {
let on_disk = load_policies_from_dir(&config.policy_dir)
.with_context(|| format!("load policies from {}", config.policy_dir.display()))?;

let in_cp = cp.list_zones(config.team_id, None, 500).await?;
let in_cp_by_name: HashMap<String, ZoneSnapshot> =
in_cp.into_iter().map(|zone| (zone.name.clone(), zone)).collect();

let mut stats = ReconcileStats::default();

for (name, policy) in &on_disk {
match cp.get_zone_by_name(config.team_id, name).await? {
None => match cp.create_zone(policy_to_create_args(config.team_id, name, policy)?).await
{
Ok(output) => {
stats.created += 1;
stats.changed += 1;
info!(zone = %name, zone_id = %output.zone_id, "zone created");
}
Err(error) => warn!(zone = %name, error = %error, "create_zone failed"),
},
Some(snapshot) => match policy_to_update_args(&snapshot, policy)? {
Some(args) => match cp.update_zone(args).await {
Ok(output) => {
stats.updated += 1;
stats.changed += 1;
info!(
zone = %name,
zone_id = %output.zone_id,
version = output.version,
"zone updated"
);
}
Err(error) => warn!(zone = %name, error = %error, "update_zone failed"),
},
None => debug!(zone = %name, "zone unchanged"),
},
}
}

for (name, snapshot) in &in_cp_by_name {
if on_disk.contains_key(name) || snapshot.status == "deleted" {
continue;
}

match cp
.delete_zone(DeleteZoneArgs {
zone_id: snapshot.zone_id,
if_version: snapshot.version,
drain: true,
})
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reconcile_once will request deletion for every zone present in syva-cp but missing on disk. Since load_policies_from_dir returns an empty map when the directory is missing/empty, a transient issue (missing mount, ConfigMap rotation, permission error) can trigger a mass drain-delete of all zones for the team. Consider treating a missing/empty policy directory as an error (skip the prune phase), or gate deletions behind an explicit --prune/--allow-delete flag and/or a "last known good" snapshot.

Copilot uses AI. Check for mistakes.
Comment on lines 34 to +53
for entry in entries {
let entry = match entry {
Ok(e) => e,
Err(e) => {
tracing::warn!(%e, "skipping unreadable directory entry");
continue;
}
};

let entry = entry?;
let path = entry.path();
if path.extension().map(|e| e != "toml").unwrap_or(true) {
if path.extension().and_then(|ext| ext.to_str()) != Some("toml") {
continue;
}

let zone_name = match path.file_stem().and_then(|s| s.to_str()) {
Some(name) => name.to_string(),
None => continue,
};

let content = match std::fs::read_to_string(&path) {
Ok(c) => c,
Err(e) => {
tracing::warn!(file = %path.display(), %e, "skipping unreadable policy file");
continue;
}
let Some(stem) = path.file_stem().and_then(|stem| stem.to_str()) else {
continue;
};

match toml::from_str::<ZonePolicy>(&content) {
Ok(policy) => match policy.validate(&zone_name) {
Ok(()) => {
tracing::info!(zone = zone_name, file = %path.display(), "loaded zone policy");
policies.insert(zone_name, policy);
}
Err(e) => {
tracing::warn!(file = %path.display(), %e, "invalid policy — zone will not be enforced");
}
}
Err(e) => {
tracing::warn!(
file = %path.display(), %e,
"failed to parse policy file — zone will not be enforced"
);
}
if stem.is_empty() {
continue;
}

let text = std::fs::read_to_string(&path)
.with_context(|| format!("read {}", path.display()))?;
let policy: FilePolicy = toml::from_str(&text)
.with_context(|| format!("parse {}", path.display()))?;
policy.policy.validate(stem)?;
policies.insert(stem.to_string(), policy);
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors while iterating/reading/parsing a single policy file currently abort the entire directory load (entry?, read_to_string?, toml::from_str?). That means one bad or temporarily unreadable file prevents reconciliation of all other zones (and can interact badly with the prune phase). Consider handling per-file failures by logging a warning and skipping that file, while still continuing to load the rest.

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +104
let crd_list = crds.list(&Default::default()).await?;
let in_cp = cp.list_zones(team_id, None, 500).await?;
let in_cp_by_name: HashMap<String, ZoneSnapshot> =
in_cp.into_iter().map(|zone| (zone.name.clone(), zone)).collect();

let mut crd_names = HashSet::new();
for crd in &crd_list {
let Some(name) = crd.metadata.name.clone() else {
continue;
};
crd_names.insert(name.clone());

match cp.get_zone_by_name(team_id, &name).await? {
None => {
let args = spec_to_create_args(team_id, &name, crd)?;
match cp.create_zone(args).await {
Ok(_) => info!(zone = %name, "zone created from CRD (initial)"),
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial_reconcile does a list_zones and then still calls get_zone_by_name for every CRD (N+1 RPCs). You can use in_cp_by_name.contains_key(&name) to decide whether to create vs fetch the full snapshot, and only call get_zone_by_name for zones that already exist (where you need current_policy_json to diff). This reduces load on syva-cp for clusters with many CRDs.

Copilot uses AI. Check for mistakes.
Comment on lines 23 to 29
if !dir.exists() {
tracing::warn!(dir = %dir.display(), "policy directory does not exist — no zones will be enforced");
tracing::warn!(
dir = %dir.display(),
"policy directory does not exist; no zones will be reconciled"
);
return Ok(policies);
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_policies_from_dir returns Ok(empty) when the policy directory doesn’t exist. In the new polling reconciler, an empty on-disk set is interpreted as “delete everything in syva-cp”, so a missing mount/path can accidentally drain-delete all zones. Consider returning an error here (so reconcile fails fast) or returning a distinct outcome that the caller can use to skip deletions.

Copilot uses AI. Check for mistakes.
Ok(Some(UpdateZoneArgs {
zone_id: snapshot.zone_id,
if_version: snapshot.version,
policy_json: Some(desired_policy_json),
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec_to_update_args always sets policy_json: Some(...) whenever anything differs, even when only selector_json changed. Since UpdateZoneRequest treats policy_json as “create a new policy version”, this will generate unnecessary policy versions for selector-only edits. Consider setting policy_json to None when policy_matches is true and only updating the changed fields.

Suggested change
policy_json: Some(desired_policy_json),
policy_json: if policy_matches {
None
} else {
Some(desired_policy_json)
},

Copilot uses AI. Check for mistakes.
Comment on lines +137 to +151
pub async fn create_zone(
State(state): State<AppState>,
Json(body): Json<CreateZoneBody>,
) -> Result<(StatusCode, Json<CreateZoneOut>), ApiError> {
let output = state
.cp
.create_zone(CreateZoneArgs {
team_id: state.team_id,
name: body.name,
display_name: body.display_name,
policy_json: body.policy_json,
summary_json: None,
selector_json: body.selector_json,
metadata_json: None,
})
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_zone forwards requests to syva-cp without validating basic invariants like a non-empty name. As written, an empty name will likely surface as a gRPC error and get reported as 502 (via from_cp), which is misleading for clients. Consider validating body.name (and optionally policy_json) and returning HTTP 400 for invalid input before calling syva-cp.

Copilot uses AI. Check for mistakes.
Comment thread syva/src/mapper.rs
Comment on lines 7 to 15
pub const LABEL_ZONE: &str = "syva.dev/zone";
#[allow(dead_code)]
pub const LABEL_POLICY: &str = "syva.dev/policy";

/// Determine the zone name from container labels.
///
/// Returns None if the container has no `syva.dev/zone` label (global/unzoned).
#[allow(dead_code)]
pub fn zone_from_labels(labels: &std::collections::HashMap<String, String>) -> Option<String> {
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These #[allow(dead_code)] annotations are unconditional, which can hide genuine unused-code regressions in normal builds. Elsewhere in the repo (e.g., syva-core/src/zone.rs) similar suppression uses #[cfg_attr(not(test), allow(dead_code))] to keep test builds strict. Consider switching to cfg_attr here (or removing the allows if these items should stay part of the public API).

Copilot uses AI. Check for mistakes.
Comment thread syva/src/zone.rs
Comment on lines +235 to 236
#[allow(dead_code)]
pub fn all_zones(&self) -> impl Iterator<Item = (&str, u32)> {
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[allow(dead_code)] on a public API method can mask real unused-code issues in regular builds. If the goal is only to silence warnings for non-test builds, consider using #[cfg_attr(not(test), allow(dead_code))] (as used in syva-core/src/zone.rs) or removing the allow if this method is expected to remain part of the public API surface.

Copilot uses AI. Check for mistakes.
@yairfalse
Copy link
Copy Markdown
Contributor Author

Addressed the review feedback in fdc6ea2.

What changed:

  • fixed a syva-file bug where display_name drift caused perpetual updates even though ZoneService::UpdateZone cannot mutate display_name
  • added one-refresh retry handling for adapter-side zone update/delete conflicts after stale-version races
  • mapped syva-api gRPC failures to more accurate HTTP statuses instead of returning 502 for everything
  • restored syva-core CP-only startup on this branch and made startup resilient by retrying CP connect/register instead of exiting on transient failures
  • restored the reconciler distinction between draining and fully removed zones so draining assignments stay in applied state until cleanup completes

Re-verified:

  • cargo clippy on syva-cp-client, syva-adapter-file, syva-adapter-k8s, syva-adapter-api, syva-core with -D warnings
  • cargo test on syva-adapter-api, syva-adapter-file, syva-adapter-k8s, syva-core
  • cargo run -p xtask -- check-write-discipline
  • syva-cp database-backed test suite against syva_cp_test

@yairfalse yairfalse merged commit 5c5ffe5 into main Apr 25, 2026
6 checks passed
@yairfalse yairfalse deleted the feat/syva-session-4b branch April 25, 2026 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants