
LPS Network Endpoint Enhancements and Signaling#5904

Open
milan-zededa wants to merge 6 commits into lf-edge:master from milan-zededa:lps-signaling


@milan-zededa
Contributor

Description

Implements the EVE side of the API additions introduced in
lf-edge/eve-api#144.

Three API enhancements plus two ancillary improvements:

1. Signal endpoint — low-latency LPS config notifications (GET /api/v1/signal)

EVE now maintains a persistent long-lived HTTP GET connection to the optional
/api/v1/signal endpoint on the Local Profile Server. LPS can push a
Signal proto message (NDJSON-framed) listing which endpoints have a pending
configuration change. On receipt, EVE immediately polls the listed endpoints
instead of waiting for the next scheduled tick.

Implementation details:

  • New goroutine in pkg/pillar/localcommand/signal.go handles the stream
    lifecycle independently of the watchdog.
  • TCP keepalive at 60 s is the sole liveness mechanism (no app-layer heartbeat),
    as required by the spec.
  • Exponential backoff on failure (1 s initial, 2× doubling, 30 s cap).
  • If LPS returns 404 (endpoint not implemented), reconnect attempts are throttled
    to one per hour so a non-signaling LPS does not generate excess traffic.
  • Rate limiter (one signal per 3 s, burst 3) guards against a buggy or malicious
    LPS overwhelming EVE with spurious triggers. Periodic polling is the
    correctness guarantee; dropped signals are safe.
  • The stream is restarted whenever the LPS address changes (UpdateLpsConfig).
  • Unknown ConfigEndpoint enum values are silently ignored for forward
    compatibility.

2. Per-port runtime status (NetworkInfo.port_status)

EVE now populates the new NetworkPortStatus repeated field in every
POST /api/v1/network request. For each network port it reports the
kernel-observed runtime state: link up/down, MAC address, assigned IP
addresses (CIDR), active default gateways, DNS servers, DNS search domain,
NTP servers, and MTU.

3. local_modifications_allowed per port in NetworkInfo.latest_config

EVE sets NetworkPortConfig.local_modifications_allowed when reporting
latest_config in NetworkInfo, mirroring the controller-provisioned
SystemAdapter.allow_local_modifications flag. LPS can use this to know
upfront which ports accept locally submitted configuration, rather than
discovering it via a trial-and-error error message.
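As an illustration of how an LPS might use the flag, the sketch below pre-validates a local submission against the last reported `latest_config`. The `portConfig` type and `validateLocalConfig` function are hypothetical LPS-side code; only the `logical_label` and `local_modifications_allowed` fields come from the API described here. Ports missing from `latest_config` are treated as not modifiable, which is an assumption of the sketch.

```go
package main

import "fmt"

// portConfig is a hypothetical LPS-side view of one entry in
// network_info.latest_config.ports, reduced to the fields used here.
type portConfig struct {
	LogicalLabel              string
	LocalModificationsAllowed bool
}

// validateLocalConfig returns an error naming every requested port that EVE
// did not report as locally modifiable (unknown labels count as denied).
func validateLocalConfig(latest []portConfig, requested []string) error {
	allowed := make(map[string]bool, len(latest))
	for _, p := range latest {
		allowed[p.LogicalLabel] = p.LocalModificationsAllowed
	}
	var denied []string
	for _, label := range requested {
		if !allowed[label] {
			denied = append(denied, label)
		}
	}
	if len(denied) > 0 {
		return fmt.Errorf("ports not open to local modification: %v", denied)
	}
	return nil
}

func main() {
	latest := []portConfig{
		{LogicalLabel: "eth0", LocalModificationsAllowed: false},
		{LogicalLabel: "eth1", LocalModificationsAllowed: true},
	}
	fmt.Println(validateLocalConfig(latest, []string{"eth1"}))
	fmt.Println(validateLocalConfig(latest, []string{"eth0", "eth1"}))
}
```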

4. Reactive LPS network POST on config/status change

EVE now triggers an immediate POST /api/v1/network when either:

  • DevicePortConfigList changes (new network configuration applied), or
  • DeviceNetworkStatus changes with a meaningful state update (link state,
    IP assignment, etc.).

Previously the network endpoint was only driven by the periodic ticker.

5. Configurable LPS polling intervals

All six LPS polling intervals are now tunable via controller config
properties (previously they were compile-time constants):

Property                         Default  Min   Max
timer.lps.profile.interval       60 s     3 s   1 h
timer.lps.radio.interval         5 s      3 s   1 h
timer.lps.appinfo.interval       60 s     3 s   1 h
timer.lps.devinfo.interval       60 s     3 s   1 h
timer.lps.network.interval       60 s     3 s   1 h
timer.lps.appbootinfo.interval   60 s     3 s   1 h

Defaults match the previous hard-coded values, so behaviour is unchanged
without explicit configuration.

How to test and validate this PR

Prerequisites

You need a running EVE device with a deployed Local Profile Server application.
Consider using my LPS implementation.

1. Validate local_modifications_allowed in NetworkInfo

LPS side: In the handler for POST /api/v1/network, inspect each entry in
network_info.latest_config.ports. Ports provisioned by the controller with
allow_local_modifications = true must arrive with
local_modifications_allowed = true; those provisioned without it must arrive
with false.

Expected behaviour: LPS can display a read-only indicator next to ports
that it is not permitted to reconfigure, and can pre-validate a
LocalNetworkConfig submission before sending it.

2. Validate per-port runtime status (port_status)

LPS side: In the same POST /api/v1/network handler, inspect
network_info.port_status. For each network port you should see:

  • logical_label — matches the port label from latest_config
  • interface_name — kernel interface name (e.g., eth0)
  • link_up — true when the port has carrier
  • mac_address — colon-separated hex (e.g., aa:bb:cc:dd:ee:ff)
  • ip_addresses — CIDR notation (e.g., 192.168.1.10/24, fd00::1/64)
  • gateways — active default routers
  • dns_servers / dns_domain — effective resolver configuration
  • ntp_servers — NTP peers in use
  • mtu — effective MTU in bytes

To verify reactivity (enhancement 4): Change a network configuration on
the device (e.g., add a static route, trigger a DHCP renewal). LPS should
receive an updated POST /api/v1/network within a few seconds, not after the
next 60-second tick.

3. Validate the Signal endpoint — basic trigger

LPS side: Implement GET /api/v1/signal as a streaming endpoint that:

  1. Keeps the HTTP connection open indefinitely (use chunked transfer encoding).
  2. When a user submits a configuration change (e.g., via a UI that calls
    PUT /api/v1/local_profile), immediately writes a single NDJSON line:
    {"pendingChanges":["CONFIG_ENDPOINT_LOCAL_PROFILE"]}\n
    
    and flushes the response buffer.
  3. Does NOT write anything until there is actually a pending change.
  4. Does NOT write heartbeats or empty lines.

EVE side: After receiving the signal, EVE must immediately fetch
GET /api/v1/local_profile and apply the new profile — without waiting for
the next 60-second tick.

Verification: Measure the latency between submitting the config change on
LPS and EVE applying it. With signaling it should be under 2 seconds; without
signaling it would be up to 60 seconds.

4. Validate the Signal endpoint — multi-endpoint coalescing

Send a signal listing multiple endpoints simultaneously:

{"pendingChanges":["CONFIG_ENDPOINT_NETWORK","CONFIG_ENDPOINT_APP_INFO"]}\n

EVE must trigger both the network POST and the app info POST immediately.

5. Validate configurable polling intervals

Using the controller, set:

timer.lps.profile.interval = 10

Observe in EVE logs that the local profile is now fetched every 10 seconds
instead of every 60 seconds. Restore the default when done.

Changelog notes

  • Low-latency LPS configuration updates. EVE can now receive near-instant
    notifications from the Local Profile Server when new configuration is ready,
    reducing the delay from up to 60 seconds to under 2 seconds. LPS
    applications must implement the optional GET /api/v1/signal streaming
    endpoint to take advantage of this; existing LPS deployments without the
    endpoint continue to work exactly as before.

  • Richer network status in LPS. EVE now reports the live kernel-observed
    state of every network port to LPS on each network POST: link up/down,
    assigned IP addresses (with subnet mask), default gateways, DNS and NTP
    servers, and MTU. LPS applications can display this information to operators
    without needing a separate management channel.

  • Per-port modification permission flag. LPS now knows which network ports
    it is allowed to reconfigure (as provisioned by the controller), so it can
    give operators clear feedback instead of a generic error when they attempt to
    modify a controller-managed port.

  • Reactive LPS network updates. EVE now pushes an updated network POST to
    LPS immediately when the device's network configuration or link state changes,
    rather than waiting for the next periodic tick.

  • Tunable LPS polling intervals. All six LPS polling intervals (local
    profile, radio, app info, device info, network, app boot info) can now be
    adjusted via controller config properties. The defaults are unchanged.

PR Backports

- 16.0-stable: Yes. Even though this is strictly speaking a new feature, we must make an exception here...
- 14.5-stable: No 
- 13.4-stable: No

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR
  • I've checked the boxes above, or I've provided a good reason why I didn't check them.

milan-zededa and others added 6 commits May 5, 2026 18:44
See: lf-edge/eve-api#144

Signed-off-by: Milan Lenco <milan@zededa.com>
Mirror the controller-provisioned SystemAdapter.allow_local_modifications
flag through dpcToProto so LPS can distinguish ports that accept local
configuration from those managed exclusively by the controller, without
having to infer it by trial-and-error via error_message.

Signed-off-by: Milan Lenco <milan@zededa.com>
Open a long-lived GET /api/v1/signal stream to the Local Profile Server
and, upon each incoming Signal message, immediately trigger the listed
endpoints' pollers -- bypassing the ~1-minute periodic cadence while
preserving it as the correctness fallback. This removes the minute-scale
delay that operators previously saw between entering a config change in
the LPS UI and EVE picking it up.

The Signal handler runs as an additional LocalCmdAgent goroutine.
Connection open is guarded by the existing startTask/runInterruptible/
endTask pattern used by the other pollers; the long body read runs
without the task lock so it cannot block pause(). On URL change,
UpdateLpsConfig cancels the in-flight stream and wakes the goroutine,
which reconnects against the current LPS address. Dispatches are
rate-limited (1 signal / 3s, burst 3). LPS 404 throttles reconnect
attempts to once per hour. No watchdog is registered -- a legitimately
long blocking Read must not trigger a device reboot.

A new controllerconn.Client.OpenLocalStream helper provides the
streaming HTTP client (reuses DialerWithResolverCache, adds TCP
keepalive for dead-peer detection, disables HTTP keep-alive for clean
connection teardown, and drops the per-request timeout that SendLocal
applies). The existing triggerProfileGET is exported as
TriggerProfileGET for symmetry with the other Trigger*POST helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Milan Lenco <milan@zededa.com>
Fire an immediate LPS Network-endpoint POST whenever a substantive change
is detected in either the device port config list (handleDPCLImpl) or the
device network status (handleDNSImpl), so the local operator sees the
effect of a network config change right away instead of waiting for the
next periodic post. A burst of updates during, e.g., DPC verification is
naturally coalesced by the networkTicker's size-1 buffered channel:
TickNow is a non-blocking send, so excess kicks arriving while a POST is
already in flight or pending are dropped.
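The coalescing relies on a well-known Go idiom: a non-blocking send on a channel with capacity 1. In the sketch below, `kicker` is a hypothetical stand-in for the networkTicker, and the boolean return value of `TickNow` exists only to make the effect observable.

```go
package main

import "fmt"

// kicker is a hypothetical stand-in for the networkTicker's kick channel.
type kicker struct{ ch chan struct{} }

func newKicker() *kicker { return &kicker{ch: make(chan struct{}, 1)} }

// TickNow requests an immediate POST. If a kick is already pending, the
// send would block, so this one is dropped: the pending POST will report
// the latest state anyway.
func (k *kicker) TickNow() bool {
	select {
	case k.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	k := newKicker()
	accepted := 0
	for i := 0; i < 5; i++ { // e.g. a burst of updates during DPC verification
		if k.TickNow() {
			accepted++
		}
	}
	<-k.ch                                              // the poller wakes once and performs one POST
	fmt.Println("kicks:", 5, "accepted:", accepted) // kicks: 5 accepted: 1
}
```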

Signed-off-by: Milan Lenco <milan@zededa.com>
Populate the new NetworkInfo.port_status field from DeviceNetworkStatus
when EVE posts to the LPS Network endpoint, giving the local operator
a view of the kernel-observed state of each network port (link up/down,
MAC address, currently-assigned IP addresses, active default routers,
effective DNS servers and search domain, NTP servers in use, and
applied MTU) alongside the existing declarative config views.

CIDR-formatted IP addresses require the interface's subnet mask, which
DeviceNetworkStatus previously did not carry in AddrInfoList. Extend
types.AddrInfo with a Mask field and populate it from the netlink
address entry in DpcManager.updateDNS.

Signed-off-by: Milan Lenco <milan@zededa.com>
Add timer.lps.<task>.interval config properties (profile, radio, appinfo,
devinfo, network, appbootinfo) with defaults matching current hard-coded
values, min 3s, max 1h. LocalCmdAgent initializes globalConfig with
DefaultConfigItemValueMap() so all tasks use the correct default before
the first real config arrives. Interval changes take effect immediately
on the next UpdateGlobalConfig call without resetting throttle state.

Signed-off-by: Milan Lenco <milan@zededa.com>
@milan-zededa milan-zededa requested a review from uncleDecart May 5, 2026 17:17
@milan-zededa milan-zededa requested a review from eriknordmark as a code owner May 5, 2026 17:17
@milan-zededa milan-zededa added enhancement New feature or request stable Should be backported to stable release(s) labels May 5, 2026
@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 4.28016% with 246 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.07%. Comparing base (2281599) to head (1e5b138).
⚠️ Report is 644 commits behind head on master.

Files with missing lines                 Patch %   Lines
pkg/pillar/localcommand/signal.go        0.00%     146 missing
pkg/pillar/controllerconn/send.go        0.00%     38 missing
pkg/pillar/localcommand/network.go       0.00%     28 missing
pkg/pillar/localcommand/agent.go         0.00%     12 missing
pkg/pillar/cmd/zedagent/zedagent.go      0.00%     7 missing
pkg/pillar/localcommand/profile.go       0.00%     5 missing
pkg/pillar/localcommand/appbootinfo.go   0.00%     2 missing
pkg/pillar/localcommand/appinfo.go       0.00%     2 missing
pkg/pillar/localcommand/devinfo.go       0.00%     2 missing
pkg/pillar/localcommand/radio.go         0.00%     2 missing
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5904      +/-   ##
==========================================
- Coverage   19.52%   17.07%   -2.46%     
==========================================
  Files          19      475     +456     
  Lines        3021    85923   +82902     
==========================================
+ Hits          590    14671   +14081     
- Misses       2310    69734   +67424     
- Partials      121     1518    +1397     


Contributor

@eriknordmark eriknordmark left a comment


Run tests

But there are some yetus annotations to fix (however, yetus shows as passing, which is new to me!)
