Conversation
|
same way we do conformance for runtimes, some of these contracts may be added to conformance: kubernetes-sigs/cri-tools#2046. I think critest is the right place for it. But it opens an interesting discussion on whether we will also want to do conformance with other things like CNI. NRI is different enough that I would see it as a reasonable conformance requirement |
CNI is not part of CRI, is an implementation detail of the runtimes and has several flaws that cause a lot of problem with current workloads ... during last kubecon during the OCI meeting we also discussed to replace it by a modular solution based on NRI ... I have a draft that once I have time I plan to finish and share, but I suggest to not include CNI as part of conformance of anything |
| - `Event_STOP_POD_SANDBOX` (1 << 1) for StopPodSandbox | ||
| - `Event_REMOVE_POD_SANDBOX` (1 << 2) for RemovePodSandbox | ||
|
|
||
| These events are notified via the StateChange RPC call. |
There was a problem hiding this comment.
This is a wire protocol detail which is going to change once #274 is merged, at which point we are going to have dedicated messages and RPC calls on the wire for every single pod and container life cycle event, and eventually remove StateChange after a deprecation period.
I think it would be better to refer here to the stub-offered API which (so far has) always had stronger stability and backward compatibility promise than the wire protocol. So could we perhaps rephrase this as "These events are delivered to plugins using the RunPodSandbox, StopPodSandbox and RemovePodSandbox event handlers." ?
| Plugins should handle errors gracefully and avoid leaving the pod or system in an inconsistent state. Error recovery strategies: | ||
|
|
||
| - **RunPodSandbox errors**: Problematic; may block pod creation depending on failure severity and runtime policy | ||
| - **StopPodSandbox errors**: May not prevent scenario termination depending on runtime policy |
There was a problem hiding this comment.
I think on the teardown path, actually both for pods and containers, we should not allow a plugin to try and prevent the operation with an error. If we agree, then we should clearly state here that, for StopPodSandbox and RemovePodSandbox, a plugin failing with an error will not prevent the operation from proceeding.
The current implementation has incorrect/inconstent behavior in this regard when multiple plugins are involved in the sense that for some of these (or corresponding container) teardown lifecycle events, a failure in a plugin will incorrectly prevent the event from being delivered to subsequent plugins, although it will not prevent the CRI-level operation from proceeding. There is a fix coming in for this, but it is waiting for #274 from get merged first (which is waiting for #277 to get merged first).
There was a problem hiding this comment.
@klihub can you please suggest the better wording for this?
I do not feel I'm able to translate that correctly to words :)
There was a problem hiding this comment.
ok, let me take a stab at this, I think now I got what you mean
There was a problem hiding this comment.
see new commit rephrasing it
|
@aojea Thank you, this looks great ! I only have a few comments. |
Define the contract for the PodSandbox hooks for the NRI plugins. The Sandbox hooks are based on the CRI-API RPCs , since the OCI runtime only specify the container lifecycle. Signed-off-by: Antonio Ojea <aojea@google.com>
609d764 to
3c820b0
Compare
|
squashed |
Define the contract for the PodSandbox hooks for the NRI plugins.
The Sandbox hooks are based on the CRI-API RPCs , since the OCI runtime only specify the container lifecycle.
/assign @samuelkarp @haircommander