Add reuse-container mode #303
Conversation
Add a persistent container feature. When `reuse-container: true` is set, the plugin keeps a container alive across job steps instead of creating and destroying one each time.

Key behaviors:
- Derives a stable container name from the image name and agent spawn index (or uses an explicit `reuse-container-name` override).
- On each job, checks for an existing running container: if the image digest matches, attaches via `docker exec`; if the digest differs, prints a warning banner, removes the stale container, and creates a fresh one.
- New containers are started detached (`docker run -d ... sleep infinity`) with `--rm` suppressed, then the job command runs via `docker exec`.
- Environment variables are stripped from the creation `docker run` and injected per-exec only, preventing secrets from leaking between jobs that share the container.
- The `pre-exit` hook skips container cleanup when reuse-container is enabled so the container persists for subsequent jobs.

Files changed:
- plugin.yml: new `reuse-container` and `reuse-container-name` opts
- lib/shared.bash: `get_reuse_container_name`, `get_container_image_id`, `get_image_id` helpers
- commands/run.sh: reuse-container branch with exec-arg extraction, digest validation, and detached container creation
- hooks/pre-exit: skip cleanup when reuse-container is set
- tests/reuse-container.bats: 11 bats tests covering creation, reuse, digest mismatch, stopped container, custom name, spawn index, env passthrough, env stripping, pre-exit, `--rm` absence, and exit code propagation
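The name-derivation and digest-check flow described above can be sketched roughly as follows. This is a simplified illustration, not the PR's actual helpers: the function names, the `docker-reuse-` prefix, and the sanitization scheme are all made up for the example.

```shell
#!/bin/bash
set -euo pipefail

# Derive a stable container name from the image and the agent spawn index
# (illustrative scheme; the PR's get_reuse_container_name may differ).
reuse_container_name() {
  local image="$1" spawn="${2:-0}"
  local sanitized="${image//\//-}"   # replace '/' with '-'
  sanitized="${sanitized//:/-}"      # replace ':' with '-'
  echo "docker-reuse-${sanitized}-${spawn}"
}

# Attach if the running container matches the wanted image digest,
# otherwise replace it with a fresh detached container.
run_in_reused_container() {
  local image="$1" container="$2"
  shift 2
  local wanted running
  wanted=$(docker image inspect --format '{{.Id}}' "${image}")
  running=$(docker container inspect --format '{{.Image}}' "${container}" 2>/dev/null || true)
  if [[ "${running}" != "${wanted}" ]]; then
    if [[ -n "${running}" ]]; then
      echo "--- :warning: image changed, recreating ${container}"
      docker rm -f "${container}" >/dev/null
    fi
    docker run -d --name "${container}" "${image}" sleep infinity >/dev/null
  fi
  docker exec "${container}" "$@"
}
```

Usage would be along the lines of `run_in_reused_container "node:20" "$(reuse_container_name node:20 0)" /bin/sh -c "$BUILDKITE_COMMAND"`.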
toote
left a comment
Thanks for the great PR @rsennewald! I think it is a very interesting feature, but I have some questions about how it would interact with other parts of the plugin that assume you are running in Docker to get an ephemeral, isolated execution environment in a known state (especially labels and volume mounts), because today the container is created from an image every time.
Also, I think the code flow could be structured better so that we avoid duplicating the code that runs the container and calculates the right arguments for the commands.
More than happy to discuss these items and looking forward to your opinions!
```bash
# Build exec_args by extracting only the flags that docker exec supports
# from the run_flags snapshot (tty, interactive, env, workdir, user).
exec_args=()
i=0
while [[ $i -lt ${#run_flags[@]} ]]; do
  case "${run_flags[$i]}" in
    -t|-i)
      exec_args+=("${run_flags[$i]}")
      ;;
    --env|--env-file|--workdir|-u)
      exec_args+=("${run_flags[$i]}" "${run_flags[$((i+1))]}")
      i=$((i+1))
      ;;
  esac
  i=$((i+1))
done
```
I think it would be more readable to create a separate argument array for the exec codepath or, even better, an array for the options available on both exec and run and another for the ones exclusive to run
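For example, the split described in the comment could look roughly like this. It is only a sketch: the array names, option values, and the `reused-build-container` name are all illustrative.

```shell
#!/bin/bash
# Sketch of the suggested split: one array for options docker accepts on both
# `run` and `exec`, another for options that only apply at container creation.
set -euo pipefail

container_name="reused-build-container"   # illustrative name

# Accepted by both `docker run` and `docker exec`:
# tty, interactive, env, env-file, workdir, user.
common_args=(--env "FOO=bar" --workdir /workdir --user "1000")

# Only meaningful when the container is created.
run_only_args=(--name "${container_name}" --volume "${PWD}:/workdir")

# Creation path:
#   docker run -d "${run_only_args[@]}" "${common_args[@]}" "${image}" sleep infinity
# Reuse path:
#   docker exec "${common_args[@]}" "${container_name}" "${cmd[@]}"
echo "common: ${common_args[*]}"
echo "run-only: ${run_only_args[*]}"
```

With this shape, the exec codepath never needs to re-filter flags out of a run snapshot; it just consumes `common_args` directly.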
```bash
exec_cmd=()
if [[ ${#shell[@]} -gt 0 ]]; then
  for shell_arg in "${shell[@]}"; do
    exec_cmd+=("$shell_arg")
  done
fi
if [[ -n "${BUILDKITE_COMMAND}" ]]; then
  if is_windows; then
    windows_multi_command=${BUILDKITE_COMMAND//$'\n'/ && }
    exec_cmd+=("${windows_multi_command}")
  else
    exec_cmd+=("${BUILDKITE_COMMAND}")
  fi
elif [[ ${#command[@]} -gt 0 ]]; then
  for command_arg in "${command[@]}"; do
    exec_cmd+=("$command_arg")
  done
fi
```
same here: why duplicate this logic that is already done above for the run command? It would make more sense to save all of this into a separate array that is shared by both run and exec
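As a sketch, building the in-container command once and letting both branches consume it could look like this (variable names loosely follow the snippet above; the default shell and command are placeholders):

```shell
#!/bin/bash
# Sketch: assemble the command to run inside the container once, then let the
# run and exec branches both consume the same array.
set -euo pipefail

shell=(/bin/sh -e -c)              # illustrative default shell
BUILDKITE_COMMAND="echo hello"     # supplied by the agent in the real plugin

container_cmd=("${shell[@]}")
if [[ -n "${BUILDKITE_COMMAND}" ]]; then
  container_cmd+=("${BUILDKITE_COMMAND}")
fi

# Both branches would then use the same array:
#   docker run  "${args[@]}" "${image}" "${container_cmd[@]}"
#   docker exec "${exec_args[@]}" "${container_name}" "${container_cmd[@]}"
printf '%s\n' "${container_cmd[@]}"
```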
```bash
# Disable -e outside of the subshell, since the subshell returning a failure
# would otherwise exit the parent shell (here) early.
set +e

# Prevent SIGTERM from killing this script. SIGTERM will still be passed to the
# Docker container, which can exit gracefully (or, if necessary, non-gracefully
# per the `--stop-timeout` flag passed above).
trap '' SIGTERM

# Don't convert paths on gitbash on windows, as that can mangle user paths and cmd options.
# See https://github.com/buildkite-plugins/docker-buildkite-plugin/issues/81 for more information.
# `trap` is used in this subshell for the same reason it is used above.
( if is_windows ; then export MSYS_NO_PATHCONV=1; fi && trap '' SIGTERM && docker run "${args[@]}" )

exit_code=$?

set -e

exit $exit_code
```
this whole block is common to both branches. I think it could be simplified by crafting a better version of `${args[@]}` (i.e. building the right arguments for each branch up front) so the block itself is not duplicated
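One way to keep the trap/exit-code plumbing in a single place is to pick the complete docker invocation first and run it through one shared block. The sketch below uses `echo` as a stand-in for `docker` so it can run anywhere; the branch condition, container name, and image are illustrative.

```shell
#!/bin/bash
# Sketch: choose the full docker invocation up front so the subshell, trap,
# and exit-code handling exist only once. `echo` stands in for `docker`.
set -euo pipefail

reuse_container=false   # would come from the plugin configuration

if [[ "${reuse_container}" == "true" ]]; then
  invocation=(echo docker exec my-container /bin/sh -c 'true')
else
  invocation=(echo docker run --rm alpine /bin/sh -c 'true')
fi

set +e
( trap '' SIGTERM && "${invocation[@]}" )
exit_code=$?
set -e
echo "exited with ${exit_code}"
```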
```bash
# Returns a stable container name for the reuse-container feature.
# Uses the explicit override if set, otherwise derives from the image name
# and the agent's spawn index (to isolate containers per agent on a host).
```
this should also be isolated per pipeline, otherwise you could run into issues with a completely different repository being mounted (if two pipelines use the same image to run)
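For instance, folding the pipeline slug into the derived name would keep two pipelines that share an image apart. This is only a sketch: the PR's actual helper, the `docker-reuse-` prefix, and how the spawn index is obtained may all differ.

```shell
#!/bin/bash
# Sketch: derive the container name from pipeline slug + image + spawn index,
# so the same image on the same host yields distinct containers per pipeline.
set -euo pipefail

reuse_container_name() {
  local image="$1"
  local pipeline="${BUILDKITE_PIPELINE_SLUG:-local}"
  local spawn="${2:-0}"              # the PR reads the agent's spawn index
  local sanitized="${image//\//-}"   # replace '/' with '-'
  sanitized="${sanitized//:/-}"      # replace ':' with '-'
  echo "docker-reuse-${pipeline}-${sanitized}-${spawn}"
}
```

For example, with `BUILDKITE_PIPELINE_SLUG=pipe-a`, `reuse_container_name myorg/img:latest 1` yields `docker-reuse-pipe-a-myorg-img-latest-1`.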
| "--label" "com.buildkite.pipeline_name=${BUILDKITE_PIPELINE_NAME}" | ||
| "--label" "com.buildkite.pipeline_slug=${BUILDKITE_PIPELINE_SLUG}" | ||
| "--label" "com.buildkite.build_number=${BUILDKITE_BUILD_NUMBER}" | ||
| "--label" "com.buildkite.job_id=${BUILDKITE_JOB_ID}" |
how would you suggest fixing the fact that the created container will forever have labels referencing jobs/builds/pipelines it may no longer be running for?
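Since Docker cannot change labels on an existing container, one possible direction (a sketch only, not the PR's behavior) is to keep only stable identifiers as creation-time labels and pass the per-job identifiers with each exec as environment variables instead:

```shell
#!/bin/bash
# Sketch: stable identifiers become creation-time labels; per-job identifiers
# travel with each exec instead, since container labels are immutable.
set -euo pipefail

creation_labels=(
  --label "com.buildkite.pipeline_slug=${BUILDKITE_PIPELINE_SLUG:-local}"
)

exec_env=(
  --env "BUILDKITE_BUILD_NUMBER=${BUILDKITE_BUILD_NUMBER:-0}"
  --env "BUILDKITE_JOB_ID=${BUILDKITE_JOB_ID:-unknown}"
)

# Creation:  docker run -d "${creation_labels[@]}" "${image}" sleep infinity
# Per job:   docker exec "${exec_env[@]}" "${container_name}" "${cmd[@]}"
echo "labels: ${creation_labels[*]}"
echo "env: ${exec_env[*]}"
```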
### `reuse-container` (optional, boolean)

When set to `true`, the plugin keeps a named container running across pipeline steps instead of creating and destroying one each time. On each step, the plugin checks for an existing container: if the image digest matches, the command runs via `docker exec`; if the digest differs, the old container is removed and a fresh one is created. Environment variables are injected per-exec only (not baked into the container) to prevent secrets from leaking between jobs.
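An example step using the option might look like the following (the plugin version tag is a placeholder, and `reuse-container-name` is shown only to illustrate the override):

```yaml
steps:
  - label: ":hammer: Build"
    command: "make build"
    plugins:
      - docker#vX.Y.Z:   # placeholder version
          image: "node:20"
          reuse-container: true
          reuse-container-name: "my-build-container"   # optional override
```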
you should add a note with a big warning about turning on this option when running commands that have side effects on the container or file system, as those would be preserved between runs or pipelines