This is a minimal repro case demonstrating that --experimental_async_execution causes linux-sandbox actions that take longer than 30 seconds to be killed with SIGKILL.
With --experimental_async_execution enabled, Bazel runs actions on virtual threads. When a virtual thread forks a process, the fork happens on its carrier thread. The carrier thread is detached while the virtual thread waits for the forked process to complete. If the carrier thread has no other work to do, it sits idle and can be terminated for inactivity. The default virtual thread scheduler in JDK 25 uses a ForkJoinPool with a 30-second TTL.
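When the carrier thread is killed, the linux-sandbox dies with it, because the sandbox calls prctl(PR_SET_PDEATHSIG, SIGKILL) to self-destruct when its parent dies. On Linux, the parent-death signal fires when the thread that created the process exits, not only when the whole parent process exits, so the Bazel server itself can stay alive while the sandbox is killed.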
Together, these two behaviors can cause Bazel actions to fail erroneously when they use the linux-sandbox and take longer than 30 seconds.
The bug only occurs when linux-sandbox is the spawn strategy, so the prerequisites for linux-sandbox must be met. If you try to reproduce the bug by running bazel build //:slow_action and get an error like this:
ERROR: 'linux-sandbox' was requested for explicit default strategies but no strategy with that identifier was registered. Valid values are: [dynamic_worker, processwrapper-sandbox, standalone, dynamic, remote, worker, sandboxed, local]
then you likely need to temporarily disable the AppArmor restriction that prevents unprivileged user namespaces:
# Check if unprivileged user namespaces are restricted
cat /proc/sys/kernel/apparmor_restrict_unprivileged_userns
# If 1, temporarily disable:
sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0
# Restart the Bazel server afterwards so it picks up the change
bazel shutdown
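The check above can be wrapped in a small helper so it also behaves sensibly on machines where the AppArmor knob does not exist. This is only a sketch: the function name userns_status and its optional path argument are hypothetical and not part of the repro.

```shell
# Hypothetical helper (not part of the repro): reports whether AppArmor
# restricts unprivileged user namespaces.
# $1 optionally overrides the sysctl file path, which is handy for testing.
userns_status() {
  userns_knob="${1:-/proc/sys/kernel/apparmor_restrict_unprivileged_userns}"
  if [ ! -r "$userns_knob" ]; then
    # Kernels without this AppArmor setting do not expose the file.
    echo "not-present"
  elif [ "$(cat "$userns_knob")" = "1" ]; then
    echo "restricted"
  else
    echo "unrestricted"
  fi
}
```

If userns_status prints restricted, run the sysctl command above and then bazel shutdown before retrying the build.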
Build the target:
bazel build //:slow_action
The build fails after approximately 30 seconds. The action shows it was (Killed):
ERROR: /home/<redacted>/opensource/bazel_virtual_thread_repro/BUILD.bazel:1:8: Executing genrule //:slow_action failed: (Killed): bash failed: error executing Genrule command (from target //:slow_action) /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; sleep 45 && echo done > bazel-out/k8-fastbuild/bin/output.txt'
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
Target //:slow_action failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 32.774s, Critical Path: 30.03s
INFO: 2 processes: 2 internal.
ERROR: Build did NOT complete successfully
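To tell this failure apart from an ordinary build error when scripting the repro, one option is to search a saved build log for the (Killed) marker shown above. The helper name is_killed_failure and the log file name build.log are assumptions made for illustration. Bazel writes its progress and error messages to stderr, so the log has to be captured from there.

```shell
# Hypothetical helper: succeeds if the given log file contains the
# "failed: (Killed)" marker that Bazel printed in the output above.
is_killed_failure() {
  # -F matches the text literally, so the parentheses need no escaping.
  grep -Fq 'failed: (Killed)' "$1"
}

# Example usage (assumes the repro workspace and an illustrative file name):
#   bazel build //:slow_action 2> build.log
#   is_killed_failure build.log && echo "action was killed"
```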
Verify that the build passes without async execution. It takes approximately 45 seconds:
bazel build //:slow_action --noexperimental_async_execution
You can also verify that it passes when processwrapper-sandbox is used while --experimental_async_execution is still enabled. You may need to run bazel clean first if you successfully built the action in the previous step.
bazel build //:slow_action --spawn_strategy=processwrapper-sandbox
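The three builds above can be run back to back with a small wrapper that prints one result line per configuration. This is a rough sketch, not part of the repro: the function name run_case and the BAZEL override variable are hypothetical, and it assumes the workspace already enables --experimental_async_execution and the linux-sandbox strategy by default, as the plain bazel build //:slow_action invocation above implies.

```shell
# Hypothetical wrapper: builds //:slow_action with any extra flags and
# prints "<label>: PASS" or "<label>: FAIL" based on the exit status.
# BAZEL may be set to point at a different binary; it defaults to "bazel".
run_case() {
  case_label="$1"
  shift
  if "${BAZEL:-bazel}" build //:slow_action "$@" >/dev/null 2>&1; then
    echo "$case_label: PASS"
  else
    echo "$case_label: FAIL"
  fi
}

# Example usage from the repro workspace (expect FAIL, PASS, PASS):
#   run_case "async + linux-sandbox"
#   bazel clean
#   run_case "no async" --noexperimental_async_execution
#   bazel clean
#   run_case "async + processwrapper-sandbox" --spawn_strategy=processwrapper-sandbox
```

The bazel clean calls between cases keep a previously successful build from masking the result of the next one.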