fix: latch worker flag when torch._dynamo.reset() fails to prevent stale-cache recompile#671
livepeer-tessa wants to merge 4 commits into `main`
Conversation
`Float8DynamicActivationFloat8WeightConfig` is not compatible with `torch.compile(fullgraph=False)`. During warmup on H100 (where `compile=True`), AOT autograd's `gen_alias_from_base` calls `aten.as_strided` on `Float8Tensor` outputs, which is not implemented in torchao:

```
NotImplementedError: Float8Tensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.as_strided', overload='default')>
```

The crash manifests specifically after longlive (also FP8) because `torch._dynamo`'s compile cache is never reset between pipeline switches, allowing longlive's Float8 dispatch state to persist and influence Krea's subsequent compile attempt.

Two fixes:

1. `krea_realtime_video/pipeline.py`: when FP8 quantization is active, skip `block.compile()` — the two optimizations are currently mutually exclusive with `fullgraph=False`. FP8 alone still provides meaningful memory/compute savings on H100 without compile.
2. `pipeline_manager.py`: call `torch._dynamo.reset()` on every pipeline unload to clear stale compiled graphs and Float8 dispatch state, preventing cross-pipeline cache pollution.

Fixes #669

Signed-off-by: livepeer-robot <robot@livepeer.org>
…ale-cache recompile

If `torch._dynamo.reset()` raises during pipeline unload, stale Dynamo/FP8 compile caches remain active in the worker process. Previously the code swallowed the exception and published `pipeline_unloaded` unconditionally, leaving the next krea-realtime-video load free to `torch.compile` against those stale caches — re-entering the warmup crash from the FP8→Krea conflict.

Fix: set `self._dynamo_reset_failed = True` on reset failure. The Krea load path now checks this flag and forces `compile=False` for the lifetime of the worker, with a clear log warning to restart the process to re-enable compilation.

Addresses CodeRabbit review comment on PR #670.

Signed-off-by: livepeer-robot <robot@livepeer.org>
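The latch-and-check flow described in this commit can be sketched as follows. This is a minimal, torch-free sketch, not the actual implementation: the real code calls `torch._dynamo.reset()` directly inside the unload path, and the simplified method names (`unload_pipeline`, `should_compile`) and the injected `dynamo_reset` callable are hypothetical stand-ins for illustration.

```python
import logging

logger = logging.getLogger(__name__)


class PipelineManager:
    """Minimal sketch of the worker-lifetime latch pattern from the PR."""

    def __init__(self, dynamo_reset=None):
        # Injected so the sketch runs without torch; the real code calls
        # torch._dynamo.reset() directly.
        self._dynamo_reset = dynamo_reset or (lambda: None)
        self._dynamo_reset_failed = False  # latched for the worker's lifetime

    def unload_pipeline(self):
        try:
            self._dynamo_reset()
        except Exception:
            # Latch rather than re-raise: failing the unload would strand the
            # pipeline, but we must not compile against stale caches later.
            self._dynamo_reset_failed = True
            logger.warning(
                "torch._dynamo.reset() failed; forcing compile=False for "
                "subsequent krea-realtime-video loads. Restart the worker "
                "to re-enable compilation."
            )
        # Unload still completes either way, so GPU memory is freed.

    def should_compile(self, hopper: bool) -> bool:
        # Mirrors the load-path check: compile only on Hopper and only while
        # the dynamo caches are known to be clean.
        return hopper and not self._dynamo_reset_failed
```

A failed reset permanently disables compilation for that worker, which is the safer trade-off: the stale-cache recompile crash is prevented at the cost of slower inference until the process restarts.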
📝 Walkthrough

Pipeline now avoids compiling attention blocks and skips warmup when FP8 quantization is active; the KV-cache attention bias is set based on the compile flag. Pipeline manager tracks `torch._dynamo.reset()` failures and forces `compile=False` on subsequent krea-realtime-video loads.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/Caller
    participant PM as PipelineManager
    participant Dynamo as torch._dynamo
    participant GPU as GPU
    participant Pipeline as KreaRealtimeVideoPipeline

    User->>PM: load_pipeline()
    activate PM
    rect rgba(100, 149, 237, 0.5)
        Note over PM,GPU: Determine compilation capability
        PM->>GPU: Check Hopper capability
        GPU-->>PM: Hopper? (yes/no)
        PM->>PM: Check _dynamo_reset_failed flag
    end
    alt _dynamo_reset_failed OR not Hopper
        PM->>PM: _should_compile = False
        PM->>PM: Log compilation disabled (warning)
    else
        PM->>PM: _should_compile = True
    end
    rect rgba(152, 251, 152, 0.5)
        Note over PM,Pipeline: Construct pipeline with compile decision
        PM->>Pipeline: KreaRealtimeVideoPipeline(compile=_should_compile)
        activate Pipeline
        Pipeline->>Pipeline: Initialize public state (KV bias depends on compile)
        Pipeline->>Pipeline: Check FP8 quantization config
        alt FP8 enabled
            Pipeline->>Pipeline: Log warning about FP8 + compile incompatibility
            Pipeline->>Pipeline: Skip attention block compilation & warmup
        else
            Pipeline->>Pipeline: Compile attention blocks
            Pipeline->>Pipeline: Perform warmup priming compiled kernels
        end
        Pipeline-->>PM: Pipeline ready
        deactivate Pipeline
    end
    PM-->>User: Pipeline loaded
    deactivate PM

    User->>PM: unload_pipeline()
    activate PM
    rect rgba(240, 128, 128, 0.5)
        Note over PM,Dynamo: Reset torch._dynamo cache to avoid FP8 cache leakage
        PM->>Dynamo: reset()
        alt Reset succeeds
            Dynamo-->>PM: Success
            PM->>PM: Clear _dynamo_reset_failed flag
        else Reset fails
            Dynamo-->>PM: Error
            PM->>PM: Set _dynamo_reset_failed flag
            PM->>PM: Log warning advising restart to re-enable compilation
        end
    end
    PM-->>User: Pipeline unloaded
    deactivate PM
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/scope/server/pipeline_manager.py (1)
699-713: Narrow the warning to the path that actually consumes the latch.

Lines 709-710 say this forces `compile=False` for all later pipeline loads, but `_dynamo_reset_failed` is only checked in the krea-realtime-video branch at lines 982-999. Tightening the message will keep operators from assuming other loaders are protected.

✏️ Suggested wording

```diff
- "forcing compile=False for all subsequent pipeline loads."
+ "forcing compile=False for subsequent krea-realtime-video loads."
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/scope/server/pipeline_manager.py` around lines 699-713: the warning message after catching an exception from torch._dynamo.reset() is too broad; update the logged text to indicate that only the krea-realtime-video loader currently consumes _dynamo_reset_failed and will force compile=False for subsequent pipeline loads handled by that branch. Edit the message emitted in the except block where _dynamo_reset_failed is set so it explicitly mentions the krea-realtime-video path and that other loaders may not be affected, and keep the existing assignment to self._dynamo_reset_failed so the existing krea-realtime-video check continues to work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scope/server/pipeline_manager.py`:
- Around line 982-999: The compile flag passed into KreaRealtimeVideoPipeline is
not propagated to the warmup path, so even when compile=False the warmup loop
still sets kv_cache_attention_bias to DEFAULT_KV_CACHE_ATTENTION_BIAS and
triggers torch.compile; update the pipeline code in KreaRealtimeVideoPipeline so
the warmup routine checks the instance's compile flag (or accept a compile
parameter) and when compile is False: (a) avoid assigning
DEFAULT_KV_CACHE_ATTENTION_BIAS (use None or a non-compiling sentinel) and (b)
skip calling block.compile(fullgraph=False) inside the warmup loop; ensure you
reference and use the pipeline attribute (compile) and symbols
kv_cache_attention_bias, DEFAULT_KV_CACHE_ATTENTION_BIAS, and the warmup loop
where block.compile is invoked.
📒 Files selected for processing (2)

- src/scope/core/pipelines/krea_realtime_video/pipeline.py
- src/scope/server/pipeline_manager.py
…ompile=False

When `compile=False`, `kv_cache_attention_bias` was still being set to `DEFAULT_KV_CACHE_ATTENTION_BIAS` (0.3), which causes the warmup loop to enter the flex_attention code path and trigger `torch._dynamo` tracing even though no `block.compile()` call was ever made. This meant the `_dynamo_reset_failed` guard in pipeline_manager.py had no effect on the warmup-induced recompilation.

Fix:

- Import `KV_CACHE_ATTENTION_BIAS_DISABLED` (1.0) from causal_model and use it as the initial `kv_cache_attention_bias` when `compile=False`. This sentinel makes causal_model.py take the standard attention branch and skip the flex_attention/torch.compile path entirely.
- Guard the warmup loop behind `if compile:` — warmup exists solely to prime the compiled flex_attention kernel, so it is a no-op (and harmful) when compilation is disabled. Log a message when skipped for observability.

Addresses CodeRabbit review comment on PR #671.

Signed-off-by: livepeer-robot <robot@livepeer.org>
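The bias-sentinel selection described in this commit can be sketched as a small helper. The constant values (0.3 and 1.0) come from the commit message itself; the function name `select_initial_kv_bias` is a hypothetical stand-in, since the real code sets the attribute inline in the pipeline constructor.

```python
# Values from the commit message: a bias below 1.0 routes causal_model into
# the flex_attention (compiled) path; exactly 1.0 is the disabled sentinel.
DEFAULT_KV_CACHE_ATTENTION_BIAS = 0.3
KV_CACHE_ATTENTION_BIAS_DISABLED = 1.0


def select_initial_kv_bias(compile_enabled: bool) -> float:
    """Pick the initial KV-cache attention bias to match the compile decision."""
    if compile_enabled:
        return DEFAULT_KV_CACHE_ATTENTION_BIAS
    # The 1.0 sentinel keeps causal_model on the standard attention branch,
    # so no torch._dynamo tracing is triggered when compilation is off.
    return KV_CACHE_ATTENTION_BIAS_DISABLED
```

The key property is that the warmup path and the bias agree: with compilation off, neither the bias nor the warmup loop can pull the model into a code path that traces graphs.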
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scope/core/pipelines/krea_realtime_video/pipeline.py`:
- Around line 233-235: The warmup_runs calculation currently uses floor division
and should use ceiling to ensure warmup covers the full number of blocks;
replace the expression for warmup_runs (which references local_attn_size and
num_frame_per_block) with ceil(local_attn_size / num_frame_per_block) + 1 and
add the necessary import (e.g., import math or from math import ceil) so the
warmup count rounds up when values aren’t evenly divisible.
📒 Files selected for processing (1)

- src/scope/core/pipelines/krea_realtime_video/pipeline.py
The comment at line 230 already specifies `ceil(local_attn_size / num_frame_per_block) + 1`, but the implementation was using floor division (`//`). When `local_attn_size` is not evenly divisible by `num_frame_per_block`, this meant warmup stopped one iteration early, leaving the cache short of the steady-state shape and triggering a recompile on the first live request.

Replace with the ceiling equivalent, `(a + b - 1) // b`, to avoid importing math.

Fixes coderabbitai suggestion on PR #671.

Signed-off-by: livepeer-robot <robot@livepeer.org>
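The floor-versus-ceiling difference this commit fixes can be checked directly. `ceil_div` is a hypothetical helper name, and the example operand values are illustrative rather than taken from the real config:

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without importing math: (a + b - 1) // b."""
    return (a + b - 1) // b


# Matches math.ceil for positive operands, including evenly divisible cases.
for local_attn_size in range(1, 25):
    for num_frame_per_block in range(1, 9):
        assert ceil_div(local_attn_size, num_frame_per_block) == math.ceil(
            local_attn_size / num_frame_per_block
        )

# Illustrative mismatch: with a=6, b=4, floor gives 1 but ceiling gives 2,
# so a warmup count of floor + 1 would stop one iteration short of ceil + 1.
assert 6 // 4 + 1 == 2
assert ceil_div(6, 4) + 1 == 3
```

The identity only holds for positive integers, which is safe here since both operands are positive block/window sizes.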
Good catch @coderabbitai — the comment already specified `ceil(local_attn_size / num_frame_per_block) + 1`.
✅ E2E Tests passed
Test Artifacts: check the workflow run for screenshots.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/scope/core/pipelines/krea_realtime_video/pipeline.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor: Fix Ruff formatting.

The pipeline indicates a formatting violation. Run `ruff format` on this file to resolve.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scope/core/pipelines/krea_realtime_video/pipeline.py` at line 1, Run the code formatter (ruff format) on this module to fix the formatting violation; specifically ensure the top-level import line ("import logging") and the rest of pipeline.py follow ruff/black formatting rules (correct import/newline spacing, trailing newline, and any minor whitespace issues) and then re-run ruff to confirm no remaining violations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/scope/core/pipelines/krea_realtime_video/pipeline.py`:
- Around line 141-152: The current logic logs skipping block.compile() when FP8
is active but leaves the `compile` flag true, causing `initial_kv_bias` and the
warmup path to still run and trigger flex_attention tracing; update the flow so
you track whether compilation actually happened (e.g., set a new boolean like
`compiled` or flip `compile` to False after skipping) immediately after the
Float8 check where `block.compile()` is skipped, and then use that actual
compilation indicator in the subsequent logic that sets `initial_kv_bias` and
controls the warmup/flex_attention branch (the code around `block.compile()`,
`initial_kv_bias`, and the warmup lines) so warmup does not run when compilation
was skipped due to FP8.
📒 Files selected for processing (1)

- src/scope/core/pipelines/krea_realtime_video/pipeline.py
```python
if compile:
    # Float8DynamicActivationFloat8WeightConfig is incompatible with
    # torch.compile(fullgraph=False): AOT autograd's gen_alias_from_base
    # calls aten.as_strided on Float8Tensor outputs, which is not
    # implemented. Skip block compilation when FP8 is active.
    # See: https://github.com/daydreamlive/scope/issues/669
    logger.warning(
        "Skipping torch.compile for attention blocks: "
        "Float8DynamicActivationFloat8WeightConfig is not compatible "
        "with fullgraph=False compilation (aten.as_strided unsupported "
        "on Float8Tensor). FP8 quantization is still active."
    )
```
FP8 + compile=True still triggers flex_attention warmup.

When FP8 quantization is active and `compile=True`, the code:

- Logs the warning and skips `block.compile()` (correct)
- But `compile` remains `True`, so `initial_kv_bias` is set to 0.3 (line 212)
- Warmup runs (line 232) with bias < 1.0, which per lines 223-226 "would otherwise enter the flex_attention code path... and trigger torch._dynamo tracing"

This defeats the purpose of skipping compilation for FP8. Consider tracking whether compilation actually occurred:
Proposed fix

```diff
+    # Track whether block compilation actually happens (FP8 is incompatible)
+    did_compile = False
+
     if quantization == Quantization.FP8_E4M3FN:
         # Cast before optional quantization
         generator = generator.to(dtype=dtype)
@@ -140,6 +143,7 @@
     else:
         generator = generator.to(device=device, dtype=dtype)
     if compile:
         # Only compile the attention blocks
         for block in generator.model.blocks:
             # Disable fullgraph right now due to issues with RoPE
             block.compile(fullgraph=False)
+            did_compile = True
     # ... later ...
     initial_kv_bias = (
-        DEFAULT_KV_CACHE_ATTENTION_BIAS if compile else KV_CACHE_ATTENTION_BIAS_DISABLED
+        DEFAULT_KV_CACHE_ATTENTION_BIAS if did_compile else KV_CACHE_ATTENTION_BIAS_DISABLED
     )
     # ... and ...
-    if compile:
+    if did_compile:
         local_attn_size = getattr(model_config, "local_attn_size", 6)
```

Also applies to: 211-214, 232-250
Addresses CodeRabbit's review comment on #670.
Problem

If `torch._dynamo.reset()` raises during `_unload_pipeline_by_id_unsafe`, the exception was silently swallowed and `pipeline_unloaded` was published unconditionally. Stale Dynamo/FP8 compile caches remained live in the worker process, so the next krea-realtime-video load would attempt `torch.compile` against those caches — re-entering the warmup crash from the FP8→Krea conflict that #669 was meant to fix.

Fix

Introduces `self._dynamo_reset_failed: bool = False` on `PipelineManager`. When `torch._dynamo.reset()` raises:

- The flag is latched to `True` (persists for the worker process lifetime)
- `pipeline_unloaded` is still published (memory is freed)
- The next krea-realtime-video load sees the flag and forces `compile=False`, with a warning log to restart the worker

This is safer than failing the unload entirely (which would strand the pipeline in a broken state) while still preventing the stale-cache recompile crash.

Changes

- `__init__`: adds `self._dynamo_reset_failed = False`
- `_unload_pipeline_by_id_unsafe`: latches flag on reset failure
- `_load_pipeline_implementation` (krea branch): checks flag before deciding `compile`