fix: show elapsed time during cloud cold start and increase retries #708
livepeer-tessa wants to merge 3 commits into main
Conversation
…derflow

The WAN VAE encoder contains a 3×3 spatial convolution kernel. When the input chunk has spatial dimensions < 3 on either axis, the forward pass raises:

RuntimeError: Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

Observed in prod logs (2026-03-15, 10:48–10:59 UTC) on the krea-realtime-video pipeline, fal.ai job 5193400c-da0f-4eef-8bdd-dd0fdd26c1db: 2,372 errors over 11 minutes (~4 errors/second) from an input with height=2 pixels.

Fix: in _encode_with_conditioning, detect when height or width < 3 and pad to the minimum safe size using F.pad. The corresponding masks tensor is also padded to keep shapes consistent. block_state.height/width are updated so the downstream resolution check still passes. A WARNING is emitted so the unusual input remains visible in logs without a crash.

This is the spatial analogue of the 3×1×1 temporal kernel guard (issue #673, PR #674).

Fixes #557

Signed-off-by: livepeer-robot <robot@livepeer.org>
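The padding computation described in the fix can be sketched as follows. This is a minimal illustration, not the actual patch: the helper name spatial_pad_amounts and the MIN_SPATIAL constant are hypothetical; only F.pad and the 3×3 kernel constraint come from the commit message.

```python
MIN_SPATIAL = 3  # a 3x3 spatial conv cannot run on inputs smaller than 3x3

def spatial_pad_amounts(height: int, width: int) -> tuple[int, int, int, int]:
    """Return a (left, right, top, bottom) padding tuple for torch.nn.functional.pad.

    F.pad pads the last dimension first, so for an (..., H, W) tensor the
    tuple order is (W-left, W-right, H-top, H-bottom).
    """
    pad_w = max(0, MIN_SPATIAL - width)
    pad_h = max(0, MIN_SPATIAL - height)
    return (0, pad_w, 0, pad_h)

# Applied roughly as:
#   frames = F.pad(frames, spatial_pad_amounts(h, w))
# with the same padding applied to the masks tensor to keep shapes consistent.
```

For the logged failure case (height=2, width=513) this yields one row of bottom padding and no width padding, bringing the input up to the kernel's minimum size.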
When remote inference cold-starts, the background connect task can time out waiting for the 'ready' signal even though the cloud runner is already starting up. This left users with a failed connection and no automatic recovery — they had to manually retry.

Add retry logic to connect_background (up to 3 attempts, 5 s delay):
- On each failure, check whether the error is transient (timeout, network, connection refused, reset). If so, wait and retry.
- Non-transient errors (auth, config, bad app_id) bail immediately.
- The connect_stage field is updated during the retry delay so the UI can show "Retrying connection (attempt N/3)..." instead of going silent.

Fixes #704 — users no longer need to manually retry when the cloud runner cold-starts and the first connection attempt times out.

Signed-off-by: livepeer-robot <robot@livepeer.org>
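The retry loop this commit describes might look like the sketch below. It is an illustration under assumptions, not the repository's code: the function signature, the set_stage callback, and the delay_s parameter are invented for the example, and the transient-error tuple stands in for whatever classification the real code performs.

```python
import asyncio

# Errors treated as transient: timeouts, refused/reset connections, and
# other socket-level failures. Auth/config errors fall outside this tuple
# and therefore propagate on the first occurrence.
TRANSIENT_ERRORS = (asyncio.TimeoutError, ConnectionError, OSError)

async def connect_background(connect_once, set_stage, max_attempts=3, delay_s=5.0):
    """Attempt connect_once up to max_attempts times, retrying transient errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect_once()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise  # out of retries: surface the last transient error
            # Keep the UI informed instead of going silent during the delay.
            set_stage(f"Retrying connection (attempt {attempt + 1}/{max_attempts})...")
            await asyncio.sleep(delay_s)
```

A non-transient exception (e.g. a bad app_id raising ValueError) is not caught, so it bails out of the loop immediately, matching the behavior described above.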
When connecting to remote inference, the UI showed 'Starting cloud server...' for up to 3 minutes with no progress feedback. Users had no way to tell if the connection was alive or stuck.

Changes:
- Update _connect_stage with elapsed seconds during the 'ready' wait (e.g. 'Starting cloud server... (45s)') so users can see progress
- Increase max connection attempts from 3 → 5 to give cold-starting cloud runners more chances to become available
- Use progressive retry delays (5s, 10s, 15s, 20s) instead of a fixed 5s so consecutive timeouts space out without overwhelming the endpoint

Fixes #704

Signed-off-by: livepeer-robot <robot@livepeer.org>
🚀 fal.ai Preview Deployment
Testing

Connect to this preview deployment by running this on your branch. 🧪 E2E tests will run automatically against this deployment.
✅ E2E Tests passed
Test Artifacts

Check the workflow run for screenshots.
Fixes #704
Problem
When enabling remote inference, the UI shows "Starting cloud server..." for up to 3 minutes with no feedback. Users cannot tell whether the connection is working or stuck. After all 3 retry attempts time out, they get a generic error and have to retry manually.
Changes
- connect_stage now updates every second with elapsed time (e.g. Starting cloud server... (45s)) so users can see the connection is alive and roughly how long it has been waiting
- max_attempts increased from 3 → 5, giving cold-starting cloud runners more chances to become available before reporting failure

Testing