test: rebuild stagehand actor after registering OPENAI_API_KEY #1900

Merged
vdusek merged 2 commits into master from fix/stagehand-e2e-rebuild-after-env-var
May 19, 2026

Conversation

vdusek (Collaborator) commented May 17, 2026

Summary

The scheduled stagehand e2e test fails at actor runtime with the error "ValueError: The OPENAI_API_KEY environment variable is not set." (example failing job).

Root cause: Apify bakes a version's environment variables into the build at build-creation time. The recently added code registers OPENAI_API_KEY on version 0.0 after apify push has already produced the build, so the env var never reaches the running container (Apify docs: "Once you start a build, you cannot change its environment variables. To use different variables, you must create a new build.").

Fix

After creating the env var on the version, trigger a fresh build via actor.build(version_number='0.0', wait_for_finish=600). actor.start() then runs the newer build, which has OPENAI_API_KEY baked in. The extra build only runs for the stagehand crawler type.
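
The ordering described above can be sketched roughly as follows. This is an illustration, not the test's actual code: the function name is hypothetical, `actor` is assumed to behave like `apify_client`'s `ActorClientAsync`, and both `is_secret=True` and the `'buildNumber'` key of the returned build are assumptions.

```python
from typing import Any


async def register_key_rebuild_and_start(actor: Any, api_key: str) -> Any:
    """Run the three steps in order: env var, fresh build, then the run.

    `actor` is assumed to expose the ActorClientAsync-style methods used below.
    """
    # 1. Register the secret on version 0.0 (is_secret=True is an assumption).
    await actor.version('0.0').env_vars().create(
        name='OPENAI_API_KEY',
        value=api_key,
        is_secret=True,
    )

    # 2. Env vars are baked in at build-creation time, so the build produced
    #    earlier by `apify push` does not contain the key. Build again.
    build = await actor.build(version_number='0.0', wait_for_finish=600)

    # 3. Start the run on the new build explicitly (assumes the build dict
    #    carries its number under 'buildNumber').
    return await actor.start(build=build['buildNumber'])
```

Note that a later commit referencing this pull request reports that `wait_for_finish` is clipped server-side to 60s, so the build returned in step 2 may still be in progress when step 3 runs.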

Apify bakes a version's env vars into the build at build-creation
time, so registering OPENAI_API_KEY after `apify push` had no effect
on the run. Trigger a fresh build between the env-var creation and
`actor.start()` so the secret is present at runtime.
vdusek added the t-tooling and adhoc labels May 17, 2026
vdusek self-assigned this May 17, 2026
vdusek requested a review from janbuchar May 17, 2026 18:26
github-actions Bot added this to the 140th sprint - Tooling team milestone May 17, 2026
github-actions Bot added the tested label May 17, 2026
codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.86%. Comparing base (f263dc8) to head (d1d1753).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1900      +/-   ##
==========================================
- Coverage   92.92%   92.86%   -0.06%     
==========================================
  Files         167      167              
  Lines       11709    11709              
==========================================
- Hits        10881    10874       -7     
- Misses        828      835       +7     
| Flag | Coverage Δ |
|---|---|
| unit | 92.86% <ø> (-0.06%) ⬇️ |

Comment thread tests/e2e/project_template/test_static_crawlers_templates.py Outdated
vdusek merged commit e929d38 into master May 19, 2026
31 checks passed
vdusek deleted the fix/stagehand-e2e-rebuild-after-env-var branch May 19, 2026 14:50
vdusek added a commit that referenced this pull request May 21, 2026
…h when sandbox is disabled (#1906)

## Summary

The scheduled e2e tests for the Stagehand template have failed for
several days. After PR #1900 fixed the `OPENAI_API_KEY` propagation, the
next layer of failure surfaced — every Stagehand variant in [today's
run](https://github.com/apify/crawlee-python/actions/runs/26136278735)
failed identically on the Apify platform with:

```
stagehand.InternalServerError: Error code: 500 - {'success': False, 'message': 'connect ECONNREFUSED 127.0.0.1:<port>'}
```

## Root cause

Reproduced locally inside the `apify/actor-python-playwright:3.13`
image: Stagehand's `BrowserLaunchOptions` accepts a `chromium_sandbox`
field, but the field is **not propagated** to the underlying Chromium
launch. When the actor runs as root (which it does on Apify), Chromium
silently refuses to start because the setuid sandbox can't initialise —
and Stagehand's local SEA server then hits `ECONNREFUSED` when it tries
to connect to the missing Chromium CDP port. The regular
`PlaywrightCrawler` is unaffected because Playwright's Python binding
*does* translate `chromium_sandbox=False` into `--no-sandbox`.

## Fix

In `StagehandBrowserPlugin`, when the sandbox is disabled
(`config.disable_browser_sandbox=True`, which Apify auto-sets), append
`'--no-sandbox'` to `browser_launch_options['args']` as an explicit
workaround. Verified end-to-end in the same actor base image — the
crawler now completes requests successfully.
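
A minimal sketch of the workaround, assuming the launch options are a plain dict: the helper name `with_no_sandbox_arg` is hypothetical and this is not the actual `StagehandBrowserPlugin` code. It only illustrates adding the flag when the sandbox is disabled, without duplicating it or mutating the caller's dict.

```python
from typing import Any


def with_no_sandbox_arg(
    browser_launch_options: dict[str, Any],
    disable_browser_sandbox: bool,
) -> dict[str, Any]:
    """Return a copy of the launch options, adding '--no-sandbox' if needed."""
    options = dict(browser_launch_options)
    if not disable_browser_sandbox:
        # Sandbox stays enabled: leave the launch arguments alone.
        return options

    # Copy the list so the caller's options are not modified in place.
    args = list(options.get('args') or [])
    if '--no-sandbox' not in args:
        args.append('--no-sandbox')
    options['args'] = args
    return options
```

Copying rather than mutating keeps the helper safe to call more than once with the same options object.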

The `apify push` hang from the same scheduled run is a separate flake
and is addressed in #1905.
vdusek added a commit that referenced this pull request May 22, 2026
)

## Summary

Two related scheduled e2e failures, both fixed by giving Apify builds
more time to complete.

### 1. Stagehand rebuild: `wait_for_finish=600` was silently clipped to 60s

The scheduled stagehand tests have been failing on every run since
[#1900](#1900) added the
post-`env_vars.create` rebuild
([example](https://github.com/apify/crawlee-python/actions/runs/26263684679/job/77302398839)):

```
apify_client.errors.ApifyApiError: The build has not finished or was not successful.
```

`ActorClientAsync.build()`'s `wait_for_finish` parameter is clipped
server-side to a max of 60s ("By default it is 0, the maximum value is
60"). A stagehand build (playwright + browser deps) does not finish in
60s, so `actor.build()` returned a still-`PROCESSING` build and
`actor.start(build=build_number)` was rejected.

**Fix:** After triggering the rebuild, poll client-side via
`client.build(<id>).wait_for_finish(wait_secs=900)` — which uses
long-poll requests and actually waits up to the requested duration —
then assert `status == 'SUCCEEDED'` before passing the build to
`actor.start()`.
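
The client-side wait could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, `client` is assumed to behave like `apify_client`'s `ApifyClientAsync`, and a `RuntimeError` stands in for the test's assertion.

```python
from typing import Any


async def wait_for_successful_build(
    client: Any,
    build_id: str,
    wait_secs: int = 900,
) -> dict:
    """Wait client-side for a build and fail loudly unless it succeeded."""
    # Unlike the `wait_for_finish` argument of `actor.build()`, this call
    # keeps polling on the client side for up to `wait_secs` seconds.
    build = await client.build(build_id).wait_for_finish(wait_secs=wait_secs)

    # The result may be missing or still unfinished when the wait ends.
    status = build.get('status') if build else None
    if status != 'SUCCEEDED':
        raise RuntimeError(f'Build {build_id} did not succeed (status: {status})')
    return build
```

Checking the status before calling `actor.start()` turns a confusing API rejection into an explicit failure at the step that actually went wrong.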

### 2. `apify push` 120s timeout was too tight for heavier templates

The `poetry-curl-impersonate-adaptive-parsel` variant times out on every
rerun
([example](https://github.com/apify/crawlee-python/actions/runs/26263684679/job/77302398558)):

```
subprocess.TimeoutExpired: Command '['apify', 'push']' timed out after 120 seconds
```

`apify push` waits server-side for the Docker build to finish before
returning. The captured stderr shows a ~550 MB base image mid-download
when the timeout fires — the CLI isn't hung, the build is just
legitimately slower than 120s for heavier templates. The job total (522s,
roughly 4 × 120s = 480s plus overhead, for the initial attempt and three
reruns) confirms every attempt hits the same wall, so retries don't help.

**Fix:** Bump all three apify-cli `subprocess.run` timeouts (`login`,
`init`, `push`) from 120s to 600s. The 1800s `pytest-timeout` per test
still bounds a truly hung CLI; `@pytest.mark.flaky(reruns=3)` still
covers transient network/CLI flakes.
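
For illustration, a shared timeout for the CLI calls might be wired up as below. The helper `run_cli` and the constant name are hypothetical, and `capture_output`, `text`, and `check` are assumptions about how the tests invoke the CLI.

```python
import subprocess

# Raised from 120s so slower Docker builds can finish (value from the fix above).
CLI_TIMEOUT_SECS = 600


def run_cli(
    command: list[str],
    timeout: float = CLI_TIMEOUT_SECS,
) -> subprocess.CompletedProcess[str]:
    """Run a CLI command, raising subprocess.TimeoutExpired if it overruns."""
    return subprocess.run(
        command,
        capture_output=True,  # keep stdout/stderr for debugging failures
        text=True,
        check=True,  # a non-zero exit code raises CalledProcessError
        timeout=timeout,
    )
```

With a helper like this, `run_cli(['apify', 'push'])` would use the 600s limit, and the same default would apply to `login` and `init`.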
Labels

- adhoc: Ad-hoc unplanned task added during the sprint.
- t-tooling: Issues with this label are in the ownership of the tooling team.
- tested: Temporary label used only programmatically for some analytics.
