refactor: adapt SDK Configuration for crawlee Configuration redesign #583
Open
B4nan wants to merge 27 commits into
Refactor the SDK Configuration class to match the new crawlee core
Configuration redesign:
- Subclass core Configuration using `protected static override fields`
- Direct property access (`config.token`) instead of `config.get('token')`
- Immutable: values set via constructor, no `set()` method
- Priority: constructor options > env vars > schema defaults
- isAtHome conditional defaults moved into field definitions
- Use serviceLocator instead of config.useStorageClient/getEventManager
- Import z, coerceNumber, coerceBoolean from @crawlee/core (no direct zod dep)
- Update all .get()/.set() call sites in actor.ts, charging.ts, etc.
- Update tests to use property access
Depends on crawlee PR: apify/crawlee#3474
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
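The resolution rules listed above (constructor options, then env vars, then schema defaults; `isAtHome`-dependent defaults inside the field definitions; no `set()` after construction) can be modelled without crawlee or zod. This is a minimal, self-contained sketch, not the SDK's actual field definitions: `SketchConfiguration`, `fromEnv`, the order of the input-key env var aliases and the default values are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch of the resolution order; not the SDK's real field definitions.
type Env = Record<string, string | undefined>;

interface SketchConfigurationOptions {
    token?: string;
    inputKey?: string;
    persistStateIntervalMillis?: number;
}

// Returns the first non-empty value among the given env var names (empty string = unset).
function fromEnv(env: Env, names: string[]): string | undefined {
    for (const name of names) {
        const raw = env[name];
        if (raw !== undefined && raw !== '') return raw;
    }
    return undefined;
}

class SketchConfiguration {
    readonly isAtHome: boolean;
    readonly token: string | undefined;
    readonly inputKey: string;
    readonly persistStateIntervalMillis: number;

    // `env` is injected for testability; in practice it would be `process.env`.
    constructor(options: SketchConfigurationOptions, env: Env) {
        this.isAtHome = fromEnv(env, ['APIFY_IS_AT_HOME']) === '1';

        // Priority for every field: constructor option > env var(s) > default.
        this.token = options.token ?? fromEnv(env, ['APIFY_TOKEN']);

        // The alias order is an assumption made for this sketch.
        this.inputKey = options.inputKey
            ?? fromEnv(env, ['ACTOR_INPUT_KEY', 'APIFY_INPUT_KEY', 'CRAWLEE_INPUT_KEY'])
            ?? 'INPUT';

        const rawInterval = fromEnv(env, ['APIFY_PERSIST_STATE_INTERVAL_MILLIS']);
        this.persistStateIntervalMillis = options.persistStateIntervalMillis
            ?? (rawInterval !== undefined ? Number(rawInterval) : undefined)
            // Placeholder isAtHome-conditional default; the numbers are illustrative only.
            ?? (this.isAtHome ? 60_000 : 30_000);

        // Values are fixed once construction finishes; there is no set().
        Object.freeze(this);
    }
}
```

Because every value is resolved once in the constructor, changing the environment afterwards has no effect on an existing instance, which is why the test fixes later in this PR rebuild the configuration after mutating `process.env`.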
- Import `z` from `zod` directly (no longer re-exported from crawlee core)
- Define `coerceNumber` locally (no longer exported from crawlee core)
- Add constructor override to accept `ApifyConfigurationInput`
- Import `ConfigurationOptions` from SDK configuration instead of core
- Fix test that mutated env vars after init (immutable config)
Depends on: apify/crawlee#3080
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore the destructuring of `storageDir` and spread of remaining `storageClientOptions` into the `ApifyClient` constructor so that arbitrary client options configured via `storageClientOptions` continue to reach the client. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gration
- Reuse `coerceNumber` from `@crawlee/core` instead of defining a local copy; otherwise `FieldsOutput<typeof apifyConfigFields>` produces a structurally distinct (but equivalent) `availableMemoryRatio` type that breaks declaration merging with crawlee's `Configuration`.
- Drop the dead `storageClientOptions`/`storageDir` destructuring in `Actor.newClient()`: neither key exists in the redesigned Configuration, and `options` already covers the override path.
The remaining build errors (proxy/storage/event drift) are unrelated to the config redesign and are tracked in separate follow-up PRs against the v4 branch.
This was referenced Apr 30, 2026
…ent cases
Crawlee v4's Configuration resolves env vars eagerly at construction, so the existing 'Actor.newClient() reads environment variables correctly' test reads stale values once a prior test or import-time side effect has already created the singleton. Reset both before each case.
The SDK's `Configuration` keeps its own static singleton separate from crawlee's serviceLocator. Resetting only the locator wasn't enough — `Configuration.getGlobalConfig()` still handed back the stale cached SDK config (which was built before the test set `APIFY_TOKEN`).
- Reword the "empty string maxTotalChargeUsd" assertion: under Option A the empty env var is now treated as unset, so `config.maxTotalChargeUsd` is `undefined` (the charging manager still defaults to Infinity).
- `Actor.getInput` tests now build a fresh Actor *after* setting the env vars they exercise; with eager config resolution, a single module-scoped `TestingActor` would carry stale values.
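The "empty env var is treated as unset" rule matters because `Number('')` is `0` in JavaScript, so a naive coercion would read an empty `maxTotalChargeUsd` as a $0 limit. A minimal sketch of such a coercion, assuming a hypothetical helper named `coerceOptionalNumber` (this is not crawlee's `coerceNumber`, whose implementation is not shown here):

```typescript
// Hypothetical coercion helper, not crawlee's `coerceNumber`.
function coerceOptionalNumber(raw: string | undefined): number | undefined {
    // Number('') is 0, so an empty (or whitespace-only) env var must be filtered out explicitly.
    if (raw === undefined || raw.trim() === '') return undefined;
    const value = Number(raw);
    if (Number.isNaN(value)) {
        throw new Error(`Expected a number, got ${JSON.stringify(raw)}`);
    }
    return value;
}
```

A consumer such as the charging manager can then apply its own fallback to `undefined` without ever confusing "unset" with zero.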
Crawlee's Configuration uses crawleeConfigFields and only knows about `CRAWLEE_INPUT_KEY`. The SDK extension adds `ACTOR_INPUT_KEY` / `APIFY_INPUT_KEY` env-var aliases, which the test relies on. Importing Configuration from 'apify' makes `new Configuration()` inside buildActor() resolve those env vars correctly.
`@crawlee/linkedom@4.0.0-beta.49`'s `linkedom-crawler.js` imports `cheerio` without declaring it as a dependency. Locally this works when a parent directory has cheerio installed; CI's fresh install fails. Adding it directly here keeps tests green until the upstream package fixes the missing dep declaration.
`@crawlee/linkedom@4.0.0-beta.51` now declares cheerio as a direct dependency (apify/crawlee#3620), so the SDK no longer has to ship its own cheerio devDep to mask the missing declaration.
B4nan added a commit that referenced this pull request (Apr 30, 2026)
Crawlee v4's `EventManager` constructor now requires `EventManagerOptions` (just `persistStateIntervalMillis`), and the base class no longer carries a `config` field, so the previous `override readonly config` pattern is no longer valid.
- Drop the `override` and store `config` as an own readonly property.
- Forward `persistStateIntervalMillis` to `super()`.
- Add a `fromConfig()` factory mirroring `LocalEventManager.fromConfig()` so the SDK plays nicely with the new ServiceLocator-driven init path.
Stacked on #583 (config redesign); rebases onto v4 once that lands.
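The three changes above (own `config` property, forwarding only `persistStateIntervalMillis` to `super()`, and a static `fromConfig()` factory) fit together roughly as follows. This is a simplified sketch under stated assumptions: `BaseEventManagerSketch`, `SdkConfigSketch` and `PlatformEventManagerSketch` are hypothetical stand-ins, and crawlee's real `EventManager` has many more members.

```typescript
// Hypothetical, simplified shapes; crawlee's real EventManager has many more members.
interface EventManagerOptions {
    persistStateIntervalMillis: number;
}

abstract class BaseEventManagerSketch {
    protected readonly options: EventManagerOptions;

    constructor(options: EventManagerOptions) {
        this.options = options;
    }

    get persistStateIntervalMillis(): number {
        return this.options.persistStateIntervalMillis;
    }
}

interface SdkConfigSketch {
    persistStateIntervalMillis: number;
    token?: string;
}

class PlatformEventManagerSketch extends BaseEventManagerSketch {
    // Own property: the base class no longer declares `config`, so there is nothing to override.
    readonly config: SdkConfigSketch;

    constructor(config: SdkConfigSketch) {
        // Forward only what the base constructor needs.
        super({ persistStateIntervalMillis: config.persistStateIntervalMillis });
        this.config = config;
    }

    // Factory in the style of `LocalEventManager.fromConfig()`.
    static fromConfig(config: SdkConfigSketch): PlatformEventManagerSketch {
        return new PlatformEventManagerSketch(config);
    }
}
```

Keeping `config` on the subclass means platform-specific settings (such as a token) stay available to the SDK without the base class needing to know about them.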
B4nan added a commit that referenced this pull request (Apr 30, 2026)
Crawlee v4 reshaped its `StorageClient` interface (async factory methods that accept `id` *or* `name`), removed the cached `storageObject` from `KeyValueStore`, and made `getPublicUrl` async. The existing SDK code targeted the v3 shape and no longer compiles.
Changes:
- New `ApifyStorageClient` adapter wraps `apify-client`'s legacy `dataset()/keyValueStore()/requestQueue()` accessors and exposes the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient` factories crawlee now expects. Names are resolved to IDs via the collection `getOrCreate(name)` calls. apify-client's resource clients don't yet implement v4-only members like `getMetadata` / `getRecordPublicUrl`; the adapter casts through with a TODO comment so the structural alignment can land separately upstream.
- `Actor.init` and `_openStorage` now wrap `this.apifyClient` in `ApifyStorageClient` before handing it to crawlee.
- `KeyValueStore.getPublicUrl` is now async; the per-store `urlSigningSecretKey` is fetched on demand via the (private) `client.getMetadata()` instead of the removed `storageObject` cache. URL-signing behaviour for platform-mode reads is preserved.
- `Actor.openRequestQueue` reads `totalRequestCount` via the new `client.getMetadata()` (the old `client.get()` was dropped).
- `StorageManager.openStorage` is now `(class, id?, client?)`; the trailing `this.config` argument was removed.
Stacked on #583 (config redesign); rebases onto v4 once that lands.
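The adapter idea (a v4-style async factory that accepts an `id` or a `name` and resolves names through the collection's `getOrCreate(name)`) can be sketched for datasets alone. All interfaces here (`LegacyApifyClientLike`, `StorageClientAdapterSketch`, and so on) are hypothetical, heavily trimmed stand-ins; only `createDatasetClient` and `getOrCreate(name)` are taken from the commit message, and default-store handling is deliberately left out.

```typescript
// Hypothetical, heavily trimmed shapes; real apify-client and crawlee types are much richer.
interface DatasetInfoLike {
    id: string;
}

interface DatasetCollectionLike {
    getOrCreate(name?: string): Promise<DatasetInfoLike>;
}

interface DatasetResourceLike {
    id: string;
}

// The "legacy" accessor style: a collection client plus a per-id resource client.
interface LegacyApifyClientLike {
    datasets(): DatasetCollectionLike;
    dataset(id: string): DatasetResourceLike;
}

interface OpenStorageOptions {
    id?: string;
    name?: string;
}

class StorageClientAdapterSketch {
    private readonly client: LegacyApifyClientLike;

    constructor(client: LegacyApifyClientLike) {
        this.client = client;
    }

    // v4-style async factory accepting an id *or* a name.
    async createDatasetClient(options: OpenStorageOptions): Promise<DatasetResourceLike> {
        if (options.id !== undefined) {
            return this.client.dataset(options.id);
        }
        if (options.name !== undefined) {
            // Names are resolved to ids through the collection's getOrCreate().
            const info = await this.client.datasets().getOrCreate(options.name);
            return this.client.dataset(info.id);
        }
        // Default-store handling is out of scope for this sketch.
        throw new Error('Either id or name is required');
    }
}
```

An id is passed straight through, so only name-based opens cost an extra API round trip.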
B4nan added a commit that referenced this pull request (Apr 30, 2026)
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` and `newUrl` now take a single `TieredProxyOptions` argument; the previous `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed; the `_callNewUrlFunction` and `_handleTieredUrl` helpers now take options only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.
Changes:
- `newProxyInfo` and `newUrl` accept `string | number | TieredProxyOptions | undefined` so existing SDK callers that pass a raw `sessionId` keep working, while the override remains compatible with crawlee's v4 signature. A small `parseSessionIdOrOptions` helper discriminates and pulls `sessionId` from `options.request` when no explicit one is given.
- Inlined custom-URL session stickiness via a new private `getSessionIndex(sessionId)` (replacing the removed `_handleCustomUrl`), keyed on `usedProxyUrls` like the base class.
- Re-declared `sessionId?: string` on the SDK's `ProxyInfo` interface so users can still read `proxyInfo.sessionId` (v3 carried it on the base type).
- Re-imported `ProxyInfo` from `@crawlee/types` (no longer re-exported from `@crawlee/core`).
- Tightened a `proxyUrls.some(url => url.includes(...))` access for the new `(string | null)[]` array shape.
Stacked on #583 (config redesign); rebases onto v4 once that lands.
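A minimal sketch of the two helpers described above: argument discrimination for the backwards-compatible signature, and sticky session-to-URL assignment. The option shape `TieredProxyOptionsSketch` (including where `sessionId` lives on `request`) is an assumption, not crawlee's real `TieredProxyOptions`. Also, the SDK version reportedly keys on `usedProxyUrls` like the base class; this sketch stores indices in a hypothetical `sessionIndexes` map instead.

```typescript
// Hypothetical, trimmed-down option shape; crawlee's real TieredProxyOptions may differ.
interface TieredProxyOptionsSketch {
    request?: { sessionId?: string };
    proxyTier?: number;
}

interface ParsedProxyArgs {
    sessionId?: string;
    options: TieredProxyOptionsSketch;
}

// Accepts either the legacy raw sessionId or the v4 options object.
function parseSessionIdOrOptions(
    arg: string | number | TieredProxyOptionsSketch | undefined,
): ParsedProxyArgs {
    if (typeof arg === 'string' || typeof arg === 'number') {
        return { sessionId: String(arg), options: {} };
    }
    const options = arg ?? {};
    // No explicit sessionId: fall back to the one carried by the request, if any.
    return { sessionId: options.request?.sessionId, options };
}

// Session stickiness for custom proxy URLs: a known session keeps its URL,
// new sessions are assigned round-robin.
class StickyProxyUrlsSketch {
    private readonly proxyUrls: string[];
    private readonly sessionIndexes = new Map<string, number>();
    private nextIndex = 0;

    constructor(proxyUrls: string[]) {
        if (proxyUrls.length === 0) throw new Error('At least one proxy URL is required');
        this.proxyUrls = proxyUrls;
    }

    getSessionIndex(sessionId: string): number {
        const known = this.sessionIndexes.get(sessionId);
        if (known !== undefined) return known;
        const index = this.nextIndex % this.proxyUrls.length;
        this.nextIndex += 1;
        this.sessionIndexes.set(sessionId, index);
        return index;
    }

    urlFor(sessionId: string): string {
        return this.proxyUrls[this.getSessionIndex(sessionId)];
    }
}
```

Converting numeric session ids with `String()` keeps a single key space, so `42` and `'42'` map to the same proxy URL.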
barjin approved these changes (May 7, 2026)
- `configuration.ts`: use `APIFY_ENV_VARS` constants from `@apify/consts` in place of inline env var name string literals where a constant exists.
- `charging.ts`: prefer `??` over `||` for `maxTotalChargeUsd` and `isAtHome`. Crawlee v4's Option A already turns an empty string into `undefined`, so the `|| 0` workaround for `0` is obsolete, and `!!` on the boolean-or-undefined `isAtHome` reads more clearly as `?? false`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
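Why `??` is the safer operator here can be shown in isolation. This is a generic illustration with hypothetical function names, not the actual expressions in `charging.ts`: `||` falls back on any falsy value (including an explicit `0`), while `??` only falls back on `undefined` or `null`.

```typescript
// Illustrative only: not the expressions used in charging.ts.
function limitWithOr(maxTotalChargeUsd: number | undefined): number {
    // 0 is falsy, so an explicit $0 budget would silently become "no limit".
    return maxTotalChargeUsd || Infinity;
}

function limitWithNullish(maxTotalChargeUsd: number | undefined): number {
    // Only undefined (or null) falls back to the default.
    return maxTotalChargeUsd ?? Infinity;
}

function resolveIsAtHome(isAtHome: boolean | undefined): boolean {
    // Same result as !!isAtHome for boolean | undefined, but the default is explicit.
    return isAtHome ?? false;
}
```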
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v4 `Configuration` resolves env vars eagerly at construction, so tests that mutate `process.env` afterwards need to drop the cached singleton. The previous pattern bypassed the public API and poked private static state via type assertions, duplicated across multiple test files.
- `Configuration.reset()` clears the SDK's own `globalConfig` static *and* delegates to `serviceLocator.reset()` (matching the upcoming crawlee API in apify/crawlee#3649; once that is published, the SDK can swap the explicit `serviceLocator.reset()` call for `super.reset()`).
- `Actor.reset()` clears `Actor._instance` and calls `Configuration.reset()`. Tests use this single call instead of the three-step boilerplate.
- `utils.test.ts` and `actor.test.ts` are updated; the awkward inline `(Configuration as unknown as { globalConfig?: ... })` / `(Actor as unknown as { _instance?: ... })` blocks are gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Branch force-pushed from 3c94cb0 to b1f74e7.
…rnal
`Actor.reset()` was too generic for the main SDK entrypoint: readers would reasonably expect it to reset *an Actor instance*. Rename it to `Actor.resetGlobalState()` (matching the SDK's prior convention for `Configuration.resetGlobalState()` and making the intent explicit: "drop the cached singletons so the next access reconstructs from the current env"). Mark it `@internal` so it doesn't surface in public TypeDoc.
…uration
crawlee 4.0.0-beta.56 ships `Configuration.reset()` (apify/crawlee#3649), so the SDK's override can delegate to `super.reset()` instead of calling `serviceLocator.reset()` directly. The SDK still owns clearing its own `globalConfig` static and replacing the `AsyncLocalStorage` singleton.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
B4nan added a commit that referenced this pull request (May 12, 2026)
`Actor.resetGlobalState()` and `Configuration.reset()` were both misplaced — Actor is the main public entry point and shouldn't carry test-cleanup methods, and `Configuration.reset()` is misleading because it doesn't reset anything *on* the Configuration; it just drops the singletons that the service locator + SDK statics keep around. Move the cleanup to `test/resetGlobalState.ts`, exported only inside the test tree, and update the two test files that used the static methods to import from there. Production-side SDK surface no longer exposes a generic reset. (Crawlee's `Configuration.reset()` will be reverted separately — apify/crawlee#3649. Until that lands, calling it is harmless; we just don't call it anymore.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
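What a test-only cleanup helper of this kind has to do can be sketched with stand-ins: drop the cached SDK singleton and reset the service locator, so the next `getGlobalConfig()` call rebuilds from the current env. `ServiceLocatorSketch`, `CachedConfigSketch` and this `resetGlobalState` body are hypothetical; the real helper in `test/resetGlobalState.ts` also deals with `Actor._instance` and the `AsyncLocalStorage` singleton, which this sketch omits.

```typescript
// Hypothetical stand-ins; the real SDK Configuration and crawlee serviceLocator differ.
type Env = Record<string, string | undefined>;

class ServiceLocatorSketch {
    private configuration: object | undefined;

    setConfiguration(configuration: object): void {
        this.configuration = configuration;
    }

    getConfiguration(): object | undefined {
        return this.configuration;
    }

    reset(): void {
        this.configuration = undefined;
    }
}

const serviceLocator = new ServiceLocatorSketch();

class CachedConfigSketch {
    // Public here so the test helper can clear it without type-assertion tricks.
    static globalConfig: CachedConfigSketch | undefined;
    readonly token: string | undefined;

    constructor(env: Env) {
        // Resolved eagerly: later env changes do not affect this instance.
        this.token = env['APIFY_TOKEN'];
    }

    static getGlobalConfig(env: Env): CachedConfigSketch {
        CachedConfigSketch.globalConfig ??= new CachedConfigSketch(env);
        return CachedConfigSketch.globalConfig;
    }
}

// Test-tree-only helper: drop every cached singleton so the next access
// rebuilds from the current env.
function resetGlobalState(): void {
    CachedConfigSketch.globalConfig = undefined;
    serviceLocator.reset();
}
```

Keeping this function in the test tree means production code never exposes a "reset everything" entry point, which is the motivation given above.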
B4nan added a commit that referenced this pull request (May 13, 2026)
…uration })`
The Actor constructor previously took either zero options (use cached
global Configuration) or field-level overrides (`{ token: ..., inputKey: ... }`
constructs a fresh Configuration from those). There was no way to hand
the Actor a Configuration instance you already have — useful for tests
that want a fresh env-resolved Configuration without touching the global
singleton, and for application code that wires its own config explicitly.
Adds an optional `configuration` field on the constructor options. When
present, it takes precedence over field-level overrides (which are
ignored) so the contract stays unambiguous. Mirrors crawlee's
BasicCrawler pattern.
`Actor.getInput` tests use it: dropping the `resetGlobalState()` +
`actor.config = new Configuration()` dance for a single
`new Actor({ configuration: new Configuration() })`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
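The constructor contract described above (an explicit `configuration` instance wins; otherwise field-level overrides build a fresh instance; otherwise the cached global one is used) can be sketched as follows. `ActorSketch`, `ActorConfigSketch` and their option types are hypothetical simplifications of the real `Actor` and `Configuration` constructors.

```typescript
// Hypothetical stand-ins; the real Actor and Configuration constructors differ.
interface ConfigOptionsSketch {
    token?: string;
    inputKey?: string;
}

class ActorConfigSketch {
    private static globalConfig: ActorConfigSketch | undefined;
    readonly token: string | undefined;
    readonly inputKey: string;

    constructor(options: ConfigOptionsSketch = {}) {
        this.token = options.token;
        this.inputKey = options.inputKey ?? 'INPUT';
    }

    static getGlobalConfig(): ActorConfigSketch {
        ActorConfigSketch.globalConfig ??= new ActorConfigSketch();
        return ActorConfigSketch.globalConfig;
    }
}

// Field-level overrides plus an optional ready-made instance.
interface ActorOptionsSketch extends ConfigOptionsSketch {
    configuration?: ActorConfigSketch;
}

class ActorSketch {
    readonly config: ActorConfigSketch;

    constructor(options: ActorOptionsSketch = {}) {
        const { configuration, ...fieldOverrides } = options;
        if (configuration !== undefined) {
            // An explicit instance wins; field-level overrides are ignored.
            this.config = configuration;
        } else if (Object.keys(fieldOverrides).length > 0) {
            // Overrides build a fresh configuration instead of touching the global one.
            this.config = new ActorConfigSketch(fieldOverrides);
        } else {
            this.config = ActorConfigSketch.getGlobalConfig();
        }
    }
}
```

Usage in a test then becomes a single line, e.g. `new ActorSketch({ configuration: new ActorConfigSketch() })`, with no global state touched.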
B4nan added a commit that referenced this pull request (May 14, 2026)
…onfig()`
Two spots inside `Actor` were still reaching for the global Configuration
singleton instead of the Actor's own `this.config`, which silently
defeated the new `new Actor({ configuration })` option:
- `init()`: `serviceLocator.setConfiguration(Configuration.getGlobalConfig())`
registered the *global* config with the service locator, even when the
Actor was constructed with a custom one. Crawlee internals created
later (event manager, storage client, …) then saw the wrong instance.
- `useState()`: fell back to `Configuration.getGlobalConfig()` for the
underlying `KeyValueStore.open({ config })`, same problem.
Both now use `this.config`. The stale comment on the `init()` line
("reset global config instance to respect APIFY_ prefixed env vars" —
made sense in v3 with mutable Configuration, not in v4 where values
are resolved eagerly at construction) is updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
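The `init()` bug is easy to reproduce with stand-ins: registering the global singleton instead of `this.config` means anything later resolved through the locator sees the wrong instance. `LocatorSketch`, `GlobalConfigSketch` and `InitActorSketch` are hypothetical; the `initBefore` / `init` pair only contrasts the old and new behaviour described above and is not the SDK's code.

```typescript
// Hypothetical stand-ins; crawlee's real serviceLocator and the SDK's Actor differ.
class LocatorSketch {
    private configuration: object | undefined;

    setConfiguration(configuration: object): void {
        this.configuration = configuration;
    }

    getConfiguration(): object | undefined {
        return this.configuration;
    }
}

class GlobalConfigSketch {
    private static instance: GlobalConfigSketch | undefined;
    readonly label: string;

    constructor(label: string) {
        this.label = label;
    }

    static getGlobalConfig(): GlobalConfigSketch {
        GlobalConfigSketch.instance ??= new GlobalConfigSketch('global');
        return GlobalConfigSketch.instance;
    }
}

class InitActorSketch {
    readonly config: GlobalConfigSketch;

    constructor(config?: GlobalConfigSketch) {
        this.config = config ?? GlobalConfigSketch.getGlobalConfig();
    }

    // Old behaviour: always registers the global singleton, ignoring a custom config.
    initBefore(locator: LocatorSketch): void {
        locator.setConfiguration(GlobalConfigSketch.getGlobalConfig());
    }

    // Fixed behaviour: registers the Actor's own configuration.
    init(locator: LocatorSketch): void {
        locator.setConfiguration(this.config);
    }
}
```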
Summary
- Adapt the `Configuration` class extension to work with the new field-based Configuration from crawlee (Simplify `Configuration`, crawlee#3080)
- Replace `.get('key')`/`.set('key', value)` calls with direct property access (`.key`) across all SDK files
- Replace `config.useStorageClient()`/`config.useEventManager()` with `serviceLocator.setStorageClient()`/`serviceLocator.setEventManager()`
- Add `zod` as a direct dependency (the SDK defines its own config fields using zod schemas)
Dependencies
- Depends on Simplify `Configuration` (crawlee#3080) being merged and released first
🤖 Generated with Claude Code