
refactor: adapt SDK Configuration for crawlee Configuration redesign #583

Open
B4nan wants to merge 27 commits into v4 from refactor/configuration-class-redesign

@B4nan B4nan commented Mar 25, 2026

Summary

  • Adapts the SDK's Configuration class extension to work with the new field-based Configuration from crawlee (Simplify Configuration crawlee#3080)
  • Replaces .get('key') / .set('key', value) calls with direct property access (.key) across all SDK files
  • Replaces config.useStorageClient() / config.useEventManager() with serviceLocator.setStorageClient() / serviceLocator.setEventManager()
  • Adds zod as a direct dependency (SDK defines its own config fields using zod schemas)
  • Fixes tests that relied on mutable configuration (env var mutation after init)

Dependencies

  • Depends on Simplify Configuration crawlee#3080 being merged and released first
  • There are also unrelated v4 API changes in crawlee core (EventManager constructor, ProxyConfiguration signatures, etc.) that will need separate fixes before this fully type-checks

🤖 Generated with Claude Code

B4nan and others added 2 commits March 13, 2026 18:36
Refactor the SDK Configuration class to match the new crawlee core
Configuration redesign:

- Subclass core Configuration using `protected static override fields`
- Direct property access (`config.token`) instead of `config.get('token')`
- Immutable: values set via constructor, no `set()` method
- Priority: constructor options > env vars > schema defaults
- isAtHome conditional defaults moved into field definitions
- Use serviceLocator instead of config.useStorageClient/getEventManager
- Import z, coerceNumber, coerceBoolean from @crawlee/core (no direct zod dep)
- Update all .get()/.set() call sites in actor.ts, charging.ts, etc.
- Update tests to use property access
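The field-based pattern these bullets describe can be sketched without crawlee or zod. This is a minimal, hypothetical model, not crawlee's or the SDK's actual `Configuration`: `FieldDef`, `BaseConfig`, `SdkConfig`, the injected `env` argument (used instead of `process.env` to keep it testable), the alias order, and the `availableMemoryRatio` defaults are all illustrative assumptions. It shows a subclass extending `protected static fields`, the constructor option > env var > default priority, an `isAtHome`-dependent default, and immutability after construction.

```typescript
// Hypothetical field definition: env var names checked in order, a parser for
// the raw string, and a default that may read fields resolved earlier.
interface FieldDef<T> {
    env: string[];
    parse: (raw: string) => T;
    default: (resolved: Record<string, unknown>) => T | undefined;
}

type Env = Record<string, string | undefined>;

class BaseConfig {
    // Subclasses override this map to add or redefine fields.
    protected static fields: Record<string, FieldDef<unknown>> = {
        inputKey: { env: ['CRAWLEE_INPUT_KEY'], parse: (s) => s, default: () => 'INPUT' },
    };

    constructor(options: Record<string, unknown> = {}, env: Env = {}) {
        // Read the field map of the concrete (possibly derived) class.
        const fields = (this.constructor as typeof BaseConfig).fields;
        const resolved: Record<string, unknown> = {};
        for (const [name, def] of Object.entries(fields)) {
            // Priority: constructor option > first non-empty env var > default.
            if (options[name] !== undefined) {
                resolved[name] = options[name];
                continue;
            }
            const raw = def.env.map((key) => env[key]).find((v) => v !== undefined && v !== '');
            resolved[name] = raw !== undefined ? def.parse(raw) : def.default(resolved);
        }
        Object.assign(this, resolved);
        // Values are fixed at construction; there is no set() to mutate them.
        Object.freeze(this);
    }
}

class SdkConfig extends BaseConfig {
    // Type-only declarations; `declare` emits no runtime field.
    declare readonly inputKey: string;
    declare readonly isAtHome: boolean;
    declare readonly token: string | undefined;
    declare readonly availableMemoryRatio: number;

    protected static override fields: Record<string, FieldDef<unknown>> = {
        ...BaseConfig.fields,
        isAtHome: { env: ['APIFY_IS_AT_HOME'], parse: (s) => s === '1' || s === 'true', default: () => false },
        // Redefines the inherited field with extra aliases (alias order is an assumption).
        inputKey: { env: ['ACTOR_INPUT_KEY', 'APIFY_INPUT_KEY', 'CRAWLEE_INPUT_KEY'], parse: (s) => s, default: () => 'INPUT' },
        token: { env: ['APIFY_TOKEN'], parse: (s) => s, default: () => undefined },
        // Resolved after isAtHome, so its (illustrative) default can depend on it.
        availableMemoryRatio: {
            env: ['CRAWLEE_AVAILABLE_MEMORY_RATIO'],
            parse: Number,
            default: (r) => (r.isAtHome ? 1 : 0.25),
        },
    };
}

const demo = new SdkConfig({ token: 'from-options' }, { APIFY_IS_AT_HOME: '1', APIFY_TOKEN: 'from-env' });
console.log(demo.token, demo.isAtHome, demo.availableMemoryRatio); // from-options true 1
```

Because the field map is static data, a subclass only has to spread the parent map and add entries; the base constructor does all resolution in one pass.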

Depends on crawlee PR: apify/crawlee#3474

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Import `z` from `zod` directly (no longer re-exported from crawlee core)
- Define `coerceNumber` locally (no longer exported from crawlee core)
- Add constructor override to accept `ApifyConfigurationInput`
- Import `ConfigurationOptions` from SDK configuration instead of core
- Fix test that mutated env vars after init (immutable config)

Depends on: apify/crawlee#3080

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
B4nan and others added 2 commits April 20, 2026 16:23
Restore the destructuring of `storageDir` and spread of remaining
`storageClientOptions` into the `ApifyClient` constructor so that
arbitrary client options configured via `storageClientOptions` continue
to reach the client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gration

- Reuse `coerceNumber` from `@crawlee/core` instead of defining a local
  copy; otherwise `FieldsOutput<typeof apifyConfigFields>` produces a
  structurally distinct (but equivalent) `availableMemoryRatio` type
  that breaks declaration-merging with crawlee's `Configuration`.
- Drop the dead `storageClientOptions`/`storageDir` destructuring in
  `Actor.newClient()` — neither key exists in the redesigned
  Configuration; `options` already covers the override path.

The remaining build errors (proxy/storage/event drift) are unrelated
to the config redesign and tracked in separate follow-up PRs against
the v4 branch.
B4nan added 10 commits April 30, 2026 18:18
…ent cases

Crawlee v4's Configuration resolves env vars eagerly at construction,
so the existing 'Actor.newClient() reads environment variables
correctly' test reads stale values once a prior test or import-time
side effect has already created the singleton. Reset both before
each case.
The SDK's `Configuration` keeps its own static singleton separate from
crawlee's serviceLocator. Resetting only the locator wasn't enough —
`Configuration.getGlobalConfig()` still handed back the stale cached
SDK config (which was built before the test set `APIFY_TOKEN`).
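The stale-value problem described in these two commits can be reproduced with a small, hypothetical model. `EagerConfig`, `getGlobalConfig`, and `resetGlobalState` are illustrative stand-ins, not the SDK's code, and the env is passed in explicitly rather than read from `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// The env is read once, in the constructor (eager resolution).
class EagerConfig {
    readonly token: string | undefined;

    constructor(env: Env) {
        this.token = env['APIFY_TOKEN'] || undefined;
    }
}

// Module-level cache, analogous to a static `globalConfig` singleton.
let globalConfig: EagerConfig | undefined;

function getGlobalConfig(env: Env): EagerConfig {
    // Only the first call sees `env`; later calls get the cached instance.
    if (globalConfig === undefined) globalConfig = new EagerConfig(env);
    return globalConfig;
}

// Test-only helper: drop the cache so the next access re-reads the env.
function resetGlobalState(): void {
    globalConfig = undefined;
}
```

If any earlier code has already called `getGlobalConfig`, a later change to the env is invisible until the cache is dropped; with two independent caches (the service locator and the SDK static), both have to be cleared.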
- Reword "empty string maxTotalChargeUsd" assertion: under Option A
  the empty env var is now treated as unset, so `config.maxTotalChargeUsd`
  is `undefined` (charging manager still defaults to Infinity).
- Actor.getInput tests now build a fresh Actor *after* setting the
  env vars they exercise — eager config resolution means a single
  module-scoped TestingActor would carry stale values.
Crawlee's Configuration uses crawleeConfigFields and only knows about
`CRAWLEE_INPUT_KEY`. The SDK extension adds `ACTOR_INPUT_KEY` /
`APIFY_INPUT_KEY` env-var aliases, which the test relies on.
Importing Configuration from 'apify' makes `new Configuration()`
inside buildActor() resolve those env vars correctly.
`@crawlee/linkedom@4.0.0-beta.49`'s `linkedom-crawler.js` imports
`cheerio` without declaring it as a dependency. Locally this works
when a parent directory has cheerio installed; CI's fresh install
fails. Adding it directly here keeps tests green until the upstream
package fixes the missing dep declaration.
`@crawlee/linkedom@4.0.0-beta.51` now declares cheerio as a direct
dependency (apify/crawlee#3620), so the SDK no longer has to ship its
own cheerio devDep to mask the missing declaration.
B4nan added a commit that referenced this pull request Apr 30, 2026
Crawlee v4's `EventManager` constructor now requires
`EventManagerOptions` (just `persistStateIntervalMillis`), and the
base class no longer carries a `config` field — the previous
`override readonly config` pattern is no longer valid.

- Drop the `override` and store `config` as own readonly property.
- Forward `persistStateIntervalMillis` to `super()`.
- Add a `fromConfig()` factory mirroring `LocalEventManager.fromConfig()`
  so the SDK plays nicely with the new ServiceLocator-driven init path.
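A hypothetical sketch of the constructor change these bullets describe. `BaseEventManager`, `PlatformConfigLike`, and `PlatformEventManagerSketch` are illustrative stand-ins; only `persistStateIntervalMillis` and the `fromConfig()` name come from the commit message, and the real signatures may differ.

```typescript
// Stand-in for crawlee v4's options: only the persist interval is required.
interface EventManagerOptions {
    persistStateIntervalMillis: number;
}

abstract class BaseEventManager {
    protected readonly persistStateIntervalMillis: number;

    constructor(options: EventManagerOptions) {
        this.persistStateIntervalMillis = options.persistStateIntervalMillis;
    }
}

// Hypothetical subset of the SDK config that the platform manager needs.
interface PlatformConfigLike {
    persistStateIntervalMillis: number;
}

class PlatformEventManagerSketch extends BaseEventManager {
    // Own property, not `override`: the v4 base class has no `config` field.
    readonly config: PlatformConfigLike;

    constructor(config: PlatformConfigLike) {
        // Forward only what the base class needs.
        super({ persistStateIntervalMillis: config.persistStateIntervalMillis });
        this.config = config;
    }

    // Factory so init code can build the manager from a config object.
    static fromConfig(config: PlatformConfigLike): PlatformEventManagerSketch {
        return new PlatformEventManagerSketch(config);
    }

    get intervalMillis(): number {
        return this.persistStateIntervalMillis;
    }
}
```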

Stacked on #583 (config redesign); rebases onto v4 once that lands.
B4nan added a commit that referenced this pull request Apr 30, 2026
Crawlee v4 reshaped its `StorageClient` interface (async factory
methods that accept `id` *or* `name`), removed the cached
`storageObject` from `KeyValueStore`, and made `getPublicUrl` async.
The existing SDK code targeted the v3 shape and no longer compiles.

Changes:
- New `ApifyStorageClient` adapter wraps `apify-client`'s legacy
  `dataset()/keyValueStore()/requestQueue()` accessors and exposes
  the `createDatasetClient/createKeyValueStoreClient/createRequestQueueClient`
  factories crawlee now expects. Names are resolved to IDs via the
  collection `getOrCreate(name)` calls. apify-client's resource
  clients don't yet implement v4-only members like `getMetadata` /
  `getRecordPublicUrl`; the adapter casts through with a TODO
  comment so the structural alignment can land separately upstream.
- `Actor.init` and `_openStorage` now wrap `this.apifyClient` in
  `ApifyStorageClient` before handing it to crawlee.
- `KeyValueStore.getPublicUrl` is now async; the per-store
  `urlSigningSecretKey` is fetched on demand via the (private)
  `client.getMetadata()` instead of the removed `storageObject`
  cache. URL-signing behaviour for platform-mode reads is preserved.
- `Actor.openRequestQueue` reads `totalRequestCount` via the new
  `client.getMetadata()` (the old `client.get()` was dropped).
- `StorageManager.openStorage` is now `(class, id?, client?)` —
  removed the trailing `this.config` argument.
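The name-to-ID resolution in the adapter can be sketched as follows. `LegacyClient` is a structural stand-in loosely modelled on `apify-client`'s `datasets().getOrCreate(name)` and `dataset(id)` accessors; `OpenOptions`, `StorageClientAdapterSketch`, and its method shape are hypothetical, not the SDK's `ApifyStorageClient`.

```typescript
// Minimal structural stand-ins for the legacy client shape.
interface LegacyDatasetClient {
    id: string;
}

interface LegacyClient {
    datasets(): { getOrCreate(name?: string): Promise<{ id: string }> };
    dataset(id: string): LegacyDatasetClient;
}

// Hypothetical v4-style open options: a storage is opened by id *or* name.
interface OpenOptions {
    id?: string;
    name?: string;
}

class StorageClientAdapterSketch {
    private readonly client: LegacyClient;

    constructor(client: LegacyClient) {
        this.client = client;
    }

    async createDatasetClient(options: OpenOptions = {}): Promise<LegacyDatasetClient> {
        // An explicit id is used as-is; a name is first resolved to an id.
        const id = options.id ?? (await this.client.datasets().getOrCreate(options.name)).id;
        return this.client.dataset(id);
    }
}
```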

Stacked on #583 (config redesign); rebases onto v4 once that lands.
B4nan added a commit that referenced this pull request Apr 30, 2026
Crawlee v4 reshaped `ProxyConfiguration`:
- `newProxyInfo` and `newUrl` now take a single `TieredProxyOptions`
  argument; the previous `(sessionId, options)` pair is gone.
- The protected `_handleCustomUrl(sessionId)` helper was removed; the
  `_callNewUrlFunction` and `_handleTieredUrl` helpers now take options
  only.
- `ProxyInfo` (in `@crawlee/types`) no longer carries `sessionId`.

Changes:
- `newProxyInfo` and `newUrl` accept `string | number |
  TieredProxyOptions | undefined` so existing SDK callers that pass a
  raw `sessionId` keep working, while the override remains compatible
  with crawlee's v4 signature. A small `parseSessionIdOrOptions`
  helper discriminates and pulls `sessionId` from `options.request`
  when no explicit one is given.
- Inlined custom-URL session stickiness via a new private
  `getSessionIndex(sessionId)` (replacing the removed
  `_handleCustomUrl`), keyed on `usedProxyUrls` like the base class.
- Re-declared `sessionId?: string` on the SDK's `ProxyInfo` interface
  so users can still read `proxyInfo.sessionId` (v3 carried it on the
  base type).
- Re-imported `ProxyInfo` from `@crawlee/types` (no longer re-exported
  from `@crawlee/core`).
- Tightened a `proxyUrls.some(url => url.includes(...))` access for
  the new `(string | null)[]` array shape.
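The argument handling and session stickiness described above can be sketched as below. `TieredProxyOptionsLike` is a simplified, hypothetical shape (in particular, where the session id sits under `request` is an assumption), and `CustomUrlRotatorSketch` keeps a `sessionId -> index` map as a stand-in for the `usedProxyUrls` bookkeeping; none of this is the SDK's actual code.

```typescript
// Simplified, hypothetical stand-in for crawlee's `TieredProxyOptions`.
interface TieredProxyOptionsLike {
    request?: { sessionId?: string };
    proxyTier?: number;
}

// Accept a legacy raw session id (string | number) or a v4-style options object.
function parseSessionIdOrOptions(
    arg: string | number | TieredProxyOptionsLike | undefined,
): { sessionId?: string; options?: TieredProxyOptionsLike } {
    if (typeof arg === 'string' || typeof arg === 'number') {
        return { sessionId: String(arg) };
    }
    // No explicit id: fall back to the one carried by the request, if any.
    return { sessionId: arg?.request?.sessionId, options: arg };
}

class CustomUrlRotatorSketch {
    private readonly proxyUrls: (string | null)[];
    // Remembers which URL index each session got, so it stays sticky.
    private readonly sessionIndexes = new Map<string, number>();
    private nextIndex = 0;

    constructor(proxyUrls: (string | null)[]) {
        this.proxyUrls = proxyUrls;
    }

    private getSessionIndex(sessionId: string | undefined): number {
        if (sessionId !== undefined) {
            const known = this.sessionIndexes.get(sessionId);
            if (known !== undefined) return known;
        }
        // New (or anonymous) session: take the next URL round-robin.
        const index = this.nextIndex++ % this.proxyUrls.length;
        if (sessionId !== undefined) this.sessionIndexes.set(sessionId, index);
        return index;
    }

    newUrl(arg?: string | number | TieredProxyOptionsLike): string | null {
        const { sessionId } = parseSessionIdOrOptions(arg);
        return this.proxyUrls[this.getSessionIndex(sessionId)];
    }

    // Entries may be null, so guard before calling string methods.
    hasUrlContaining(fragment: string): boolean {
        return this.proxyUrls.some((url) => url?.includes(fragment) ?? false);
    }
}
```

Accepting the union type keeps legacy `newUrl(sessionId)` call sites compiling while still matching an options-only base signature.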

Stacked on #583 (config redesign); rebases onto v4 once that lands.
@barjin barjin self-requested a review May 7, 2026 06:43

@barjin barjin left a comment


Thank you @B4nan, I just have a few questions / refactoring nits, nothing major 👍

Comment thread packages/apify/src/proxy_configuration.ts Outdated
Comment thread packages/apify/src/platform_event_manager.ts Outdated
Comment thread packages/apify/src/configuration.ts Outdated
Comment thread packages/apify/src/charging.ts Outdated
- configuration.ts: use `APIFY_ENV_VARS` constants from `@apify/consts` in
  place of inline env var name string literals where a constant exists
- charging.ts: prefer `??` over `||` for `maxTotalChargeUsd` and `isAtHome`
  (empty string -> `undefined` is already handled by crawlee v4 Option A,
  so the `|| 0` workaround for `0` is obsolete, and the `!!` on the
  boolean-or-undefined `isAtHome` is clearer as `?? false`)
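Why `??` is the safer fallback for these fields, shown with hypothetical helpers (`ChargingConfigLike`, `parseOptionalNumber`, and the functions below are illustrative, not the SDK's `charging.ts`). The parser mirrors the described behaviour of treating an empty env var as unset.

```typescript
// Hypothetical subset of resolved config values.
interface ChargingConfigLike {
    maxTotalChargeUsd?: number;
    isAtHome?: boolean;
}

// An empty env var is treated as unset, so "" never reaches the fallback logic.
function parseOptionalNumber(raw: string | undefined): number | undefined {
    return raw === undefined || raw === '' ? undefined : Number(raw);
}

// `||` also replaces 0, so an explicit $0 budget would become unlimited.
function limitWithOr(config: ChargingConfigLike): number {
    return config.maxTotalChargeUsd || Infinity;
}

// `??` only falls back on undefined/null, so 0 is kept.
function limitWithNullish(config: ChargingConfigLike): number {
    return config.maxTotalChargeUsd ?? Infinity;
}

// boolean | undefined -> boolean, without the double negation.
function isAtHome(config: ChargingConfigLike): boolean {
    return config.isAtHome ?? false;
}

const budget = { maxTotalChargeUsd: parseOptionalNumber('0') };
console.log(limitWithOr(budget), limitWithNullish(budget)); // Infinity 0
```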

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
B4nan added a commit that referenced this pull request May 11, 2026
B4nan added a commit that referenced this pull request May 11, 2026
B4nan added a commit that referenced this pull request May 11, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
B4nan added a commit that referenced this pull request May 11, 2026
B4nan added a commit that referenced this pull request May 12, 2026
v4 `Configuration` resolves env vars eagerly at construction, so tests
that mutate `process.env` afterwards need to drop the cached singleton.
The previous pattern bypassed the public API and poked private static
state via type assertions, duplicated across multiple test files.

- `Configuration.reset()` clears the SDK's own `globalConfig` static
  *and* delegates to `serviceLocator.reset()` (matches the upcoming
  crawlee API in apify/crawlee#3649 — once published the SDK can
  swap the explicit `serviceLocator.reset()` call for `super.reset()`).
- `Actor.reset()` clears `Actor._instance` and calls
  `Configuration.reset()`. Tests use this single call instead of the
  three-step boilerplate.
- `utils.test.ts` and `actor.test.ts` updated; the awkward inline
  `(Configuration as unknown as { globalConfig?: ... })` /
  `(Actor as unknown as { _instance?: ... })` blocks are gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@B4nan B4nan force-pushed the refactor/configuration-class-redesign branch from 3c94cb0 to b1f74e7 May 12, 2026 14:26
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
…rnal

`Actor.reset()` was too generic for the main SDK entrypoint — readers
would reasonably expect it to reset *an Actor instance*. Rename to
`Actor.resetGlobalState()` (matching the SDK's prior convention for
`Configuration.resetGlobalState()` and making the intent explicit:
"drop the cached singletons so the next access reconstructs from the
current env"). Mark `@internal` so it doesn't surface in public TypeDoc.
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
…uration

crawlee 4.0.0-beta.56 ships `Configuration.reset()` (apify/crawlee#3649),
so the SDK's override can delegate to `super.reset()` instead of calling
`serviceLocator.reset()` directly. The SDK still owns clearing its own
`globalConfig` static and replacing the `AsyncLocalStorage` singleton.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
B4nan added a commit that referenced this pull request May 12, 2026
… redesign

Squashes the full content of #583 into a single commit
so the bundle PR shows a clean four-commit summary of the v4 catch-up
stack. See PR #583 for the per-commit history.
`Actor.resetGlobalState()` and `Configuration.reset()` were both
misplaced — Actor is the main public entry point and shouldn't carry
test-cleanup methods, and `Configuration.reset()` is misleading because
it doesn't reset anything *on* the Configuration; it just drops the
singletons that the service locator + SDK statics keep around.

Move the cleanup to `test/resetGlobalState.ts`, exported only inside
the test tree, and update the two test files that used the static
methods to import from there. Production-side SDK surface no longer
exposes a generic reset.

(Crawlee's `Configuration.reset()` will be reverted separately —
apify/crawlee#3649. Until that lands, calling it is harmless; we just
don't call it anymore.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
B4nan added a commit that referenced this pull request May 13, 2026
B4nan added a commit that referenced this pull request May 13, 2026
B4nan added a commit that referenced this pull request May 13, 2026
B4nan added a commit that referenced this pull request May 13, 2026
B4nan added a commit that referenced this pull request May 13, 2026
…uration })`

The Actor constructor previously accepted either no options (falling back
to the cached global Configuration) or field-level overrides such as
`{ token: ..., inputKey: ... }`, from which it constructed a fresh
Configuration. There was no way to hand the Actor an existing
Configuration instance. That is useful for tests that want a fresh
env-resolved Configuration without touching the global singleton, and
for application code that wires its own config explicitly.

Adds an optional `configuration` field on the constructor options. When
present, it takes precedence over field-level overrides (which are
ignored) so the contract stays unambiguous. Mirrors crawlee's
BasicCrawler pattern.
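A minimal sketch of the precedence rule described above, using hypothetical stand-ins for `Configuration` and `Actor`. Only `token` and `inputKey` are modelled, env-var resolution is omitted, and the `'INPUT'` default is illustrative. A passed `configuration` wins outright, field-level overrides build a fresh instance, and no options fall back to the cached global.

```typescript
// Hypothetical stand-ins: the real SDK Configuration has many more fields
// and also resolves values from env vars and schema defaults.
interface ConfigurationOptions {
    token?: string;
    inputKey?: string;
}

class Configuration {
    readonly token?: string;
    readonly inputKey: string;

    private static globalConfig: Configuration | undefined;

    constructor(options: ConfigurationOptions = {}) {
        // Explicit options win; env-var lookup is omitted in this sketch.
        this.token = options.token;
        this.inputKey = options.inputKey ?? 'INPUT';
    }

    /** Lazily created, cached global instance. */
    static getGlobalConfig(): Configuration {
        if (Configuration.globalConfig === undefined) {
            Configuration.globalConfig = new Configuration();
        }
        return Configuration.globalConfig;
    }
}

type ActorOptions = ConfigurationOptions & {
    /** Pre-built Configuration; when present, field-level overrides are ignored. */
    configuration?: Configuration;
};

class Actor {
    readonly config: Configuration;

    constructor(options: ActorOptions = {}) {
        const { configuration, ...overrides } = options;
        if (configuration !== undefined) {
            // 1. An explicit instance takes precedence over everything else.
            this.config = configuration;
        } else if (Object.keys(overrides).length > 0) {
            // 2. Field-level overrides construct a fresh Configuration.
            this.config = new Configuration(overrides);
        } else {
            // 3. No options: reuse the cached global Configuration.
            this.config = Configuration.getGlobalConfig();
        }
    }
}
```

In a test, `new Actor({ configuration: new Configuration() })` then yields an Actor bound to a fresh instance without touching the global one.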

`Actor.getInput` tests use it: dropping the `resetGlobalState()` +
`actor.config = new Configuration()` dance for a single
`new Actor({ configuration: new Configuration() })`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
B4nan added commits that referenced this pull request May 14, 2026 (re-pushes of the `ProxyConfiguration` and squashed config-redesign commits shown above).
…onfig()`

Two spots inside `Actor` were still reaching for the global Configuration
singleton instead of the Actor's own `this.config`, which silently
defeated the new `new Actor({ configuration })` option:

- `init()`: `serviceLocator.setConfiguration(Configuration.getGlobalConfig())`
  registered the *global* config with the service locator, even when the
  Actor was constructed with a custom one. Crawlee internals created
  later (event manager, storage client, …) then saw the wrong instance.
- `useState()`: fell back to `Configuration.getGlobalConfig()` for the
  underlying `KeyValueStore.open({ config })`, same problem.
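The `init()` fix can be sketched with hypothetical stand-ins for `Configuration`, `serviceLocator` and `Actor`. The real classes do far more; the `label` field and the object-literal locator exist only for this illustration. The point is that `init()` registers `this.config`, so a custom `configuration` passed to the constructor is what later-created internals see. The same reasoning applies to the `useState()` fallback.

```typescript
// Hypothetical stand-ins for illustration only.
class Configuration {
    readonly label: string;
    private static globalConfig: Configuration | undefined;

    constructor(label: string) {
        this.label = label;
    }

    static getGlobalConfig(): Configuration {
        if (Configuration.globalConfig === undefined) {
            Configuration.globalConfig = new Configuration('global');
        }
        return Configuration.globalConfig;
    }
}

// Minimal locator: holds whichever Configuration internals created later
// (event manager, storage client, ...) would read.
const serviceLocator = {
    configuration: undefined as Configuration | undefined,
    setConfiguration(config: Configuration): void {
        this.configuration = config;
    },
};

class Actor {
    readonly config: Configuration;

    constructor(options: { configuration?: Configuration } = {}) {
        this.config = options.configuration ?? Configuration.getGlobalConfig();
    }

    async init(): Promise<void> {
        // Register the Actor's own config, not `Configuration.getGlobalConfig()`,
        // so a custom `configuration` passed to the constructor is honored.
        serviceLocator.setConfiguration(this.config);
    }
}
```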

Both now use `this.config`. The stale comment on the `init()` line
("reset global config instance to respect APIFY_ prefixed env vars" —
made sense in v3 with mutable Configuration, not in v4 where values
are resolved eagerly at construction) is updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 participants