34 changes: 7 additions & 27 deletions docs/guides/proxy_management.mdx
@@ -83,35 +83,17 @@ Your crawlers will now use the selected proxies for all connections.

### IP Rotation and session management

Every call to
<ApiLink to="apify/class/ProxyConfiguration#newUrl">
`proxyConfiguration.newUrl()`
</ApiLink>
returns an independent proxy URL. For Apify Proxy, that URL embeds a fresh random
session id, so consecutive calls resolve to different IP addresses; for custom
`proxyUrls`, the URLs are rotated round-robin.

Session continuity (using the same IP across multiple requests, e.g. to keep a
logged-in session alive) is handled one level up by Crawlee's
<CrawleeApiLink to="core/class/SessionPool">`SessionPool`</CrawleeApiLink>: once a
`Session` is paired with a proxy URL, the crawler reuses that pairing for
subsequent requests tied to the same session. See the
[session management guide](../guides/session-management) for more details.

```javascript
const proxyConfiguration = await Actor.createProxyConfiguration({
@@ -125,8 +107,6 @@ const crawler = new PuppeteerCrawler({
});
```
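
To make the round-robin behavior for custom `proxyUrls` concrete, here is a minimal TypeScript model. `RoundRobinProxyRotator` is a hypothetical class written only for illustration (it is not part of the SDK or Crawlee) and the proxy hostnames are placeholders. It mirrors the observable behavior described above: `newUrl()` takes no session argument and each call returns the next URL in the list.

```ts
// Hypothetical model of round-robin rotation over custom proxy URLs.
// Not the SDK's ProxyConfiguration; it only mirrors the described behavior.
class RoundRobinProxyRotator {
  private readonly proxyUrls: readonly string[];
  private nextIndex = 0;

  constructor(proxyUrls: readonly string[]) {
    if (proxyUrls.length === 0) {
      throw new Error("At least one proxy URL is required");
    }
    this.proxyUrls = proxyUrls;
  }

  // No session argument: every call simply advances to the next URL.
  newUrl(): string {
    const url = this.proxyUrls[this.nextIndex] as string;
    this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
    return url;
  }
}

// Placeholder hostnames, for illustration only.
const rotator = new RoundRobinProxyRotator([
  "http://proxy-1.example.com:8000",
  "http://proxy-2.example.com:8000",
]);

// Three calls wrap around the two-element list.
const urls = [rotator.newUrl(), rotator.newUrl(), rotator.newUrl()];
console.log(urls);
```

Because no call carries a session id, two consecutive URLs never share an IP unless the list wraps around; keeping an IP stable is left to the session layer.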

## Apify Proxy vs. Your own proxies

The `ProxyConfiguration` class covers both Apify Proxy and custom proxy URLs so that
88 changes: 88 additions & 0 deletions docs/upgrading/upgrading_v4.md
@@ -0,0 +1,88 @@
---
id: upgrading-to-v4
title: Upgrading to v4
---

This page summarizes the breaking changes between Apify SDK v3 and v4. Apify SDK v4 adopts the redesigned Crawlee v4 interfaces (`Configuration`, `EventManager`, `StorageClient`, `ProxyConfiguration`), so most of the changes here track the corresponding Crawlee v4 changes.

## Configuration

The `Configuration` class no longer exposes `.get(key)` / `.set(key, value)`. Configuration values are resolved eagerly at construction time and exposed as plain typed properties.

Before (v3):

```ts
import { Configuration } from 'apify';

const config = Configuration.getGlobalConfig();
const token = config.get('token');
config.set('token', 'new-token');
```

After (v4):

```ts
import { Configuration } from 'apify';

// Construct with overrides — Configuration is immutable.
const config = new Configuration({ token: 'new-token' });
const token = config.token;
```

Resolution order (highest to lowest priority): constructor options → environment variables → `crawlee.json` → schema defaults.

Empty-string environment variables are treated as unset (they fall through to the schema default) rather than being coerced to `0` / `''` / `false`. For example, `ACTOR_MAX_TOTAL_CHARGE_USD=""` now resolves to `undefined` instead of `0`.
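
The resolution order and the empty-string rule can be summarized as a small lookup function. This is a hypothetical sketch, not the SDK's implementation: `resolveOption` and its parameters are invented for illustration, the property name `maxTotalChargeUsd` is an assumption, and real value coercion (strings to numbers or booleans) is omitted.

```ts
// Hypothetical sketch of the documented resolution order:
// constructor options -> environment variables -> crawlee.json -> schema defaults.
type Value = string | number | boolean | undefined;
type Source = Record<string, Value>;

function resolveOption(
  key: string,
  envVar: string,
  ctorOptions: Source,
  env: Record<string, string | undefined>,
  crawleeJson: Source,
  schemaDefaults: Source,
): Value {
  // 1. Explicit constructor options always win.
  if (ctorOptions[key] !== undefined) return ctorOptions[key];

  // 2. Environment variables, but an empty string counts as "unset"
  //    instead of being coerced to 0 / '' / false.
  const raw = env[envVar];
  if (raw !== undefined && raw !== "") return raw; // coercion omitted

  // 3. crawlee.json, then 4. the schema default (which may itself be undefined).
  if (crawleeJson[key] !== undefined) return crawleeJson[key];
  return schemaDefaults[key];
}

// ACTOR_MAX_TOTAL_CHARGE_USD="" with no other source set.
const maxCharge = resolveOption(
  "maxTotalChargeUsd", // assumed property name
  "ACTOR_MAX_TOTAL_CHARGE_USD",
  {},
  { ACTOR_MAX_TOTAL_CHARGE_USD: "" },
  {},
  {},
);
console.log(maxCharge); // undefined
```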

## ProxyConfiguration: `newUrl()` / `newProxyInfo()` no longer take `sessionId`

The `sessionId` parameter has been removed from both `ProxyConfiguration.newUrl()` and `ProxyConfiguration.newProxyInfo()`. Each call now returns an independent URL; for Apify Proxy the SDK mints a fresh random session id internally for every URL it hands out, so consecutive calls resolve to different IPs.

Before (v3):

```ts
const proxyConfiguration = await Actor.createProxyConfiguration({
groups: ['RESIDENTIAL'],
});

// Sticky pairing: same sessionId → same proxy URL → same IP.
const url1 = await proxyConfiguration.newUrl('mySession');
const url2 = await proxyConfiguration.newUrl('mySession'); // === url1
```

After (v4):

```ts
const proxyConfiguration = await Actor.createProxyConfiguration({
groups: ['RESIDENTIAL'],
});

// Every call returns an independent URL with its own session id.
const url1 = await proxyConfiguration.newUrl();
const url2 = await proxyConfiguration.newUrl(); // !== url1
```

Session continuity (reusing the same IP across multiple requests) is now handled one level up by Crawlee's `SessionPool`: a `Session` stores the proxy URL it was paired with and the crawler reuses that URL for subsequent requests bound to the same session. When using `CheerioCrawler`, `PlaywrightCrawler`, etc. with `useSessionPool: true`, this is automatic — no code changes are required on the consumer side.
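
The split of responsibilities can be pictured with a small model in which the proxy layer always mints a new URL and stickiness lives on the session object. `StickySession`, `mintProxyUrl`, and `proxyUrlFor` are hypothetical names (not Crawlee's `Session` or `SessionPool` API), and the URL format is a placeholder.

```ts
// Hypothetical model of session-level stickiness (not Crawlee's API).
interface StickySession {
  id: string;
  proxyUrl?: string; // assigned on first use, then reused
}

let mintedCount = 0;

// Stand-in for ProxyConfiguration.newUrl(): always returns a brand-new URL.
function mintProxyUrl(): string {
  mintedCount += 1;
  return `http://session-${mintedCount}:password@proxy.example.com:8000`;
}

// Stickiness lives here, one level above the proxy configuration.
function proxyUrlFor(session: StickySession): string {
  const url = session.proxyUrl ?? mintProxyUrl();
  session.proxyUrl = url;
  return url;
}

const alice: StickySession = { id: "alice" };
const bob: StickySession = { id: "bob" };

const first = proxyUrlFor(alice);
const second = proxyUrlFor(alice); // same session -> same URL -> same IP
const other = proxyUrlFor(bob); // different session -> newly minted URL
console.log(first === second, first === other); // true false
```

The proxy layer stays stateless with respect to sessions, so retiring a session (and with it, its IP) only means dropping the session object.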

`ProxyInfo` no longer carries a `sessionId` field. If you used it for logging or analytics, parse the `session-<id>` segment out of `proxyInfo.username` instead (it is included for Apify Proxy URLs).
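
A minimal helper for the `proxyInfo.username` approach might look like the following. It assumes the Apify Proxy username is a comma-separated list of parameters (for example `groups-RESIDENTIAL,session-abc123`); `sessionIdFromUsername` is a hypothetical name, and the format should be checked against the usernames your Actor actually receives.

```ts
// Sketch: recover the session id from a proxy username such as
// "groups-RESIDENTIAL,session-abc123". Assumes comma-separated parameters.
function sessionIdFromUsername(username: string | undefined): string | undefined {
  if (!username) return undefined;
  for (const part of username.split(",")) {
    if (part.startsWith("session-")) {
      return part.slice("session-".length);
    }
  }
  return undefined; // no session segment present
}

console.log(sessionIdFromUsername("groups-RESIDENTIAL,session-abc123")); // abc123
```

In an Actor this would be called as `sessionIdFromUsername(proxyInfo.username)`, with `undefined` signalling that no session segment was found.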

The `tieredProxyUrls` and `tieredProxyConfig` options on `ProxyConfigurationOptions` were dropped in Crawlee v4 ([apify/crawlee#3599](https://github.com/apify/crawlee/pull/3599)) and the SDK no longer threads them through. Migrate to named sessions via `SessionPool` if you relied on tiered rotation.

## EventManager

`PlatformEventManager` now extends Crawlee v4's `EventManager` and integrates with the new service locator. When constructing it directly, use `Configuration.getGlobalConfig()` or pass a `Configuration` instance explicitly. The constructor no longer accepts a `config` override via the `override` keyword pattern, because Crawlee's base class manages configuration through `serviceLocator` instead of a `config` field.

If you only interact with events through `Actor.on()` / `Actor.off()` / `Actor.events`, no code changes are needed.

## StorageClient

The SDK's storage layer was adapted to the new Crawlee v4 `StorageClient` interface. The Apify platform client is wrapped via an internal `ApifyStorageClient` adapter that implements `createDatasetClient`, `createKeyValueStoreClient`, and `createRequestQueueClient`.

`KeyValueStore.getPublicUrl()` is now asynchronous (it signs URLs server-side when running on the Apify platform). Update call sites accordingly:

```ts
// v3
const url = store.getPublicUrl('myKey');

// v4
const url = await store.getPublicUrl('myKey');
```