diff --git a/docs/guides/proxy_management.mdx b/docs/guides/proxy_management.mdx index a271a063fb..89e25469f0 100644 --- a/docs/guides/proxy_management.mdx +++ b/docs/guides/proxy_management.mdx @@ -83,35 +83,17 @@ Your crawlers will now use the selected proxies for all connections. ### IP Rotation and session management +Every call to + `proxyConfiguration.newUrl()` -allows you to pass a `sessionId` parameter. It will then be used to create a -`sessionId`-`proxyUrl` pair, and subsequent `newUrl()` calls with the same -`sessionId` will always return the same `proxyUrl`. This is extremely useful in -scraping, because you want to create the impression of a real user. See the -[session management guide](../guides/session-management) and -`SessionPool` class -for more information on how keeping a real session helps you avoid blocking. - -When no `sessionId` is provided, your proxy URLs are rotated round-robin, whereas Apify Proxy manages their rotation using black magic to get the best performance. - - +returns an independent proxy URL. For Apify Proxy that URL embeds a fresh random +session id, so consecutive calls resolve to different IP addresses; for custom +`proxyUrls` the URLs are rotated round-robin. - - -```javascript -const proxyConfiguration = await Actor.createProxyConfiguration({ - /* opts */ -}); -const sessionPool = await SessionPool.open({ - /* opts */ -}); -const session = await sessionPool.getSession(); -const proxyUrl = proxyConfiguration.newUrl(session.id); -``` - - +Session continuity (using the same IP across multiple requests, e.g. to keep a logged-in session alive) is handled one level up by Crawlee's `SessionPool`: once a `Session` is paired with a proxy URL, the crawler reuses that pairing for subsequent requests tied to the same session. See the +[session management guide](../guides/session-management) for more details. 
```javascript const proxyConfiguration = await Actor.createProxyConfiguration({ @@ -125,8 +107,6 @@ const crawler = new PuppeteerCrawler({ }); ``` - - ## Apify Proxy vs. Your own proxies The `ProxyConfiguration` class covers both Apify Proxy and custom proxy URLs so that diff --git a/docs/upgrading/upgrading_v4.md b/docs/upgrading/upgrading_v4.md new file mode 100644 index 0000000000..ff4269d1c1 --- /dev/null +++ b/docs/upgrading/upgrading_v4.md @@ -0,0 +1,88 @@ +--- +id: upgrading-to-v4 +title: Upgrading to v4 +--- + +This page summarizes the breaking changes between Apify SDK v3 and v4. Apify SDK v4 adopts the redesigned Crawlee v4 interfaces (`Configuration`, `EventManager`, `StorageClient`, `ProxyConfiguration`), so most of the changes here track the corresponding Crawlee v4 changes. + +## Configuration + +The `Configuration` class no longer exposes `.get(key)` / `.set(key, value)`. Configuration values are resolved eagerly at construction time and exposed as plain typed properties. + +Before (v3): + +```ts +import { Configuration } from 'apify'; + +const config = Configuration.getGlobalConfig(); +const token = config.get('token'); +config.set('token', 'new-token'); +``` + +After (v4): + +```ts +import { Configuration } from 'apify'; + +// Construct with overrides — Configuration is immutable. +const config = new Configuration({ token: 'new-token' }); +const token = config.token; +``` + +Resolution order (highest to lowest priority): constructor options → environment variables → `crawlee.json` → schema defaults. + +Empty-string environment variables are treated as unset (they fall through to the schema default) rather than being coerced to `0` / `''` / `false`. For example, `ACTOR_MAX_TOTAL_CHARGE_USD=""` now resolves to `undefined` instead of `0`. + +## ProxyConfiguration: `newUrl()` / `newProxyInfo()` no longer take `sessionId` + +The `sessionId` parameter has been removed from both `ProxyConfiguration.newUrl()` and `ProxyConfiguration.newProxyInfo()`. 
Each call now returns an independent URL; for Apify Proxy the SDK mints a fresh random session id internally for every URL it hands out, so consecutive calls resolve to different IPs. + +Before (v3): + +```ts +const proxyConfiguration = await Actor.createProxyConfiguration({ + groups: ['RESIDENTIAL'], +}); + +// Sticky pairing: same sessionId → same proxy URL → same IP. +const url1 = await proxyConfiguration.newUrl('mySession'); +const url2 = await proxyConfiguration.newUrl('mySession'); // === url1 +``` + +After (v4): + +```ts +const proxyConfiguration = await Actor.createProxyConfiguration({ + groups: ['RESIDENTIAL'], +}); + +// Every call returns an independent URL with its own session id. +const url1 = await proxyConfiguration.newUrl(); +const url2 = await proxyConfiguration.newUrl(); // !== url1 +``` + +Session continuity (reusing the same IP across multiple requests) is now handled one level up by Crawlee's `SessionPool`: a `Session` stores the proxy URL it was paired with and the crawler reuses that URL for subsequent requests bound to the same session. When using `CheerioCrawler`, `PlaywrightCrawler`, etc. with `useSessionPool: true`, this is automatic — no code changes are required on the consumer side. + +`ProxyInfo` no longer carries a `sessionId` field. If you used it for logging or analytics, parse the `session-` segment out of `proxyInfo.username` instead (it is included for Apify Proxy URLs). + +The `tieredProxyUrls` and `tieredProxyConfig` options on `ProxyConfigurationOptions` were dropped in Crawlee v4 ([apify/crawlee#3599](https://github.com/apify/crawlee/pull/3599)) and the SDK no longer threads them through. Migrate to named sessions via `SessionPool` if you relied on tiered rotation. + +## EventManager + +`PlatformEventManager` now extends Crawlee v4's `EventManager` and integrates with the new service locator. 
Use `Configuration.getGlobalConfig()` (or pass a `Configuration` instance explicitly) when constructing it directly — the constructor no longer accepts a `config` override via the `override` keyword pattern because Crawlee's base class manages the configuration through `serviceLocator` instead of a `config` field. + +If you only interact with events through `Actor.on()` / `Actor.off()` / `Actor.events`, no code changes are needed. + +## StorageClient + +The SDK's storage layer was adapted to the new Crawlee v4 `StorageClient` interface. The Apify platform client is wrapped via an internal `ApifyStorageClient` adapter that implements `createDatasetClient`, `createKeyValueStoreClient`, and `createRequestQueueClient`. + +`KeyValueStore.getPublicUrl()` is now asynchronous (it signs URLs server-side when running on the Apify platform). Update call sites accordingly: + +```ts +// v3 +const url = store.getPublicUrl('myKey'); + +// v4 +const url = await store.getPublicUrl('myKey'); +``` diff --git a/packages/apify/src/proxy_configuration.ts b/packages/apify/src/proxy_configuration.ts index 231cfa6db0..e3033d07f8 100644 --- a/packages/apify/src/proxy_configuration.ts +++ b/packages/apify/src/proxy_configuration.ts @@ -1,8 +1,6 @@ -import type { - ProxyConfigurationOptions as CoreProxyConfigurationOptions, - ProxyInfo as CoreProxyInfo, -} from '@crawlee/core'; +import type { ProxyConfigurationOptions as CoreProxyConfigurationOptions } from '@crawlee/core'; import { ProxyConfiguration as CoreProxyConfiguration } from '@crawlee/core'; +import type { ProxyInfo as CoreProxyInfo } from '@crawlee/types'; import { gotScraping } from 'got-scraping'; import ow from 'ow'; @@ -12,12 +10,17 @@ import { cryptoRandomObjectId } from '@apify/utilities'; import { Actor } from './actor.js'; import { Configuration } from './configuration.js'; -// https://docs.apify.com/proxy/datacenter-proxy#username-parameters -const MAX_SESSION_ID_LENGTH = 50; const CHECK_ACCESS_REQUEST_TIMEOUT_MILLIS = 
4_000; const CHECK_ACCESS_MAX_ATTEMPTS = 2; const COUNTRY_CODE_REGEX = /^[A-Z]{2}$/; +// Apify Proxy session identifier embedded in the proxy username, opaque to +// users; a fresh one is minted for every URL the SDK hands out so that the +// returned proxy URLs are independent. +const SESSION_ID_LENGTH = 12; + +type NewUrlOptions = Parameters<CoreProxyConfiguration['newUrl']>[0]; + export interface ProxyConfigurationOptions extends CoreProxyConfigurationOptions { /** @@ -56,15 +59,6 @@ export interface ProxyConfigurationOptions * configurate the proxy by UI input schema. You should use the `countryCode` option in your crawler code. */ apifyProxyCountry?: string; - - /** - * Multiple different ProxyConfigurationOptions stratified into tiers. Crawlee crawlers will switch between those tiers - * based on the blocked request statistics. - */ - tieredProxyConfig?: Omit< - ProxyConfigurationOptions, - keyof CoreProxyConfigurationOptions | 'tieredProxyConfig' - >[]; } /** @@ -91,9 +85,6 @@ export interface ProxyConfigurationOptions * requestHandler({ proxyInfo }) { * // Getting used proxy URL * const proxyUrl = proxyInfo.url; - * - * // Getting ID of used Session - * const sessionIdentifier = proxyInfo.sessionId; * } * }) * @@ -104,7 +95,7 @@ export interface ProxyInfo extends CoreProxyInfo { * An array of proxy groups to be used by the [Apify Proxy](https://docs.apify.com/proxy). * If not provided, the proxy will select the groups automatically. 
*/ - groups: string[]; + groups?: string[]; /** * If set and relevant proxies are available in your Apify account, all proxied requests will @@ -193,10 +184,6 @@ export class ProxyConfiguration extends CoreProxyConfiguration { apifyProxyCountry: ow.optional.string.matches(COUNTRY_CODE_REGEX), password: ow.optional.string, - tieredProxyUrls: ow.optional.array.ofType( - ow.array.ofType(ow.string), - ), - tieredProxyConfig: ow.optional.array.ofType(ow.object), }), ); @@ -206,19 +193,8 @@ export class ProxyConfiguration extends CoreProxyConfiguration { countryCode, apifyProxyCountry, password = config.proxyPassword, - tieredProxyConfig, - tieredProxyUrls, } = options; - this.tieredProxyUrls ??= tieredProxyUrls; - - if (tieredProxyConfig) { - this.tieredProxyUrls = this._generateTieredProxyUrls( - tieredProxyConfig, - options, - ); - } - const groupsToUse = groups.length ? groups : apifyProxyGroups; const countryCodeToUse = countryCode || apifyProxyCountry; const hostname = config.proxyHostname; @@ -241,7 +217,7 @@ export class ProxyConfiguration extends CoreProxyConfiguration { this.port = port; this.usesApifyProxy = !this.proxyUrls && !this.newUrlFunction; - if (proxyUrls && proxyUrls.some((url) => url.includes('apify.com'))) { + if (proxyUrls && proxyUrls.some((url) => url?.includes('apify.com'))) { this.log.warning( 'Some Apify proxy features may work incorrectly. Please consider setting up Apify properties instead of `proxyUrls`.\n' + 'See https://sdk.apify.com/docs/guides/proxy-management#apify-proxy-configuration', @@ -287,143 +263,65 @@ export class ProxyConfiguration extends CoreProxyConfiguration { } /** - * This function creates a new {@apilink ProxyInfo} info object. - * It is used by CheerioCrawler and PuppeteerCrawler to generate proxy URLs and also to allow the user to inspect - * the currently used proxy via the requestHandler parameter `proxyInfo`. - * Use it if you want to work with a rich representation of a proxy URL. 
- * If you need the URL string only, use {@apilink ProxyConfiguration.newUrl}. - * @param [sessionId] - * Represents the identifier of user {@apilink Session} that can be managed by the {@apilink SessionPool} or - * you can use the Apify Proxy [Session](https://docs.apify.com/proxy#sessions) identifier. - * When the provided sessionId is a number, it's converted to a string. Property sessionId of - * {@apilink ProxyInfo} is always returned as a type string. - * - * All the HTTP requests going through the proxy with the same session identifier - * will use the same target proxy server (i.e. the same IP address). - * The identifier must not be longer than 50 characters and include only the following: `0-9`, `a-z`, `A-Z`, `"."`, `"_"` and `"~"`. - * @return Represents information about used proxy and its configuration. + * Returns a new {@apilink ProxyInfo} object with a fresh proxy URL. Each call mints an + * independent URL; for Apify Proxy a random session id is embedded so consecutive + * calls resolve to different IPs. */ override async newProxyInfo( - sessionId?: string | number, - options?: Parameters<CoreProxyConfiguration['newProxyInfo']>[1], + options?: NewUrlOptions, ): Promise<ProxyInfo | undefined> { - if (typeof sessionId === 'number') sessionId = `${sessionId}`; - ow( - sessionId, - ow.optional.string - .maxLength(MAX_SESSION_ID_LENGTH) - .matches(APIFY_PROXY_VALUE_REGEX), - ); - - const proxyInfo = await super.newProxyInfo(sessionId, options); - if (!proxyInfo) return proxyInfo; - - const { groups, countryCode, password, port, hostname } = ( - this.usesApifyProxy ? this : new URL(proxyInfo.url) - ) as ProxyConfiguration; - - return { - ...proxyInfo, - sessionId, - groups, - countryCode, - // this.password is not encoded, but the password from the URL will be, we need to normalize - password: this.usesApifyProxy - ? (password ?? 
'') - : decodeURIComponent(password!), - hostname, - port: port!, + const url = await this.newUrl(options); + if (!url) return undefined; + + const parsed = new URL(url); + const result: ProxyInfo = { + url, + username: decodeURIComponent(parsed.username), + password: decodeURIComponent(parsed.password), + hostname: parsed.hostname, + port: parsed.port, }; + if (this.usesApifyProxy) { + result.groups = this.groups; + if (this.countryCode !== undefined) + result.countryCode = this.countryCode; + } + return result; } /** - * Returns a new proxy URL based on provided configuration options and the `sessionId` parameter. - * @param [sessionId] - * Represents the identifier of user {@apilink Session} that can be managed by the {@apilink SessionPool} or - * you can use the Apify Proxy [Session](https://docs.apify.com/proxy#sessions) identifier. - * When the provided sessionId is a number, it's converted to a string. - * - * All the HTTP requests going through the proxy with the same session identifier - * will use the same target proxy server (i.e. the same IP address). - * The identifier must not be longer than 50 characters and include only the following: `0-9`, `a-z`, `A-Z`, `"."`, `"_"` and `"~"`. - * @return A string with a proxy URL, including authentication credentials and port number. - * For example, `http://bob:password123@proxy.example.com:8000` + * Returns a new proxy URL. For Apify Proxy, each call generates a URL with a fresh + * random session id, so consecutive calls return independent URLs. For custom + * `proxyUrls`, the URLs are rotated round-robin. 
 */ override async newUrl( - sessionId?: string | number, - options?: Parameters<CoreProxyConfiguration['newUrl']>[1], + options?: NewUrlOptions, ): Promise<string | undefined> { - if (typeof sessionId === 'number') sessionId = `${sessionId}`; - ow( - sessionId, - ow.optional.string - .maxLength(MAX_SESSION_ID_LENGTH) - .matches(APIFY_PROXY_VALUE_REGEX), - ); - if (this.newUrlFunction) { - return ( - (await this._callNewUrlFunction(sessionId, { - request: options?.request, - })) ?? undefined - ); - } - if (this.proxyUrls) { - return this._handleCustomUrl(sessionId); - } - - if (this.tieredProxyUrls) { - return ( - this._handleTieredUrl( - sessionId ?? cryptoRandomObjectId(6), - options, - ).proxyUrl ?? undefined - ); + if (this.newUrlFunction || this.proxyUrls) { + return super.newUrl(options); } - - return this.composeDefaultUrl(sessionId); - } - - protected _generateTieredProxyUrls( - tieredProxyConfig: NonNullable< - ProxyConfigurationOptions['tieredProxyConfig'] - >, - globalOptions: ProxyConfigurationOptions, - ) { - return tieredProxyConfig.map((config) => [ - new ProxyConfiguration({ - ...globalOptions, - ...config, - tieredProxyConfig: undefined, - }).composeDefaultUrl(), - ]); + return this.composeDefaultUrl(cryptoRandomObjectId(SESSION_ID_LENGTH)); } /** * Returns proxy username. 
*/ - protected _getUsername(sessionId?: string): string { - let username; + protected _getUsername(sessionId: string): string { const { groups, countryCode } = this; const parts: string[] = []; if (groups && groups.length) { parts.push(`groups-${groups.join('+')}`); } - if (sessionId) { - parts.push(`session-${sessionId}`); - } + parts.push(`session-${sessionId}`); if (countryCode) { parts.push(`country-${countryCode}`); } - username = parts.join(','); - - if (parts.length === 0) username = 'auto'; - - return username; + return parts.join(','); } - protected composeDefaultUrl(sessionId?: string): string { + protected composeDefaultUrl(sessionId: string): string { const username = this._getUsername(sessionId); const url = new URL(`http://${this.hostname}:${this.port}`); url.username = `${username}`; diff --git a/test/apify/proxy_configuration.test.ts b/test/apify/proxy_configuration.test.ts index 8c61a63177..718d9db1bb 100644 --- a/test/apify/proxy_configuration.test.ts +++ b/test/apify/proxy_configuration.test.ts @@ -1,25 +1,26 @@ import { Actor, ProxyConfiguration } from 'apify'; import { UserClient } from 'apify-client'; -import { type Dictionary, Request, sleep } from 'crawlee'; +import { type Dictionary } from 'crawlee'; import { gotScraping } from 'got-scraping'; import { APIFY_ENV_VARS, LOCAL_APIFY_ENV_VARS } from '@apify/consts'; +import { resetGlobalState } from '../resetGlobalState.js'; + const groups = ['GROUP1', 'GROUP2']; const hostname = LOCAL_APIFY_ENV_VARS[APIFY_ENV_VARS.PROXY_HOSTNAME]; const port = Number(LOCAL_APIFY_ENV_VARS[APIFY_ENV_VARS.PROXY_PORT]); const password = 'test12345'; const countryCode = 'CZ'; -const sessionId = 538909250932; const basicOpts = { groups, countryCode, password, }; -const basicOptsProxyUrl = - 'http://groups-GROUP1+GROUP2,session-538909250932,country-CZ:test12345@proxy.apify.com:8000'; -const proxyUrlNoSession = - 'http://groups-GROUP1+GROUP2,country-CZ:test12345@proxy.apify.com:8000'; +// Apify Proxy URLs always carry 
a fresh random `session-XXXX` segment; tests +// match against this pattern rather than a hard-coded session id. +const apifyProxyUrlPattern = + /^http:\/\/groups-GROUP1\+GROUP2,session-[A-Za-z0-9]+,country-CZ:test12345@proxy\.apify\.com:8000$/; vitest.mock('got-scraping', async () => { return { @@ -54,48 +55,45 @@ describe('ProxyConfiguration', () => { expect(proxyConfiguration.port).toBe(port); }); - test('newUrl() should return proxy URL', async () => { + test('newUrl() returns an Apify Proxy URL with a random session id', async () => { const proxyConfiguration = new ProxyConfiguration(basicOpts); - expect(await proxyConfiguration.newUrl(sessionId)).toBe( - basicOptsProxyUrl, - ); + const url1 = await proxyConfiguration.newUrl(); + const url2 = await proxyConfiguration.newUrl(); + + expect(url1).toMatch(apifyProxyUrlPattern); + expect(url2).toMatch(apifyProxyUrlPattern); + // Consecutive calls must produce independent URLs. + expect(url1).not.toBe(url2); }); - test('newProxyInfo() should return ProxyInfo object', async () => { + test('newProxyInfo() returns a ProxyInfo object with a fresh URL', async () => { const proxyConfiguration = new ProxyConfiguration(basicOpts); - const url = basicOptsProxyUrl; - const proxyInfo = { - sessionId: `${sessionId}`, - url, - groups, - countryCode, - password, - hostname, - port, - username: 'groups-GROUP1+GROUP2,session-538909250932,country-CZ', - }; - expect(await proxyConfiguration.newProxyInfo(sessionId)).toEqual( - proxyInfo, + const info = await proxyConfiguration.newProxyInfo(); + expect(info).toBeDefined(); + expect(info!.url).toMatch(apifyProxyUrlPattern); + expect(info!.groups).toEqual(groups); + expect(info!.countryCode).toBe(countryCode); + expect(info!.password).toBe(password); + expect(info!.hostname).toBe(hostname); + expect(info!.port).toBe(String(port)); + expect(info!.username).toMatch( + /^groups-GROUP1\+GROUP2,session-[A-Za-z0-9]+,country-CZ$/, ); }); - test('newProxyInfo() works with special characters', 
async () => { + test('newProxyInfo() works with custom proxyUrls and special characters', async () => { const url = 'http://user%40name:pass%40word@proxy.com:1111'; const proxyConfiguration = new ProxyConfiguration({ proxyUrls: [url] }); - const proxyInfo = { - sessionId: `${sessionId}`, + expect(await proxyConfiguration.newProxyInfo()).toEqual({ url, username: 'user@name', password: 'pass@word', hostname: 'proxy.com', port: '1111', - }; - expect(await proxyConfiguration.newProxyInfo(sessionId)).toEqual( - proxyInfo, - ); + }); }); test('actor UI input schema should work', () => { @@ -168,37 +166,6 @@ describe('ProxyConfiguration', () => { expect(() => new ProxyConfiguration({ countryCode: 1111 })).toThrow(); }); - test('newUrl() should throw on invalid session argument', async () => { - const proxyConfiguration = new ProxyConfiguration(); - await Promise.all([ - expect(async () => - proxyConfiguration.newUrl('a-b'), - ).rejects.toThrow(), - expect(proxyConfiguration.newUrl('a$b')).rejects.toThrow(), - // @ts-expect-error invalid input - expect(proxyConfiguration.newUrl({})).rejects.toThrow(), - // @ts-expect-error invalid input - expect(proxyConfiguration.newUrl(new Date())).rejects.toThrow(), - expect( - proxyConfiguration.newUrl(Array(51).fill('x').join('')), - ).rejects.toThrow(), - - expect(proxyConfiguration.newUrl('a_b')).resolves.not.toThrow(), - expect( - proxyConfiguration.newUrl('0.34252352'), - ).resolves.not.toThrow(), - expect(proxyConfiguration.newUrl('aaa~BBB')).resolves.not.toThrow(), - expect(proxyConfiguration.newUrl('a_1_b')).resolves.not.toThrow(), - expect(proxyConfiguration.newUrl('a_2')).resolves.not.toThrow(), - expect(proxyConfiguration.newUrl('a')).resolves.not.toThrow(), - expect(proxyConfiguration.newUrl('1')).resolves.not.toThrow(), - expect(proxyConfiguration.newUrl(123456)).resolves.not.toThrow(), - expect( - proxyConfiguration.newUrl(Array(50).fill('x').join('')), - ).resolves.not.toThrow(), - ]); - }); - test('should throw on 
invalid newUrlFunction', async () => { const newUrlFunction = () => { return 'http://proxy.com:1111*invalid_url'; @@ -243,7 +210,6 @@ describe('ProxyConfiguration', () => { 'http://proxy.com:4444', ); - // TODO enable strictNullChecks in tests // through newProxyInfo() expect((await proxyConfiguration.newProxyInfo())?.url).toEqual( 'http://proxy.com:3333', @@ -256,46 +222,6 @@ describe('ProxyConfiguration', () => { ); }); - test('async newUrlFunction should work correctly', async () => { - const customUrls = [ - 'http://proxy.com:1111', - 'http://proxy.com:2222', - 'http://proxy.com:3333', - 'http://proxy.com:4444', - 'http://proxy.com:5555', - 'http://proxy.com:6666', - ]; - const newUrlFunction = async () => { - await sleep(5); - return customUrls.pop() ?? null; - }; - const proxyConfiguration = new ProxyConfiguration({ - newUrlFunction, - }); - - // through newUrl() - expect(await proxyConfiguration.newUrl()).toEqual( - 'http://proxy.com:6666', - ); - expect(await proxyConfiguration.newUrl()).toEqual( - 'http://proxy.com:5555', - ); - expect(await proxyConfiguration.newUrl()).toEqual( - 'http://proxy.com:4444', - ); - - // through newProxyInfo() - expect((await proxyConfiguration.newProxyInfo())!.url).toEqual( - 'http://proxy.com:3333', - ); - expect((await proxyConfiguration.newProxyInfo())!.url).toEqual( - 'http://proxy.com:2222', - ); - expect((await proxyConfiguration.newProxyInfo())!.url).toEqual( - 'http://proxy.com:1111', - ); - }); - describe('With proxyUrls options', () => { test('should rotate custom URLs correctly', async () => { const proxyConfiguration = new ProxyConfiguration({ @@ -347,62 +273,6 @@ describe('ProxyConfiguration', () => { ); }); - test('should rotate custom URLs with sessions correctly', async () => { - const sessions = [ - 'sesssion_01', - 'sesssion_02', - 'sesssion_03', - 'sesssion_04', - 'sesssion_05', - 'sesssion_06', - ]; - const proxyConfiguration = new ProxyConfiguration({ - proxyUrls: [ - 'http://proxy.com:1111', - 
'http://proxy.com:2222', - 'http://proxy.com:3333', - ], - }); - - // @ts-expect-error TODO private property? - const { proxyUrls } = proxyConfiguration; - // should use same proxy URL - expect(await proxyConfiguration.newUrl(sessions[0])).toEqual( - proxyUrls![0], - ); - expect(await proxyConfiguration.newUrl(sessions[0])).toEqual( - proxyUrls![0], - ); - expect(await proxyConfiguration.newUrl(sessions[0])).toEqual( - proxyUrls![0], - ); - - // should rotate different proxies - expect(await proxyConfiguration.newUrl(sessions[1])).toEqual( - proxyUrls![1], - ); - expect(await proxyConfiguration.newUrl(sessions[2])).toEqual( - proxyUrls![2], - ); - expect(await proxyConfiguration.newUrl(sessions[3])).toEqual( - proxyUrls![0], - ); - expect(await proxyConfiguration.newUrl(sessions[4])).toEqual( - proxyUrls![1], - ); - expect(await proxyConfiguration.newUrl(sessions[5])).toEqual( - proxyUrls![2], - ); - - // should remember already used session - expect(await proxyConfiguration.newUrl(sessions[1])).toEqual( - proxyUrls![1], - ); - expect(await proxyConfiguration.newUrl(sessions[3])).toEqual( - proxyUrls![0], - ); - }); - test('should throw cannot combine custom proxies with Apify Proxy', async () => { const proxyUrls = [ 'http://proxy.com:1111', @@ -485,81 +355,17 @@ describe('ProxyConfiguration', () => { } }); }); - - describe('With tieredProxyUrls', () => { - test('proxy configuration accepts the tiered urls (Crawlee style)', async () => { - const proxyConfiguration = new ProxyConfiguration({ - tieredProxyUrls: [ - ['http://proxy.com:1111'], - ['http://proxy.com:2222'], - ['http://proxy.com:3333'], - ['http://proxy.com:4444'], - ], - }); - - // through newUrl() - expect( - await proxyConfiguration.newUrl('abc', { - request: new Request({ url: 'http://example.com' }) as any, - }), - ).toEqual('http://proxy.com:1111'); - - // through newProxyInfo() - expect( - (await proxyConfiguration.newProxyInfo('abc', { - request: new Request({ - url: 'http://example.com', - }) as 
any, - }))!.url, - ).toEqual('http://proxy.com:1111'); - }); - - test('shorthand tieredProxyConfig gets correctly expanded', async () => { - const proxyConfiguration = new ProxyConfiguration({ - password: 'password', - countryCode: 'DE', - tieredProxyConfig: [ - { - groups: ['GROUP1'], - countryCode: 'CZ', - }, - { - groups: ['GROUP2'], - countryCode: 'US', - }, - { - groups: ['GROUP3', 'GROUP4'], - }, - { - groups: ['GROUP3', 'GROUP4'], - countryCode: undefined, - }, - ], - }); - - // eslint-disable-next-line dot-notation - expect(proxyConfiguration['tieredProxyUrls']).toEqual([ - [ - 'http://groups-GROUP1,country-CZ:password@proxy.apify.com:8000', - ], - [ - 'http://groups-GROUP2,country-US:password@proxy.apify.com:8000', - ], - [ - 'http://groups-GROUP3+GROUP4,country-DE:password@proxy.apify.com:8000', - ], - ['http://groups-GROUP3+GROUP4:password@proxy.apify.com:8000'], - ]); - }); - }); }); describe('Actor.createProxyConfiguration()', () => { const userData = { proxy: { password } }; + beforeEach(() => { + resetGlobalState(); + }); + test('should work with all options', async () => { const status = { connected: true }; - const proxyUrl = proxyUrlNoSession; const url = 'http://proxy.apify.com/?format=json'; gotScrapingSpy.mockResolvedValueOnce({ body: status } as any); @@ -580,7 +386,7 @@ describe('Actor.createProxyConfiguration()', () => { expect(gotScrapingSpy).toBeCalledWith({ url, - proxyUrl, + proxyUrl: expect.stringMatching(apifyProxyUrlPattern), timeout: { request: 4000 }, responseType: 'json', }); @@ -704,7 +510,11 @@ describe('Actor.createProxyConfiguration()', () => { await Actor.createProxyConfiguration(); expect(gotScrapingSpy).toBeCalledWith({ url: `${process.env.APIFY_PROXY_STATUS_URL}/?format=json`, - proxyUrl: `http://auto:${password}@${process.env.APIFY_PROXY_HOSTNAME}:8000`, + proxyUrl: expect.stringMatching( + new RegExp( + `^http://session-[A-Za-z0-9]+:${password}@${process.env.APIFY_PROXY_HOSTNAME}:8000$`, + ), + ), responseType: 'json', 
timeout: { request: 4000, @@ -713,71 +523,4 @@ describe('Actor.createProxyConfiguration()', () => { gotScrapingSpy.mockRestore(); }); - - describe('With tieredProxyUrls', () => { - test('proxy configuration accepts the tiered urls (Crawlee style)', async () => { - const proxyConfiguration = await Actor.createProxyConfiguration({ - tieredProxyUrls: [ - ['http://proxy.com:1111'], - ['http://proxy.com:2222'], - ['http://proxy.com:3333'], - ['http://proxy.com:4444'], - ], - }); - - // through newUrl() - expect( - await proxyConfiguration!.newUrl('abc', { - request: new Request({ url: 'http://example.com' }) as any, - }), - ).toEqual('http://proxy.com:1111'); - - // through newProxyInfo() - expect( - (await proxyConfiguration!.newProxyInfo('abc', { - request: new Request({ - url: 'http://example.com', - }) as any, - }))!.url, - ).toEqual('http://proxy.com:1111'); - }); - - test('shorthand tieredProxyConfig gets correctly expanded', async () => { - const proxyConfiguration = await Actor.createProxyConfiguration({ - password: 'password', - countryCode: 'DE', - tieredProxyConfig: [ - { - groups: ['GROUP1'], - countryCode: 'CZ', - }, - { - groups: ['GROUP2'], - countryCode: 'US', - }, - { - groups: ['GROUP3', 'GROUP4'], - }, - { - groups: ['GROUP3', 'GROUP4'], - countryCode: undefined, - }, - ], - }); - - // eslint-disable-next-line dot-notation - expect(proxyConfiguration!['tieredProxyUrls']).toEqual([ - [ - 'http://groups-GROUP1,country-CZ:password@proxy.apify.com:8000', - ], - [ - 'http://groups-GROUP2,country-US:password@proxy.apify.com:8000', - ], - [ - 'http://groups-GROUP3+GROUP4,country-DE:password@proxy.apify.com:8000', - ], - ['http://groups-GROUP3+GROUP4:password@proxy.apify.com:8000'], - ]); - }); - }); });
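
Note on the `sessionId` removal: the upgrade guide in this change suggests parsing the `session-` segment out of `proxyInfo.username` now that `ProxyInfo` has no `sessionId` field. Below is a minimal sketch of such a helper, assuming the comma-separated username layout that `_getUsername()` produces in this diff (`groups-…`, `session-…`, `country-…`). The name `extractSessionId` is hypothetical and not part of the SDK.

```typescript
// Hypothetical helper (not an SDK function): pulls the session id out of an
// Apify Proxy username such as "groups-A+B,session-abc123,country-CZ".
// Assumes the parts are comma-separated, as in `_getUsername()` in this diff.
function extractSessionId(username: string): string | undefined {
    const prefix = 'session-';
    for (const part of username.split(',')) {
        if (part.startsWith(prefix)) {
            return part.slice(prefix.length);
        }
    }
    // Custom proxy URLs carry no session segment.
    return undefined;
}

// Usage with an Apify Proxy style username and with a custom-proxy username.
console.log(extractSessionId('groups-GROUP1+GROUP2,session-abc123XYZ,country-CZ')); // abc123XYZ
console.log(extractSessionId('user@name')); // undefined
```

For logging or analytics it would be called as `extractSessionId(proxyInfo.username)`. It returns `undefined` for custom `proxyUrls`, whose usernames have no `session-` part.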