Merged
2 changes: 1 addition & 1 deletion MIGRATIONS.md
@@ -61,7 +61,7 @@ npm install apify playwright

If you want to make use of Playwright on the Apify Platform, you need to use a Docker image
that supports Playwright. We've created them for you, so head over to the new
-[Docker image guide](https://sdk.apify.com/docs/guides/docker-images) and pick the one
+[Docker image guide](https://docs.apify.com/sdk/js/docs/concepts/docker-images) and pick the one
that best suits your needs.

Note that your `package.json` **MUST** include `puppeteer` and/or `playwright` as dependencies.
30 changes: 15 additions & 15 deletions docs/02_concepts/01_actor_lifecycle.mdx
@@ -65,7 +65,7 @@ variable to your API token.

### Log in with Configuration

-Another option is to use the [`Configuration`](https://sdk.apify.com/api/apify/class/Configuration) instance and set your api token there.
+Another option is to use the <ApiLink to="class/Configuration">`Configuration`</ApiLink> instance and set your api token there.

```javascript
import { Actor } from 'apify';
@@ -111,13 +111,13 @@ apify run

For running Crawlee code as an actor on [Apify platform](https://apify.com/actors) you should either:

-- use a combination of [`Actor.init()`](https://sdk.apify.com/api/apify/class/Actor#init) and [`Actor.exit()`](https://sdk.apify.com/api/apify/class/Actor#exit) functions;
-- or wrap it into [`Actor.main()`](https://sdk.apify.com/api/apify/class/Actor#main) function.
+- use a combination of <ApiLink to="class/Actor#init">`Actor.init()`</ApiLink> and <ApiLink to="class/Actor#exit">`Actor.exit()`</ApiLink> functions;
+- or wrap it into <ApiLink to="class/Actor#main">`Actor.main()`</ApiLink> function.

:::info NOTE

-- Adding [`Actor.init()`](https://sdk.apify.com/api/apify/class/Actor#init) and [`Actor.exit()`](https://sdk.apify.com/api/apify/class/Actor#exit) to your code are the only two important things needed to run it on Apify platform as an actor. `Actor.init()` is needed to initialize your actor (e.g. to set the correct storage implementation), while without `Actor.exit()` the process will simply never stop.
-- [`Actor.main()`](https://sdk.apify.com/api/apify/class/Actor#main) is an alternative to `Actor.init()` and `Actor.exit()` as it calls both behind the scenes.
+- Adding <ApiLink to="class/Actor#init">`Actor.init()`</ApiLink> and <ApiLink to="class/Actor#exit">`Actor.exit()`</ApiLink> to your code are the only two important things needed to run it on Apify platform as an actor. `Actor.init()` is needed to initialize your actor (e.g. to set the correct storage implementation), while without `Actor.exit()` the process will simply never stop.
+- <ApiLink to="class/Actor#main">`Actor.main()`</ApiLink> is an alternative to `Actor.init()` and `Actor.exit()` as it calls both behind the scenes.
:::

Let's look at the `CheerioCrawler` example from the <CrawleeLink to="docs/quick-start">Quick Start</CrawleeLink> guide:
@@ -204,22 +204,22 @@ There are several things worth mentioning here.

To simplify access to the _default_ storages, instead of using the helper functions of respective storage classes, you could use:

-- [`Actor.setValue()`](https://sdk.apify.com/api/apify/class/Actor#setValue), [`Actor.getValue()`](https://sdk.apify.com/api/apify/class/Actor#getValue), [`Actor.getInput()`](https://sdk.apify.com/api/apify/class/Actor#getInput) for `Key-Value Store`
-- [`Actor.pushData()`](https://sdk.apify.com/api/apify/class/Actor#pushData) for `Dataset`
+- <ApiLink to="class/Actor#setValue">`Actor.setValue()`</ApiLink>, <ApiLink to="class/Actor#getValue">`Actor.getValue()`</ApiLink>, <ApiLink to="class/Actor#getInput">`Actor.getInput()`</ApiLink> for `Key-Value Store`
+- <ApiLink to="class/Actor#pushData">`Actor.pushData()`</ApiLink> for `Dataset`

### Using platform storage in a local actor

-When you plan to use the platform storage while developing and running your actor locally, you should use [`Actor.openKeyValueStore()`](https://sdk.apify.com/api/apify/class/Actor#openKeyValueStore), [`Actor.openDataset()`](https://sdk.apify.com/api/apify/class/Actor#openDataset) and [`Actor.openRequestQueue()`](https://sdk.apify.com/api/apify/class/Actor#openRequestQueue) to open the respective storage.
+When you plan to use the platform storage while developing and running your actor locally, you should use <ApiLink to="class/Actor#openKeyValueStore">`Actor.openKeyValueStore()`</ApiLink>, <ApiLink to="class/Actor#openDataset">`Actor.openDataset()`</ApiLink> and <ApiLink to="class/Actor#openRequestQueue">`Actor.openRequestQueue()`</ApiLink> to open the respective storage.

-Using each of these methods allows to pass the [`OpenStorageOptions`](https://sdk.apify.com/api/apify/interface/OpenStorageOptions) as a second argument, which has only one optional property: [`forceCloud`](https://sdk.apify.com/api/apify/interface/OpenStorageOptions#forceCloud). If set to `true` - cloud storage will be used instead of the folder on the local disk.
+Using each of these methods allows to pass the <ApiLink to="interface/OpenStorageOptions">`OpenStorageOptions`</ApiLink> as a second argument, which has only one optional property: <ApiLink to="interface/OpenStorageOptions#forceCloud">`forceCloud`</ApiLink>. If set to `true` - cloud storage will be used instead of the folder on the local disk.

:::note
-If you don't plan to force usage of the platform storages when running the actor locally, there is no need to use the [`Actor`](https://sdk.apify.com/api/apify/class/Actor) class for it. The Crawlee variants <ApiLink to="apify/class/KeyValueStore#open">`KeyValueStore.open()`</ApiLink>, <ApiLink to="apify/class/Dataset#open">`Dataset.open()`</ApiLink> and <ApiLink to="apify/class/RequestQueue#open">`RequestQueue.open()`</ApiLink> will work the same.
+If you don't plan to force usage of the platform storages when running the actor locally, there is no need to use the <ApiLink to="class/Actor">`Actor`</ApiLink> class for it. The Crawlee variants <ApiLink to="class/KeyValueStore#open">`KeyValueStore.open()`</ApiLink>, <ApiLink to="class/Dataset#open">`Dataset.open()`</ApiLink> and <ApiLink to="class/RequestQueue#open">`RequestQueue.open()`</ApiLink> will work the same.
:::

### Getting public url of an item in the platform storage

-If you need to share a link to some file stored in a Key-Value Store on Apify Platform, you can use [`getPublicUrl()`](https://sdk.apify.com/api/apify/class/KeyValueStore#getPublicUrl) method. It accepts only one parameter: `key` - the key of the item you want to share.
+If you need to share a link to some file stored in a Key-Value Store on Apify Platform, you can use <ApiLink to="class/KeyValueStore#getPublicUrl">`getPublicUrl()`</ApiLink> method. It accepts only one parameter: `key` - the key of the item you want to share.

```js
import { KeyValueStore } from 'apify';
@@ -232,7 +232,7 @@ const url = store.getPublicUrl('your-file');

### Exporting dataset data

-When the <ApiLink to="apify/class/Dataset">`Dataset`</ApiLink> is stored on the [Apify platform](https://apify.com/actors), you can export its data to the following formats: HTML, JSON, CSV, Excel, XML and RSS. The datasets are displayed on the actor run details page and in the [Storage](https://console.apify.com/storage) section in the Apify Console. The actual data is exported using the [Get dataset items](https://apify.com/docs/api/v2#/reference/datasets/item-collection/get-items) Apify API endpoint. This way you can easily share the crawling results.
+When the <ApiLink to="class/Dataset">`Dataset`</ApiLink> is stored on the [Apify platform](https://apify.com/actors), you can export its data to the following formats: HTML, JSON, CSV, Excel, XML and RSS. The datasets are displayed on the actor run details page and in the [Storage](https://console.apify.com/storage) section in the Apify Console. The actual data is exported using the [Get dataset items](https://apify.com/docs/api/v2#/reference/datasets/item-collection/get-items) Apify API endpoint. This way you can easily share the crawling results.

**Related links**

@@ -312,7 +312,7 @@ const proxyConfiguration = await Actor.createProxyConfiguration();
const proxyUrl = await proxyConfiguration.newUrl();
```

-Note that unlike using your own proxies in Crawlee, you shouldn't use the constructor to create <ApiLink to="apify/class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> instance. For using Apify Proxy you should create an instance using the [`Actor.createProxyConfiguration()`](https://sdk.apify.com/api/apify/class/Actor#createProxyConfiguration) function instead.
+Note that unlike using your own proxies in Crawlee, you shouldn't use the constructor to create <ApiLink to="class/ProxyConfiguration">`ProxyConfiguration`</ApiLink> instance. For using Apify Proxy you should create an instance using the <ApiLink to="class/Actor#createProxyConfiguration">`Actor.createProxyConfiguration()`</ApiLink> function instead.

### Apify Proxy Configuration

@@ -344,8 +344,8 @@ essentially has two modes: Apify Proxy or Own (third party) proxy.

The difference is easy to remember.

-- If you're using your own proxies - you should create an instance with the ProxyConfiguration <ApiLink to="apify/class/ProxyConfiguration#constructor">`constructor`</ApiLink> function based on the provided <ApiLink to="apify/interface/ProxyConfigurationOptions">`ProxyConfigurationOptions`</ApiLink>.
-- If you are planning to use Apify Proxy - you should create an instance using the [`Actor.createProxyConfiguration()`](https://sdk.apify.com/api/apify/class/Actor#createProxyConfiguration) function. <ApiLink to="apify/interface/ProxyConfigurationOptions#proxyUrls">`ProxyConfigurationOptions.proxyUrls`</ApiLink> and <ApiLink to="apify/interface/ProxyConfigurationOptions#newUrlFunction">`ProxyConfigurationOptions.newUrlFunction`</ApiLink> enable use of your custom proxy URLs, whereas all the other options are there to configure Apify Proxy.
+- If you're using your own proxies - you should create an instance with the ProxyConfiguration <ApiLink to="class/ProxyConfiguration#constructor">`constructor`</ApiLink> function based on the provided <ApiLink to="interface/ProxyConfigurationOptions">`ProxyConfigurationOptions`</ApiLink>.
+- If you are planning to use Apify Proxy - you should create an instance using the <ApiLink to="class/Actor#createProxyConfiguration">`Actor.createProxyConfiguration()`</ApiLink> function. <ApiLink to="interface/ProxyConfigurationOptions#proxyUrls">`ProxyConfigurationOptions.proxyUrls`</ApiLink> and <ApiLink to="interface/ProxyConfigurationOptions#newUrlFunction">`ProxyConfigurationOptions.newUrlFunction`</ApiLink> enable use of your custom proxy URLs, whereas all the other options are there to configure Apify Proxy.

**Related links**

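Most of the changes in `docs/02_concepts/01_actor_lifecycle.mdx` follow one mechanical pattern: absolute Markdown links into `https://sdk.apify.com/api/apify/...` become `<ApiLink>` elements with a relative `to` path, and existing `<ApiLink to="apify/...">` targets drop their `apify/` prefix. The sketch below reproduces that pattern with two regular expressions. `toApiLinks` is a hypothetical helper for illustration, not a script from this PR, and it assumes the link labels are wrapped in backticks as they are in this file.

```typescript
// Hypothetical helper sketching the link rewrite seen in this diff.
// 1. [`label`](https://sdk.apify.com/api/apify/<path>) -> <ApiLink to="<path>">`label`</ApiLink>
// 2. <ApiLink to="apify/<path>"> -> <ApiLink to="<path>">
const SDK_API_LINK = /\[(`[^`]+`)\]\(https:\/\/sdk\.apify\.com\/api\/apify\/([^)\s]+)\)/g;
const PREFIXED_API_LINK = /<ApiLink to="apify\/([^"]+)">/g;

function toApiLinks(markdown: string): string {
  return markdown
    // Convert absolute SDK API links into ApiLink elements with a relative path.
    .replace(SDK_API_LINK, (_match, label: string, path: string) => `<ApiLink to="${path}">${label}</ApiLink>`)
    // Strip the now-redundant "apify/" prefix from existing ApiLink targets.
    .replace(PREFIXED_API_LINK, (_match, path: string) => `<ApiLink to="${path}">`);
}

// Usage, with a line taken from this diff:
const before = 'use [`Actor.main()`](https://sdk.apify.com/api/apify/class/Actor#main) function.';
console.log(toApiLinks(before));
// → use <ApiLink to="class/Actor#main">`Actor.main()`</ApiLink> function.
```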
2 changes: 1 addition & 1 deletion docs/03_guides/cheerio_crawler.ts
@@ -25,7 +25,7 @@ const crawler = new CheerioCrawler({

// This function will be called for each URL to crawl.
// It accepts a single parameter, which is an object with options as:
-// https://sdk.apify.com/docs/typedefs/cheerio-crawler-options#handlepagefunction
+// https://crawlee.dev/js/api/cheerio-crawler/interface/CheerioCrawlerOptions#requestHandler
// We use for demonstration only 2 of them:
// - request: an instance of the Request class with information such as URL and HTTP method
// - $: the cheerio object containing parsed HTML
2 changes: 1 addition & 1 deletion docs/04_upgrading/upgrading_v1.md
@@ -74,7 +74,7 @@ npm install apify playwright

If you want to make use of Playwright on the Apify Platform, you need to use a Docker image
that supports Playwright. We've created them for you, so head over to the new
-[Docker image guide](https://sdk.apify.com/docs/guides/docker-images) and pick the one
+[Docker image guide](https://docs.apify.com/sdk/js/docs/concepts/docker-images) and pick the one
that best suits your needs.

Note that your `package.json` **MUST** include `puppeteer` and/or `playwright` as dependencies.
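The Docker image guide link (changed identically in `MIGRATIONS.md` and `docs/04_upgrading/upgrading_v1.md`) and the proxy management link in `src/proxy_configuration.ts` both move from `https://sdk.apify.com/docs/guides/<slug>` to `https://docs.apify.com/sdk/js/docs/concepts/<slug>`. A minimal sketch of that mapping follows. `migrateGuideUrl` is hypothetical, and the assumption that every guide slug carries over unchanged is only confirmed by the two links in this diff.

```typescript
// Hypothetical helper: maps the old guide URL layout to the new one, assuming
// the slug, query string and #fragment all carry over unchanged.
function migrateGuideUrl(oldUrl: string): string {
  const url = new URL(oldUrl);
  const prefix = '/docs/guides/';
  if (url.hostname !== 'sdk.apify.com' || !url.pathname.startsWith(prefix)) {
    return oldUrl; // not an old guide URL, leave it untouched
  }
  const slug = url.pathname.slice(prefix.length);
  const migrated = new URL(`https://docs.apify.com/sdk/js/docs/concepts/${slug}`);
  migrated.search = url.search; // keep any query string
  migrated.hash = url.hash; // keep anchors such as #apify-proxy-configuration
  return migrated.toString();
}

console.log(migrateGuideUrl('https://sdk.apify.com/docs/guides/docker-images'));
// → https://docs.apify.com/sdk/js/docs/concepts/docker-images
```

The cheerio crawler comment in this PR points to `crawlee.dev` instead, so it falls outside this pattern and the helper leaves such URLs alone.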
2 changes: 1 addition & 1 deletion src/proxy_configuration.ts
@@ -264,7 +264,7 @@ export class ProxyConfiguration extends CoreProxyConfiguration {
if (proxyUrls && proxyUrls.some((url) => url?.includes('apify.com'))) {
this.log.warning(
'Some Apify proxy features may work incorrectly. Please consider setting up Apify properties instead of `proxyUrls`.\n' +
-'See https://sdk.apify.com/docs/guides/proxy-management#apify-proxy-configuration',
+'See https://docs.apify.com/sdk/js/docs/concepts/proxy-management#apify-proxy-configuration',
);
}
}
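The warning in `src/proxy_configuration.ts` is emitted only when some entry in `proxyUrls` contains the substring `apify.com`. The sketch below isolates that condition so its behavior is easy to check. The function name `shouldWarnAboutApifyProxyUrls` and the sample proxy URLs are hypothetical; the predicate itself mirrors the one in the diff.

```typescript
// Standalone copy of the condition that guards the warning: true when any
// configured proxy URL contains "apify.com". Missing entries are skipped.
function shouldWarnAboutApifyProxyUrls(proxyUrls?: (string | undefined)[]): boolean {
  return Boolean(proxyUrls && proxyUrls.some((url) => url?.includes('apify.com')));
}

console.log(shouldWarnAboutApifyProxyUrls(['http://proxy.apify.com:8000'])); // true
console.log(shouldWarnAboutApifyProxyUrls(['http://my-proxy.example.com:8080'])); // false
console.log(shouldWarnAboutApifyProxyUrls(undefined)); // false
```

Because the check is a plain substring match, it also fires for any URL that merely contains `apify.com` somewhere, not only for hosts under that domain.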
2 changes: 1 addition & 1 deletion website/tools/utils/externalLink.js
@@ -2,7 +2,7 @@ const { parse } = require('url');

const visit = import('unist-util-visit').then((m) => m.visit);

-const internalUrls = ['sdk.apify.com'];
+const internalUrls = ['docs.apify.com'];

/**
* @param {import('url').UrlWithStringQuery} href
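After this change, `website/tools/utils/externalLink.js` lists only `docs.apify.com` as internal, so links to `sdk.apify.com` would presumably be treated as external. The sketch below shows one way such a hostname check could work. How the plugin actually consumes the list is an assumption, `isInternalUrl` is a hypothetical name, and the sketch swaps the legacy `url.parse()` used by the plugin for the WHATWG `URL` class. Treating relative links as internal is also an assumption.

```typescript
// Hypothetical sketch of a hostname check against the updated list.
const internalUrls = ['docs.apify.com'];

// Returns the hostname of an absolute URL, or null for relative links,
// which the WHATWG URL constructor rejects without a base URL.
function hostnameOf(href: string): string | null {
  try {
    return new URL(href).hostname;
  } catch {
    return null;
  }
}

function isInternalUrl(href: string): boolean {
  const hostname = hostnameOf(href);
  // Assumption: relative links (no host of their own) stay internal.
  return hostname === null || internalUrls.includes(hostname);
}

console.log(isInternalUrl('https://docs.apify.com/sdk/js/docs/concepts/docker-images')); // true
console.log(isInternalUrl('https://sdk.apify.com/docs/guides/docker-images')); // false
```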