DbUrlList now honors recrawlInMs option. #43

Open
hjr3 wants to merge 1 commit into brendonboshell:master from hjr3:issue-49

hjr3 (Contributor) commented Nov 9, 2019

Fixes #40

  // seconds. This ensures the order we crawl URLs is random; otherwise, if
  // we parse a sitemap, we could get stuck crawling one host for hours.
- delay = - Math.random() * YEAR_MS;
+ delay = - Math.random() * this._recrawlInMs;
Owner

I don't think this has the intended effect. Notice that delay is negative here. This is simply to randomize new URLs that come onto the queue.
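
To illustrate the point about the sign, here is a minimal sketch. It is not the actual DbUrlList code: `scheduleNewUrl` and `scheduleRecrawl` are hypothetical helpers, and the value of `YEAR_MS` is assumed. With a negative delay, the scheduled time always falls in the past, so the multiplier only controls how widely new URLs are shuffled. A recrawl interval would instead need a positive offset, applied after a URL has been crawled.

```javascript
// Hypothetical sketch, not the library's implementation.
// Assumed value: one 365-day year in milliseconds.
const YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// New URL: the delay is negative, so the scheduled time is at or before
// `now`. The URL is eligible immediately; spreadMs only affects its
// random position among other new URLs.
function scheduleNewUrl(now, spreadMs, random = Math.random) {
  const delay = -random() * spreadMs;
  return new Date(now.getTime() + delay);
}

// Recrawl: the delay is positive, so the URL becomes eligible again
// only once recrawlInMs has elapsed.
function scheduleRecrawl(now, recrawlInMs) {
  return new Date(now.getTime() + recrawlInMs);
}
```

Under these assumptions, swapping `YEAR_MS` for a recrawl interval in `scheduleNewUrl` narrows the shuffle window but never pushes a crawl into the future.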

Development

Successfully merging this pull request may close these issues.

How to periodically crawl again