We currently extract links from all pages that are on the same domain as the original URL that is passed to crawl.
This might be too narrow (for instance, a site may be spread over several subdomains) or too broad (for instance, somebody might be interested only in pages that are children of a particular URL).
We should find a way to allow the user to configure which pages to extract links from.
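
One possible shape for this, as a rough sketch rather than a proposal for the final API: let the caller supply a predicate that decides, per page URL, whether links should be extracted from it. The Python below is illustrative only; the function names (`same_domain`, `include_subdomains`, `under_path`) are hypothetical and not part of the existing code, and all URLs are assumed to be absolute.

```python
from urllib.parse import urlsplit


def same_domain(seed_url):
    """Mirror the current behaviour: only pages on exactly the seed's host."""
    seed_host = urlsplit(seed_url).hostname
    return lambda url: urlsplit(url).hostname == seed_host


def include_subdomains(seed_url):
    """Broader scope: the seed's host plus any subdomain of it."""
    seed_host = urlsplit(seed_url).hostname or ""

    def predicate(url):
        host = urlsplit(url).hostname or ""
        return host == seed_host or host.endswith("." + seed_host)

    return predicate


def under_path(seed_url):
    """Narrower scope: same host, and a path at or below the seed's path."""
    seed = urlsplit(seed_url)
    # Normalise to a trailing slash so "/docs" does not match "/docsx".
    prefix = seed.path.rstrip("/") + "/"

    def predicate(url):
        parts = urlsplit(url)
        path = parts.path.rstrip("/") + "/"
        return parts.hostname == seed.hostname and path.startswith(prefix)

    return predicate


if __name__ == "__main__":
    pages = [
        "https://example.com/docs/intro",
        "https://blog.example.com/post",
        "https://example.com/about",
        "https://other.org/",
    ]
    in_scope = under_path("https://example.com/docs")
    print([p for p in pages if in_scope(p)])
```

The crawler would then call the chosen predicate before extracting links from a page, with `same_domain` as the default so that existing behaviour is unchanged. A plain callable keeps the door open for user-written rules (regex matches, allow-lists) without adding a new option for each case.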