Collection of Java-based Web crawlers

Prerequisites

  • Maven 3
  • JDK 21

How to build

mvn clean install

Common crawler functionality

  • Your crawler should extend the WebCrawler base class
  • The DTO class that describes the collected data should implement the CrawlerData marker interface
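
The two rules above can be sketched roughly as follows. This is only an illustration: the WebCrawler and CrawlerData types declared here are simplified stand-ins written for this example, and the parse and crawl methods, the TitleData record and the TitleCrawler class are hypothetical names. The real base class in this repository may expose a different API, so check its source before copying anything.

```java
import java.util.ArrayList;
import java.util.List;

public class CrawlerSketch {

    /** Stand-in for the project's CrawlerData marker interface (assumed to have no methods). */
    interface CrawlerData {}

    /** Hypothetical, simplified stand-in for the project's WebCrawler base class. */
    abstract static class WebCrawler<T extends CrawlerData> {

        /** Subclasses turn one raw page into zero or more data items. */
        protected abstract List<T> parse(String pageContent);

        /** Runs the parser over every page and gathers the results in order. */
        public List<T> crawl(List<String> pages) {
            List<T> result = new ArrayList<>();
            for (String page : pages) {
                result.addAll(parse(page));
            }
            return result;
        }
    }

    /** DTO describing one collected item; it only needs to implement the marker interface. */
    record TitleData(String title) implements CrawlerData {}

    /** Example crawler: treats every non-blank line of a page as one title. */
    static class TitleCrawler extends WebCrawler<TitleData> {
        @Override
        protected List<TitleData> parse(String pageContent) {
            List<TitleData> items = new ArrayList<>();
            for (String line : pageContent.split("\n")) {
                if (!line.isBlank()) {
                    items.add(new TitleData(line.strip()));
                }
            }
            return items;
        }
    }

    public static void main(String[] args) {
        // Pages are passed in as strings so the sketch runs without network access.
        List<TitleData> data = new TitleCrawler().crawl(List.of("First\nSecond", "Third"));
        data.forEach(item -> System.out.println(item.title()));
    }
}
```

Keeping the DTO behind a marker interface lets shared code (for example, an exporter) accept any crawler's output without knowing its fields.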

Crawler for Orthodox torrent tracker pravtor.ru

See PravtorRuWebCrawler for details.

To run a search, use the run-search script in the pravtor.ru-crawler folder.
Collected data is written to the result.xls file in the sandbox folder.

Crawler for job vacancy aggregator rabota.by

See RabotaByWebCrawler for details.

To run a search, use the run-search script in the rabota.by-crawler folder.

Crawler for Onlíner CPU catalog (AM4)

See OnlinerByCpuCrawler for details. It reads the JSON-LD ItemList from catalog pages (filters: socket_cpu[0]=am4, price[from]=1).
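
As a rough illustration of what reading a JSON-LD ItemList involves, the sketch below pulls the contents of script elements with type application/ld+json out of an HTML string and checks whether a block declares the ItemList type. It is not the OnlinerByCpuCrawler code: the JsonLdExtractor class, its method names and the sample HTML are made up for this example, and a regular expression is used only to keep the sketch dependency-free. A real crawler would more likely rely on an HTML parser and a JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonLdExtractor {

    // Matches <script type="application/ld+json">...</script> and captures the body.
    private static final Pattern JSON_LD_SCRIPT = Pattern.compile(
            "<script[^>]*type\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches a "@type": "ItemList" declaration anywhere in a JSON string.
    private static final Pattern ITEM_LIST_TYPE =
            Pattern.compile("\"@type\"\\s*:\\s*\"ItemList\"");

    /** Returns the raw JSON text of every JSON-LD script block found in the page. */
    public static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = JSON_LD_SCRIPT.matcher(html);
        while (matcher.find()) {
            blocks.add(matcher.group(1).strip());
        }
        return blocks;
    }

    /** Heuristic check that a JSON-LD block declares the schema.org ItemList type. */
    public static boolean looksLikeItemList(String json) {
        return ITEM_LIST_TYPE.matcher(json).find();
    }

    public static void main(String[] args) {
        // Made-up sample page; a real catalog page carries many more fields.
        String html = "<html><head><script type=\"application/ld+json\">"
                + "{\"@type\":\"ItemList\",\"itemListElement\":"
                + "[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Sample CPU\"}]}"
                + "</script></head><body></body></html>";

        for (String block : extract(html)) {
            if (looksLikeItemList(block)) {
                System.out.println(block);
            }
        }
    }
}
```

The extracted string would then be handed to a JSON parser to walk itemListElement and build the DTOs.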

To run it, use the run-search script in the onliner.by-crawler folder after running mvn package (the output JSON path and optional arguments are set in the script).

Video with description of the project

YouTube link