This repository was archived by the owner on Oct 2, 2025. It is now read-only.

zzha9494/practice-scrapy

Scrapy Exercises

This repository is for practicing Scrapy, a free and open-source web-crawling framework. Thanks to the people who gave me this valuable opportunity.

Requirements

  • Python 3.8+
  • Scrapy 2.11

Getting Started (For Windows)

  1. Create a Python virtual environment. This isolates the practice environment from the system-wide Python installation and reduces the risk of package conflicts.

    python -m venv my_scrapy_env
    
  2. Activate the virtual environment (Command Prompt shown; in PowerShell, run my_scrapy_env\Scripts\Activate.ps1 instead).

    my_scrapy_env\Scripts\activate
    
  3. Install dependencies.

    pip install -r requirements.txt
    

    This will install packages from requirements.txt:

    • scrapy
    • shub
    • scrapy-crawlera
    • google-cloud-storage
    • scrapy-sessions

    Please note that this project was initialized with Scrapy 2.11. Running scrapy startproject exercises with Scrapy 2.4 leads to conflicts with the other packages.
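Before running the spiders, it can help to confirm that the activated environment matches the requirements above. The script below is a minimal sketch: the file name check_env.py and the helper names are hypothetical and not part of the repository, and it treats Scrapy 2.11 as a minimum version, which is an assumption, since the README only states that the project was initialized with 2.11.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if the dotted version string `installed` is at least `minimum`.

    Only the leading digits of each component are compared, so "2.11.2" -> (2, 11)
    and "2.12.0rc1" -> (2, 12); missing components count as 0.
    """
    parts = []
    for piece in installed.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "3" so they compare as (3, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= minimum


def check_environment() -> bool:
    """Print the Python and Scrapy versions and return True if both look usable."""
    python_ok = sys.version_info >= (3, 8)
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'need 3.8+'}")
    try:
        scrapy_version = version("scrapy")
    except PackageNotFoundError:
        print("Scrapy: not installed in this environment")
        return False
    scrapy_ok = meets_minimum(scrapy_version, (2, 11))
    print(f"Scrapy {scrapy_version}: {'OK' if scrapy_ok else 'older than 2.11'}")
    return python_ok and scrapy_ok


if __name__ == "__main__":
    check_environment()
```

Run it with python check_env.py after step 3. It only reads installed package metadata and does not change the environment.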

Usage

Please note the log level is set to INFO.
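The README does not say where the log level is configured. In a project generated by scrapy startproject, it is normally the LOG_LEVEL setting in the project's settings.py; the fragment below assumes that layout (the path exercises/exercises/settings.py is an assumption, not confirmed by this README).

```python
# exercises/exercises/settings.py (assumed location; only the relevant line is shown)

# Scrapy's default LOG_LEVEL is "DEBUG", which logs every crawled request and scraped item.
# "INFO" keeps progress messages and the final stats dump but hides that per-item output.
LOG_LEVEL = "INFO"
```

To see debug output for a single run without editing the file, the setting can be overridden on the command line, for example scrapy crawl surfboardempire -O surfboardempire.json -s LOG_LEVEL=DEBUG (or the equivalent -L DEBUG option).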

  1. Tackle World

    Inside the exercises folder, run:

    scrapy crawl tackleworldadelaide -O tackleworldadelaide.json
    

    This writes a JSON file containing product data scraped from Tackle World (-O overwrites the file if it already exists).

  2. Surfboard Empire

    Inside the exercises folder, run:

    scrapy crawl surfboardempire -O surfboardempire.json
    

    This writes a JSON file containing product data scraped from Surfboard Empire.

  3. Regular Expressions

    Inside the root folder, run:

    python regex.py
    

    This extracts the total number of products, as a numeric value, from an HTML element.
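The contents of regex.py are not shown in this README. As a sketch of the technique step 3 describes, the snippet below pulls the total product count out of an HTML fragment with Python's re module. The markup, the pattern, and the function name extract_total are hypothetical examples, not the real page or the script's actual regex, and the pattern assumes the count appears as plain text in the form "of N products".

```python
import re
from typing import Optional

# Hypothetical markup: a product listing page's "results count" element.
SAMPLE_HTML = '<span class="product-count">Showing 1-24 of 1,284 products</span>'

# Capture the number after "of" and before "product(s)" or "result(s)",
# allowing thousands separators such as "1,284".
TOTAL_PATTERN = re.compile(r"of\s+(\d[\d,]*)\s+(?:products?|results?)\b", re.IGNORECASE)


def extract_total(html: str) -> Optional[int]:
    """Return the total product count found in `html`, or None if there is none."""
    match = TOTAL_PATTERN.search(html)
    if match is None:
        return None
    # Remove the thousands separators before converting the text to an int.
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(extract_total(SAMPLE_HTML))
```

Running a regex over raw HTML is fragile if the markup changes. Inside a spider, a common alternative is to select the element first and apply the pattern only to its text, for example with the re_first() method of Scrapy selectors.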
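The crawl commands in steps 1 and 2 write their items with -O to a .json file, which Scrapy's JSON feed exporter produces as a single JSON array of objects. Because the README does not list the item fields, the sketch below relies only on that structure: run from the exercises folder (where the files are written), it loads each output file and reports how many items contain each field. The helper names summarise_items and summarise_feed are hypothetical, not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List, Tuple

# Output files produced by the two crawl commands.
FEEDS = ("tackleworldadelaide.json", "surfboardempire.json")


def summarise_items(items: List[Dict[str, Any]]) -> Tuple[int, Counter]:
    """Return the item count and how many items contain each field."""
    field_counts = Counter()
    for item in items:
        # Each key is counted once per item that has it.
        field_counts.update(item.keys())
    return len(items), field_counts


def summarise_feed(path: Path) -> Tuple[int, Counter]:
    """Load a JSON feed written with -O (one JSON array of item objects)."""
    with path.open(encoding="utf-8") as f:
        return summarise_items(json.load(f))


if __name__ == "__main__":
    for name in FEEDS:
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, run the matching crawl first")
            continue
        count, fields = summarise_feed(path)
        print(f"{name}: {count} items")
        for field, n in fields.most_common():
            print(f"  {field}: present in {n} items")
```

A field that appears in fewer items than the total is a quick hint that a selector missed on some product pages.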
