A Flask-based web scraping service that monitors websites for content changes and sends notifications via Discord. Currently configured to monitor Amazon Flex recruitment pages for availability changes. Designed to run on Google Cloud Run with automated scheduling.
The application performs the following:
- Web Scraping: Uses Selenium WebDriver to navigate to specified websites
- Content Monitoring: Searches for specific text that indicates changes in content or availability
- Discord Notifications: Sends status updates to a Discord channel via webhooks
- HTTP API: Exposes a POST endpoint that can be triggered manually or via scheduled jobs
- Headless Chrome browser automation
- Multiple CSS/XPath selector fallbacks for robust element detection
- Intelligent waiting for dynamic content to load
- Discord webhook integration with custom branding
- Environment-based configuration for easy customisation
- Health monitoring and error reporting
This implementation specifically monitors Amazon Flex recruitment pages to detect when Amazon starts accepting new delivery partners, but the framework can be adapted for any website content monitoring.
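The heart of the check is simple: grab the page text, look for the sentinel phrase, and build a status message. A minimal sketch of that logic (the function name and message wording are illustrative, not the actual `main.py` API — only the "not found" message mirrors the example response later in this README):

```python
def build_status_message(page_text: str, search_text: str, url: str) -> tuple[bool, str]:
    """Return (found, message) describing whether the sentinel text is present.

    For Amazon Flex, the sentinel is the "not looking for more delivery
    partners" notice; when it disappears, availability has likely changed.
    """
    found = search_text.lower() in page_text.lower()
    if found:
        message = f"✅ Text '{search_text}' found on {url} — no change detected"
    else:
        message = f"🔍 Text '{search_text}' not found in target element on {url}"
    return found, message
```

Everything else in the service (Selenium, Discord, Flask) is plumbing around this comparison.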
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ Cloud Scheduler │─────▶│   Cloud Run     │─────▶│    Discord      │
│   (Cron Job)    │      │  (Flask App)    │      │   (Webhook)     │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │ Target Website  │
                         │  (Amazon Flex)  │
                         └─────────────────┘
```
| Variable | Description | Required |
|---|---|---|
| `WEBSITE_URL` | The website URL to monitor | Yes |
| `SEARCH_TEXT` | Text to search for on the page | Yes |
| `DISCORD_WEBHOOK_URL` | Discord webhook URL for notifications | Yes |
| `PORT` | Port for the Flask application | No (defaults to 8080) |
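Reading these variables can be sketched with `os.environ` and the documented fallback for `PORT` (the helper name and dict keys are illustrative, not the actual `main.py` API):

```python
import os

def load_config() -> dict:
    """Read configuration from the environment; PORT falls back to 8080."""
    return {
        "website_url": os.environ.get("WEBSITE_URL"),
        "search_text": os.environ.get("SEARCH_TEXT"),
        "discord_webhook_url": os.environ.get("DISCORD_WEBHOOK_URL"),
        "port": int(os.environ.get("PORT", "8080")),
    }
```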
- Python 3.13+
- Chrome browser (for Selenium)
- Docker (for containerisation)
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/web-scraper-discord-notification.git
   cd web-scraper-discord-notification
   ```

2. Create virtual environment

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables

   ```bash
   cp .env.example .env
   # Edit .env with your values
   ```

   Example `.env` for Amazon Flex monitoring:

   ```
   WEBSITE_URL=https://flex.amazon.co.uk/recruiting-cities
   SEARCH_TEXT=We are not looking for more delivery partners at the moment
   DISCORD_WEBHOOK_URL=your-discord-webhook-url
   ```

5. Run locally

   ```bash
   python main.py
   ```

6. Test the endpoint

   ```bash
   curl -X POST http://localhost:8080/
   ```
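When testing locally, remember the endpoint returns `400 Bad Request` if required environment variables are unset (see the API section below). That validation amounts to a small helper along these lines (the helper name is illustrative, not the actual `main.py` API):

```python
# Required variables, per the configuration table above.
REQUIRED_VARS = ("WEBSITE_URL", "SEARCH_TEXT", "DISCORD_WEBHOOK_URL")

def missing_config(env: dict) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

The POST handler would call this with `os.environ` and return 400 listing whatever comes back non-empty.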
Use the provided launch configuration:
- Open VS Code in the project directory
- Set breakpoints in your code
- Press F5 and select "Python: Flask"
- The debugger will attach to the Flask application
```bash
docker build -t web-scraper-discord-notification .
```

```bash
docker run -p 8080:8080 \
  -e WEBSITE_URL="https://flex.amazon.co.uk/recruiting-cities" \
  -e SEARCH_TEXT="We are not looking for more delivery partners" \
  -e DISCORD_WEBHOOK_URL="your-discord-webhook-url" \
  web-scraper-discord-notification
```

Or with Docker Compose:

```yaml
version: '3.8'
services:
  web-scraper:
    build: .
    ports:
      - "8080:8080"
    environment:
      - WEBSITE_URL=https://flex.amazon.co.uk/recruiting-cities
      - SEARCH_TEXT=We are not looking for more delivery partners
      - DISCORD_WEBHOOK_URL=your-discord-webhook-url
```

- Google Cloud SDK installed and configured
- Docker configured for Google Cloud
- Billing enabled on your Google Cloud project
```bash
# Configure Docker for GCP
gcloud auth configure-docker

# Build and tag the image
docker build -t gcr.io/[PROJECT-ID]/web-scraper-discord-notification .

# Push to Container Registry
docker push gcr.io/[PROJECT-ID]/web-scraper-discord-notification
```

```bash
gcloud run deploy web-scraper-discord-notification \
  --image gcr.io/[PROJECT-ID]/web-scraper-discord-notification \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars WEBSITE_URL="https://flex.amazon.co.uk/recruiting-cities" \
  --set-env-vars SEARCH_TEXT="We are not looking for more delivery partners" \
  --set-env-vars DISCORD_WEBHOOK_URL="your-discord-webhook-url" \
  --memory 512Mi \
  --cpu 1 \
  --timeout 900
```

Create a Cloud Scheduler job to trigger the service regularly:
```bash
gcloud scheduler jobs create http web-scraper-check \
  --schedule="*/15 * * * *" \
  --uri=[CLOUD-RUN-SERVICE-URL] \
  --http-method=POST \
  --location=us-central1
```

Triggers a website check and sends a Discord notification.
Response:
- `200 OK`: Check completed successfully
- `400 Bad Request`: Missing required environment variables
- `500 Internal Server Error`: Error during execution
Example Response:
```
🔍 Text 'We are not looking for more delivery partners' not found in target element on https://flex.amazon.co.uk/recruiting-cities
```
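Messages like this reach Discord as a JSON payload POSTed to the webhook URL. A minimal sketch of building that payload (`content` and `username` are standard Discord webhook fields; the username shown is illustrative, not the app's actual branding):

```python
def build_discord_payload(message: str, username: str = "Web Scraper Bot") -> dict:
    """Build a Discord webhook payload.

    'content' is the message body; 'username' overrides the webhook's
    display name. Both are standard Discord webhook fields.
    """
    return {"content": message, "username": username}

# Sending would then be, e.g.:
#   requests.post(DISCORD_WEBHOOK_URL, json=build_discord_payload(msg), timeout=10)
```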
```bash
gcloud logs read --project=[PROJECT-ID] --filter="resource.type=cloud_run_revision"
```

```bash
gcloud run services describe web-scraper-discord-notification --region=us-central1
```
- **Element not found**: The website structure may have changed. Update the CSS selectors in the code.
- **Timeout errors**: Increase the WebDriverWait timeout or add more explicit waits.
- **Discord webhook not working**: Verify the webhook URL is correct and the channel exists.
- **Chrome driver issues**: Ensure Chrome is installed in the container and WebDriver is up to date.
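The "multiple selector fallbacks" feature mentioned earlier is what usually saves you from the first issue above: the code tries each selector in turn and uses the first that matches. A framework-agnostic sketch of that pattern (the helper name is illustrative; with Selenium, `find` would wrap `driver.find_element` and swallow `NoSuchElementException`):

```python
def find_with_fallbacks(find, selectors):
    """Try each selector in order; return the first non-None result.

    `find` is any callable that takes a selector and returns an element,
    returning None or raising when the selector does not match.
    """
    for selector in selectors:
        try:
            element = find(selector)
        except Exception:
            continue  # selector not present on this page; try the next one
        if element is not None:
            return element
    return None  # no selector matched — the page structure may have changed
```

When the target site changes, extending the selector list is usually enough; only a full redesign requires rewriting the scraping logic.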
Run with verbose logging:
```bash
FLASK_DEBUG=1 python main.py
```

To adapt this scraper for other websites:

- Update Environment Variables: Change `WEBSITE_URL` and `SEARCH_TEXT` to your target site and content
- Modify Selectors: Update the CSS selectors in `main.py` to match your target website's structure
- Adjust Wait Conditions: Some sites may need different waiting strategies
- Update Discord Messages: Customise the notification messages and branding
- Store sensitive environment variables in Google Secret Manager
- Use IAM roles with minimal required permissions
- Enable Cloud Run authentication if the service should not be public
- Regularly update dependencies for security patches
- Set appropriate CPU and memory limits
- Use Cloud Scheduler instead of keeping the service running continuously
- Set reasonable request timeouts
- Monitor usage with Cloud Monitoring
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
[Add your license here]