This guide covers the monitoring and observability setup for the Scripture App, including synthetic monitoring with warm-up strategies for cold starts.

Health check endpoint:
- Location: `backend/app/main.py`
- Endpoint: `/health`
- Features:
  - Database connection warm-up
  - Volume count verification
  - Warm-up status reporting
  - Detailed health information
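
The response the endpoint reports can be sketched as a pure function that assembles the fields shown in the example health check response in this guide. This is a minimal sketch, not the code in `backend/app/main.py`: the helper name `build_health_payload` and the rule that the service is "healthy" only when the database is connected and at least one volume is counted are assumptions.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_health_payload(
    db_connected: bool,
    volumes_count: Optional[int],
    warmed_up: bool,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Assemble a /health response body (hypothetical helper).

    Assumption: healthy means the database answered AND the volume
    count query returned at least one volume.
    """
    now = now or datetime.now(timezone.utc)
    healthy = db_connected and (volumes_count or 0) > 0
    return {
        "status": "healthy" if healthy else "unhealthy",
        "warmed_up": warmed_up,
        "database": "connected" if db_connected else "disconnected",
        "volumes_count": volumes_count,
        # ISO 8601 UTC timestamp with a trailing "Z", like the documented example
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Keeping the payload logic separate from the web framework makes it easy to unit-test the health rules without starting the server.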

Synthetic monitoring workflow (GitHub Actions):
- Location: `.github/workflows/synthetic-monitoring.yml`
- Schedule: every 15 minutes
- Features:
  - 5-minute warm-up period
  - Core endpoint testing
  - Performance metrics collection
  - Response validation
  - Manual trigger support

Local monitoring script:
- Location: `scripts/monitor.py`
- Features:
  - Local testing capabilities
  - Configurable warm-up time
  - Performance metrics
  - JSON output support
  - Command-line interface
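
The command-line flags this guide uses with `scripts/monitor.py` (`--url`, `--warm-up`, `--wait`, `--output`) could be parsed roughly as follows. This is a sketch under assumptions, not the script's actual code: the default URL (`http://localhost:8000`), the accepted true/false spellings, and the helper names `str_to_bool`, `build_parser`, and `warm_up_seconds` are hypothetical; `--wait` is treated as minutes, matching the "2 minutes" example.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'true'/'false'-style strings, as in `--warm-up false`."""
    lowered = value.strip().lower()
    if lowered in {"true", "yes", "1", "on"}:
        return True
    if lowered in {"false", "no", "0", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Synthetic monitoring for the Scripture App API (sketch)"
    )
    # Assumed default: the local uvicorn server started during local testing
    parser.add_argument("--url", default="http://localhost:8000",
                        help="Base URL of the API to test")
    parser.add_argument("--warm-up", dest="warm_up", type=str_to_bool, default=True,
                        help="Wait for the service to warm up before testing")
    parser.add_argument("--wait", type=float, default=5.0,
                        help="Warm-up wait in minutes")
    parser.add_argument("--output", default=None,
                        help="Optional path for JSON results")
    return parser


def warm_up_seconds(args: argparse.Namespace) -> float:
    """Seconds to sleep before testing; zero when warm-up is disabled."""
    return args.wait * 60 if args.warm_up else 0.0
```

Taking `--warm-up` as an explicit value (rather than a bare switch) keeps the `--warm-up false` spelling working while still defaulting to a warm-up.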

Local testing:
- Start the backend server:
  ```bash
  cd backend
  uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
  ```
- Run the monitoring script from the repository root (for example, in a second terminal):
  ```bash
  # Basic monitoring with 5-minute warm-up
  python scripts/monitor.py

  # Test without warm-up
  python scripts/monitor.py --warm-up false

  # Custom warm-up time (2 minutes)
  python scripts/monitor.py --wait 2

  # Test against the production URL
  python scripts/monitor.py --url https://scriptures-fast-api.onrender.com

  # Save results to a file
  python scripts/monitor.py --output results.json
  ```

GitHub Actions setup:
- Update the API URL in `.github/workflows/synthetic-monitoring.yml`:
  ```bash
  echo "API_URL=https://scriptures-fast-api.onrender.com" >> "$GITHUB_ENV"
  ```
- Enable GitHub Actions in your repository settings.
- Monitor the workflow:
  - Open the Actions tab of the GitHub repository.
  - Select the "Synthetic Monitoring" workflow.
  - The workflow runs automatically every 15 minutes.

Example health check response:
```json
{
  "status": "healthy",
  "warmed_up": true,
  "database": "connected",
  "volumes_count": 5,
  "timestamp": "2025-01-05T00:00:00Z"
}
```

Key metrics:
- Response time tracking
- Database connection status
- Volume count verification
- Error rate monitoring

Alerts:
- Health check failures
- Slow response times (>5 s for the health endpoint, >10 s for the random endpoint)
- Database connection issues
- Endpoint availability failures
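
The alert rules above can be expressed as one check per probe. This is a minimal sketch, not the workflow's actual validation: the function name `evaluate_check`, the endpoint labels `"health"` and `"random"`, and the choice to treat any non-2xx status (or no response at all) as unavailable are assumptions; only the 5 s and 10 s thresholds come from this guide.

```python
from typing import Any, Optional

# Latency thresholds from the alert rules (seconds); labels are hypothetical
SLOW_THRESHOLDS_S = {"health": 5.0, "random": 10.0}


def evaluate_check(
    endpoint: str,
    status_code: Optional[int],
    elapsed_s: float,
    body: Optional[dict[str, Any]] = None,
) -> list[str]:
    """Return alert messages for one probe (an empty list means no alerts).

    `status_code` is None when the request failed outright
    (timeout, connection refused).
    """
    alerts: list[str] = []
    if status_code is None or not 200 <= status_code < 300:
        alerts.append(f"{endpoint}: unavailable (status={status_code})")
    threshold = SLOW_THRESHOLDS_S.get(endpoint)
    if threshold is not None and elapsed_s > threshold:
        alerts.append(f"{endpoint}: slow response ({elapsed_s:.1f}s > {threshold:.0f}s)")
    # Health-specific checks on the JSON body
    if endpoint == "health" and body is not None:
        if body.get("status") != "healthy":
            alerts.append("health: status is not 'healthy'")
        if body.get("database") != "connected":
            alerts.append("health: database not connected")
    return alerts
```

Returning a list of messages (instead of raising on the first problem) lets one run report every issue at once, which is convenient for a failure notification.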

Cold start problem:
- Render's free tier has cold starts: services scale down after a period of inactivity.
- The first request after a scale-down can take 30+ seconds.
- Subsequent requests are fast.

Warm-up strategy:
- Initial request: triggers the cold start.
- 5-minute wait: gives the service time to warm up fully.
- Testing: the core endpoints are tested.
- Validation: responses are checked for the expected content.
- Metrics: performance data is collected.
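
The warm-up sequence can be sketched as a small function with the HTTP call and the sleep injected, so it can be exercised without a network. This is an illustrative sketch, not the workflow's code: `warm_up`, `fetch_health`, `poll_interval_s`, and `max_polls` are hypothetical, and the readiness poll on the `warmed_up` field after the fixed wait is an addition that the workflow described here may not perform.

```python
import time
from typing import Any, Callable, Optional

# Returns the parsed /health body, or None if the request failed or timed out
Fetch = Callable[[], Optional[dict[str, Any]]]


def warm_up(
    fetch_health: Fetch,
    warm_up_s: float = 300.0,
    poll_interval_s: float = 10.0,
    max_polls: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Trigger a cold start, wait, then poll /health until it reports warmed_up."""
    fetch_health()  # 1. initial request: wakes the service (may time out)
    sleep(warm_up_s)  # 2. fixed warm-up period (300 s = 5 minutes)
    for _ in range(max_polls):  # 3. confirm readiness before testing endpoints
        body = fetch_health()
        if body is not None and body.get("warmed_up") is True:
            return True
        sleep(poll_interval_s)
    return False
```

In a real run, `fetch_health` would wrap an HTTP GET of `/health` with a timeout and return `None` on any error; injecting `sleep` keeps tests instant.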
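
GitHub Actions limits:
- Maximum job execution time: 6 hours on GitHub-hosted runners.
- Cron frequency: scheduled workflows can run at most once every 5 minutes.
- Included minutes: 2,000 minutes/month on the GitHub Free plan for private repositories; standard GitHub-hosted runners are free for public repositories.
- Our setup: 15-minute intervals = 96 runs/day, or about 2,880 runs/month. Because each run sleeps 300 seconds for the warm-up and GitHub rounds each job up to the next whole minute, every run bills at least 5 minutes, so the schedule uses at least 14,400 minutes/month. That only fits within the free allowance in a public repository.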
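
The monthly usage arithmetic can be checked with a few lines. This sketch assumes a 30-day month and a single job per workflow run; the function name `monthly_actions_minutes` is hypothetical. It applies GitHub's rule of rounding each job's usage up to the next whole minute.

```python
import math


def monthly_actions_minutes(
    interval_min: int, minutes_per_run: float, days: int = 30
) -> tuple[int, int]:
    """Return (runs per month, billable minutes per month) for a cron schedule.

    Assumes one job per run; each job's time is rounded up to a whole minute.
    """
    runs_per_day = (24 * 60) // interval_min
    runs = runs_per_day * days
    return runs, runs * math.ceil(minutes_per_run)
```

For the 15-minute schedule this gives 2,880 runs/month; at 1 minute per run that would be 2,880 minutes, but with the 5-minute sleep it is at least 14,400 minutes, and a 5.5-minute run (billed as 6) brings it to 17,280.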
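
Customization:

Adjust the warm-up time:
```bash
# In .github/workflows/synthetic-monitoring.yml
sleep 300  # 5 minutes
# Change to: sleep 180  # 3 minutes
```

Test more endpoints (add to the test section):
```yaml
- name: Test additional endpoints
  run: |
    # --fail makes curl exit non-zero on HTTP errors (>= 400), failing the step
    curl -sSf "$API_URL/api/scriptures/reference/John/3"
```

Notify on failure (Slack/Discord):
```yaml
- name: Notify on failure
  if: failure()
  env:
    # Assumes the webhook URL is stored as a repository secret with this name
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
  run: |
    # Discord expects a "content" field; appending /slack to a Discord webhook
    # URL makes it accept this Slack-style "text" payload instead.
    curl -X POST -H 'Content-type: application/json' \
      --data '{"text":"Scripture App monitoring failed!"}' \
      "$SLACK_WEBHOOK_URL"
```

Key performance indicators:
- Uptime: Service availability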
- Response Times: p50, p95, p99
- Cold Start Frequency: How often services scale down
- Error Rates: By endpoint and type
- User Impact: Cold start vs warm performance
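
The p50/p95/p99 response-time figures can be computed from raw samples. This sketch uses the nearest-rank percentile definition; other tools may interpolate between samples and report slightly different values. The names `percentile` and `latency_summary` are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples: Sequence[float]) -> dict[str, float]:
    """p50, p95, and p99 for a list of response times (seconds)."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

With a handful of fast responses and one cold start, for example `[0.2, 0.3, 0.25, 12.0]`, the median stays at 0.25 s while p99 jumps to 12.0 s, which is why the tail percentiles are the ones that expose cold starts.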
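
Sample queries (assuming `response_time` is stored in seconds):
```sql
-- Average response time by endpoint
SELECT endpoint, AVG(response_time)
FROM monitoring_metrics
GROUP BY endpoint;

-- Cold start detection (responses slower than 30 seconds)
SELECT COUNT(*)
FROM monitoring_metrics
WHERE response_time > 30;
```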
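
The sample queries can be tried against an in-memory SQLite database. The `monitoring_metrics` schema below (`endpoint`, `response_time` in seconds, `recorded_at`) is an assumption, since this guide does not define the table, and the rows are made-up illustration data. `ORDER BY endpoint` is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical schema: one row per probe, response_time in seconds
SCHEMA = """
CREATE TABLE monitoring_metrics (
    endpoint TEXT NOT NULL,
    response_time REAL NOT NULL,
    recorded_at TEXT NOT NULL
);
"""

AVG_BY_ENDPOINT = """
SELECT endpoint, AVG(response_time)
FROM monitoring_metrics
GROUP BY endpoint
ORDER BY endpoint;
"""

COLD_STARTS = "SELECT COUNT(*) FROM monitoring_metrics WHERE response_time > 30;"


def run_sample_queries(rows):
    """Load rows into a throwaway table and run both sample queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO monitoring_metrics VALUES (?, ?, ?)", rows)
        averages = conn.execute(AVG_BY_ENDPOINT).fetchall()
        (cold_starts,) = conn.execute(COLD_STARTS).fetchone()
        return averages, cold_starts
    finally:
        conn.close()
```

Usage with illustrative rows: `run_sample_queries([("/health", 40.0, "t1"), ("/health", 0.5, "t2"), ("/random", 1.0, "t3"), ("/random", 3.0, "t4")])` averages each endpoint and counts the single 40-second probe as a cold start.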

Troubleshooting:
- Cold start timeouts:
  - Increase the timeout values used by the monitoring
  - Extend the warm-up period
  - Consider upgrading to a paid tier
- GitHub Actions failures:
  - Check that the API URL is correct
  - Verify that the endpoints are reachable
  - Review the workflow logs
- Local script issues:
  - Install requests: `pip install requests`
  - Check that the backend is running
  - Verify that the URL is accessible
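
Debug commands:
```bash
# Test the health endpoint manually
curl -v http://localhost:8000/health

# Check GitHub Actions logs: open the Actions tab of the GitHub repository,
# or list recent runs with the GitHub CLI
gh run list --workflow synthetic-monitoring.yml

# Run the monitoring script against the local backend
python scripts/monitor.py --url http://localhost:8000
```

Future enhancements:
- Add Prometheus metrics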
- Set up Grafana dashboards
- Implement distributed tracing
- Add log aggregation
- Define SLI/SLOs
- Implement error budgets
- Create incident response playbooks
- Set up chaos engineering

Last Updated: January 2025

Maintained By: SRE Team

Review Schedule: Monthly