
Production Deployment Support: Docker Image Hardening + Monitoring #211

@leostar0412

Description

Problem

The boost-data-collector is moving toward production deployment on GCP (staging was the Week 20 deliverable). The current Dockerfile and docker-compose configuration are optimized for development, not for a production environment that handles credentials for six platforms (GitHub, Slack, Discord, Pinecone, YouTube, WG21). Production deployment requires three things: a hardened container image; health check endpoints that reflect actual collector health, not just process liveness; and baseline monitoring that detects silent data gaps, where data stops arriving without any error being raised. Silent gaps are the failure mode this domain fears most.
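One way to make a health check reflect collector health rather than process liveness is to compare each collector group's last successful run against a maximum allowed age. The sketch below is illustrative only: the group names, the `MAX_AGE` budgets, and the `stale_groups` / `health_report` helpers are assumptions, not code from the repository, and the database and Celery checks are passed in as plain booleans.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets; real values depend on each collector's schedule.
MAX_AGE = {
    "github": timedelta(hours=2),
    "slack": timedelta(hours=2),
    "wg21": timedelta(days=2),
}


def stale_groups(last_success, now=None, max_age=MAX_AGE):
    """Return the sorted names of collector groups whose data is stale.

    last_success maps group name -> timezone-aware datetime of the last
    successful collection. A group with no recorded success counts as stale.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for group, budget in max_age.items():
        ts = last_success.get(group)
        if ts is None or now - ts > budget:
            stale.append(group)
    return sorted(stale)


def health_report(last_success, db_ok, workers_ok, now=None):
    """Build a JSON-serialisable payload for a health endpoint."""
    stale = stale_groups(last_success, now)
    healthy = db_ok and workers_ok and not stale
    return {
        "status": "ok" if healthy else "degraded",
        "database": db_ok,
        "celery_workers": workers_ok,
        "stale_groups": stale,
        "last_success": {
            group: ts.isoformat() if ts else None
            for group, ts in last_success.items()
        },
    }
```

A process that is alive but has not collected anything recently would report `"degraded"` here, which is the silent-gap case that a liveness-only check misses.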

Acceptance Criteria

  • Harden Dockerfile: non-root user, minimal base image, pinned system package versions, .dockerignore excludes dev files
  • Add a health check endpoint (or enhance the existing one) that reports: last successful collection timestamp per collector group, Celery worker status, database connectivity
  • Add structured logging output (JSON format) suitable for ingestion by GCP Cloud Logging (formerly Stackdriver)
  • Add a docker-compose.prod.yml override (or document the production overrides) with resource limits, restart policies, and secret injection via environment
  • Pair with Daniel on the GCP Cloud Run / Cloud SQL deployment configuration
  • Verify the hardened image builds, passes smoke tests, and runs migrations successfully
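For the structured logging criterion, a minimal sketch using only the standard library is shown below. It assumes logs are written to stdout as one JSON object per line and uses the `severity`, `message`, and `time` keys that Cloud Logging treats as special fields in structured payloads. The `JsonFormatter` class, the `configure_logging` helper, and the `collector` extra field are hypothetical names chosen for illustration.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "time": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
        }
        # Hypothetical convention: callers pass extra={"collector": "github"}.
        collector = getattr(record, "collector", None)
        if collector is not None:
            entry["collector"] = collector
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level=logging.INFO, stream=None):
    """Replace the root logger's handlers with one JSON handler."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Calling `configure_logging()` once at startup and then logging with `extra={"collector": "github"}` would tag each entry with its collector group, which makes per-group gaps easier to query.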

Implementation Notes

Coordinate with Daniel (@snowfox1003), who owns the GCP staging deployment. Key areas:
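(1) the Dockerfile currently runs as root; create an unprivileged user in the image and switch to it with a USER instruction (for example USER nonroot);
(2) the gunicorn config should use --worker-class gthread for threaded concurrency, or --worker-class uvicorn.workers.UvicornWorker if the app is served over ASGI and needs async support;
(3) the Celery worker should set --max-tasks-per-child so each worker process is recycled after a fixed number of tasks, which bounds memory growth from leaks in long-running workers;
(4) the HEALTHCHECK Docker instruction should hit the health endpoint.
The existing docker-compose.ci.yml can serve as a reference for test configuration.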
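The gunicorn settings in (2) can also live in a Python config file instead of command-line flags. The following is a sketch of a possible `gunicorn.conf.py` for the gthread option; the numeric values and the `GUNICORN_THREADS` variable are placeholder assumptions, not tuned or existing settings.

```python
# gunicorn.conf.py (sketch): gunicorn reads module-level names as settings.
import os

# Cloud Run injects PORT; fall back to 8000 for local runs.
bind = "0.0.0.0:" + os.environ.get("PORT", "8000")

# Threaded workers: each worker process handles requests on a thread pool.
worker_class = "gthread"
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))
threads = int(os.environ.get("GUNICORN_THREADS", "4"))  # hypothetical variable

# Recycle web workers periodically, similar in spirit to Celery's
# --max-tasks-per-child; jitter avoids restarting all workers at once.
max_requests = 1000
max_requests_jitter = 100

timeout = 60
accesslog = "-"  # write access logs to stdout
```

It would be loaded with `gunicorn -c gunicorn.conf.py` followed by the application path. On the Celery side, the `worker_max_tasks_per_child` setting is the configuration equivalent of the `--max-tasks-per-child` flag.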
