Skip to content

feat: production readiness - 7 critical fixes before prod#2

Draft
Ageree wants to merge 1 commit into
mainfrom
claude/payment-service-production-HobXB
Draft

feat: production readiness - 7 critical fixes before prod#2
Ageree wants to merge 1 commit into
mainfrom
claude/payment-service-production-HobXB

Conversation

@Ageree
Copy link
Copy Markdown
Owner

@Ageree Ageree commented Apr 15, 2026

Summary

All 7 production-readiness items from the checklist, implemented and tested:

  1. Fix pending_withdrawals status column — migration converts VARCHAR(32)withdrawalstatus enum; runtime fallback now creates the enum type before the table
  2. Rate limits on deposit subaddress creation — per-agent rate limiting on POST /v2/balance/deposit to prevent wallet RPC abuse
  3. DepositMonitor alerting — consecutive poll failures fire alerts through HealthMonitor → Slack/Telegram/Discord webhooks (added Slack support)
  4. DB backup scriptscripts/backup_db.sh with pg_dump, gzip, rotation, S3 upload, cron install, and restore modes
  5. Withdraw API tests — 10 end-to-end tests for POST /v2/balance/withdraw with agent auth (not admin), covering success/failure/validation/idempotency
  6. Test agent cleanupPOST /v2/admin/cleanup-test-agents endpoint with pattern matching, dry-run, and balance safety guard
  7. WEB_CONCURRENCY + worker guard — default bumped to 2 workers, PG advisory lock leader election ensures background tasks only run on primary worker, --max-requests 1000 for worker recycling

Test plan

  • All 10 new test_withdraw_api.py tests pass (auth, validation, balance deduction, idempotency)
  • All 5 existing test_withdrawal_atomic.py tests pass
  • All 3 test_withdraw_self_send.py tests pass
  • All 21 test_balance.py tests pass
  • Verify POST /v2/admin/cleanup-test-agents with dry_run=true on staging
  • Set ALERT_WEBHOOK_URL on staging and verify Slack/Telegram alerts fire
  • Run scripts/backup_db.sh against staging DB and verify restore
  • Deploy with WEB_CONCURRENCY=2 and verify only one worker runs DepositMonitor

https://claude.ai/code/session_013YXpQQTzpN8ueQWUuQArAz

1. Fix pending_withdrawals status column: migration converts VARCHAR(32)
   to proper withdrawalstatus enum, fallback _ensure_pending_withdrawals()
   now creates the enum type before the table.

2. Rate limits on deposit subaddress creation: per-agent rate limiting
   on POST /v2/balance/deposit to prevent wallet RPC abuse.

3. DepositMonitor alerting: consecutive poll failures now fire alerts
   through the HealthMonitor system, dispatching to Slack/Telegram/Discord
   webhooks. Added Slack Incoming Webhook support alongside existing
   Telegram and Discord.

4. DB backup script: scripts/backup_db.sh with pg_dump, gzip compression,
   retention rotation, optional S3 upload, cron install mode, and restore.

5. Withdraw API tests: 10 end-to-end tests for POST /v2/balance/withdraw
   with agent auth (not admin), covering success, insufficient balance,
   auth rejection, validation, idempotency, and sequential withdrawals.

6. Test agent cleanup: POST /v2/admin/cleanup-test-agents endpoint with
   pattern matching, dry-run mode, and balance safety guard.

7. WEB_CONCURRENCY + worker guard: default bumped to 2 workers,
   PG advisory lock-based leader election ensures background tasks
   (DepositMonitor, escrow, SLA, recurring) only run on primary worker.
   Added --max-requests 1000 for worker recycling.

https://claude.ai/code/session_013YXpQQTzpN8ueQWUuQArAz
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41c0733c-1cd6-45a9-b1a8-3ed4c22d481c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/payment-service-production-HobXB

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Ageree added a commit that referenced this pull request Apr 27, 2026
Follow-up to fd6382b addressing 5-agent review of the PR1 commit.

Real bugs introduced by PR1, now fixed:

- F-5b — channel_signing.verify_channel_state regression (Agent#3 finding):
  PyNaCl's CryptoError is the parent of BadSignatureError, not the other way
  around. The narrowed `except BadSignatureError` left raw CryptoError
  uncaught, so a corrupt key would propagate as an unhandled 500 instead of
  cleanly rejecting the channel state. Add explicit `except CryptoError` and
  drop the redundant `binascii.Error` (it's a ValueError subclass in
  Python 3). Demote the BadSignatureError branch to DEBUG since it's the
  expected reject path.

- F-9c — validate_session AttributeError on non-dict JSON (Agent#2 #1):
  json.loads accepts any valid JSON, not just objects. A stray "42" or
  "[1,2]" in the session payload would let the parse succeed but then
  `payload.get(...)` would AttributeError → 500. Guard with
  `isinstance(payload, dict)`.

- F-9d — short-circuit before compare_digest (Agent#2 #2):
  `if not client_ip or not hmac.compare_digest(...)` returns faster on a
  missing IP than on a wrong IP, leaking presence of binding via timing.
  Always pass strings (`client_ip or ""`) so compare_digest runs in every
  branch.

- F-9e — bound IP leaked in log (Agent#5 #1):
  The mismatch warning logged the bound IP in plaintext. Anyone with read
  access to Railway/Logtail logs would learn where each admin token was
  issued. Drop the IP from the log message.

- F-13b — _secure_shuffle in-place mutation (Agents #1, #5):
  Violated the project-wide "ALWAYS create new objects, NEVER mutate"
  rule from ~/.claude/rules/common/coding-style.md. Now returns a fresh
  list; call site simplified.

Documentation / config:

- F-9b — trusted_proxy_hosts: documented Railway requirement in
  sthrip/config.py and .env.railway.example. Without ``TRUSTED_PROXY_HOSTS=*``
  on Railway, request.client.host is the Railway edge IP, collapsing
  every admin session and rate-limit key to a single IP and bypassing
  F-9 binding. (Agent#3 finding.)

- F-15b — CORS_DEV_AUTOEXTEND documented in .env.railway.example.

- F-16 — _fix_pg_enums comment clarified: code "log+skips" unsafe entries,
  it does not raise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants