sendMessageDraft 429 cascade during long streaming replies in DMs #117

@fitz123

Description

Summary

During long streaming replies in DM bindings (kind: "dm"), the bot hits large bursts of HTTP 429 (rate-limited) responses on sendMessageDraft and sendChatAction, recorded as bot_telegram_api_errors_total{method="sendMessageDraft", error_code="429"} and bot_telegram_api_errors_total{method="sendChatAction", error_code="429"}. The user-visible outcome is fine, since the final sendMessage is always delivered, but the log noise is heavy and the Prometheus TelegramAPIErrors alert fires repeatedly during normal usage.

Today (one day of moderately heavy DM usage) the bot logged 428 rate-limit lines (378 sendMessageDraft + 50 sendChatAction, per the snapshot below), vs. 28-31 per day in the preceding baseline week. This single spike in chat volume drove a ~14× increase in the error counters, with no actual delivery failures observed.

Evidence

Counter snapshot (anonymized):

bot_telegram_api_errors_total{method="sendMessageDraft",error_code="429"} 378
bot_telegram_api_errors_total{method="sendChatAction",error_code="429"} 50

Sample log lines from one burst:

2026-05-15T19:10:29.607Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
2026-05-15T19:10:31.706Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
2026-05-15T19:10:31.758Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
...

Hour-by-hour burst counts on a high-volume day:

| UTC hour | Rate-limit lines |
|---|---|
| 09 | 17 |
| 12 | 67 |
| 15 | 19 |
| 16 | 71 |
| 17 | 145 |
| 18 | 43 |
| 19 | 66 |

These lines fall into 16 burst clusters, each aligned 1:1 with the end of a single assistant turn whose final reply exceeded ~2 KB. Burst size grows with final-reply size:

| Final reply (chars) | Avg rate-limit lines per turn |
|---|---|
| < 2,000 | 1–4 |
| 2,000–2,800 | 17–24 |
| 2,800–3,500 | 20–24 |
| 3,500–4,000 | 36–43 |
| > 4,000 | 66 (max sample) |

Root cause

Three multiplying factors:

  1. DRAFT_DEBOUNCE_MS = 300 (bot/src/stream-relay.ts:124): while a reply streams, draft updates fire roughly every 300 ms, i.e. up to ~3.3 calls/sec per chat.
  2. Per-chat Telegram rate limit on the draft API: empirically, Telegram throttles sendMessageDraft to ~1 call/sec for the same chat_id, matching the documented general per-chat send rate (Bot API FAQ "Broadcasting to users"). The official sendMessageDraft method spec documents no draft-specific exemption.
  3. AUTO_RETRY_OPTIONS.maxRetryAttempts = 5 (bot/src/telegram-bot.ts:497-501) combined with @grammyjs/auto-retry honoring each retry_after (3-10 s) — every rate-limited logical sendMessageDraft call balloons into up to 5 additional retries, each logged.

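For a 2.5 KB reply streaming over ~10-25 s, the factors multiply to roughly 30-80 draft attempts × ~80% rejection rate × up to 5 autoRetry retries ≈ 120-320 logged retries per turn. That is a worst-case ceiling. The observed bursts for replies above ~2,000 chars (17-66 lines per turn) stay below it, but they grow with reply length in the way these factors predict.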
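The back-of-envelope model above can be written out as a small function. This makes the ceiling explicit and shows how each factor moves it. It is a sketch of the issue's own arithmetic, not code from the bot, and the function name is hypothetical. It assumes every rejected draft exhausts the full retry budget, so it yields the worst case rather than the observed per-turn counts.

```typescript
// Sketch of the issue's back-of-envelope estimate (hypothetical helper, not bot code).
// Assumes every draft rejected with 429 exhausts the whole autoRetry budget,
// so the result is a worst-case ceiling on logged retries per streamed turn.
function worstCaseLoggedRetries(
  streamSeconds: number,    // how long the reply streams
  debounceMs: number,       // DRAFT_DEBOUNCE_MS
  rejectionRate: number,    // fraction of draft calls answered with 429 (~0.8 per the issue)
  maxRetryAttempts: number, // AUTO_RETRY_OPTIONS.maxRetryAttempts
): number {
  const draftAttempts = (streamSeconds * 1000) / debounceMs;
  return draftAttempts * rejectionRate * maxRetryAttempts;
}

// Current settings, for a reply streaming 10 s and 25 s.
console.log(Math.round(worstCaseLoggedRetries(10, 300, 0.8, 5))); // 133
console.log(Math.round(worstCaseLoggedRetries(25, 300, 0.8, 5))); // 333
// With no retries for drafts, the retry term drops out entirely.
console.log(worstCaseLoggedRetries(25, 300, 0.8, 0)); // 0
```

In this model, the debounce interval and the retry budget act multiplicatively. That is why Option A and Option B compose.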

What this is NOT

Verified during investigation:

  • DM guard works. bot/src/telegram-adapter.ts:60-71 correctly short-circuits sendDraft when binding.kind !== "dm". No sendMessageDraft ever reaches the wire from non-DM bindings; all 378 rate-limit lines today were attributable to a single DM binding.
  • Not an agent-side loop or tool storm. The largest tool-heavy turn observed (6× Bash plus Write/Read) produced zero rate-limit errors, because its output was tool-bound rather than streamed prose.
  • Not retry-on-error in the agent. No turn-level error/retry patterns appear in the affected session JSONLs.
  • Not delivery failure. Every affected turn delivered its final sendMessage successfully.

The problem is purely the cascading retry behavior on a cosmetic operation.

Proposals

Two independent options. Either alone would substantially reduce the cascade; together they would eliminate it.

Option A — raise draft debounce

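bot/src/stream-relay.ts:124: increase DRAFT_DEBOUNCE_MS from 300 to 750-1000, cutting the steady-state draft rate from ~3.3 to ~1.0-1.3 calls/sec per chat. Only the 1000 ms end actually reaches Telegram's ~1 call/sec per-chat threshold; 750 ms still slightly exceeds it.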

Trade-off: slightly choppier draft updates in the user's chat (drafts refresh ~1×/sec instead of ~3×/sec). The final sendMessage is unaffected.
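To see what Option A does to the per-chat draft rate, the following sketch simulates a stream of text chunks against a minimum-interval gate. DraftGate and countDraftSends are hypothetical stand-ins for the debounce logic in bot/src/stream-relay.ts. The 50 ms chunk arrival pattern is an assumption. The real relay may also schedule a trailing flush, which this sketch omits.

```typescript
// Hypothetical minimum-interval gate standing in for the draft debounce in
// bot/src/stream-relay.ts; the real implementation may differ.
class DraftGate {
  private lastSentAt = Number.NEGATIVE_INFINITY;
  private readonly intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Returns true (and records the send) if at least intervalMs has passed
  // since the previous permitted send; otherwise this update is skipped.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.intervalMs) return false;
    this.lastSentAt = nowMs;
    return true;
  }
}

// Counts permitted draft sends when chunks arrive every `chunkEveryMs`
// for `durationMs` (assumed arrival pattern).
function countDraftSends(intervalMs: number, durationMs: number, chunkEveryMs: number): number {
  const gate = new DraftGate(intervalMs);
  let sends = 0;
  for (let t = 0; t < durationMs; t += chunkEveryMs) {
    if (gate.tryAcquire(t)) sends++;
  }
  return sends;
}

console.log(countDraftSends(300, 10_000, 50));  // 34
console.log(countDraftSends(1000, 10_000, 50)); // 10
```

Over a 10 s stream, a 300 ms interval permits 34 draft sends (~3.4/sec) and a 1000 ms interval permits 10 (~1/sec). This matches the trade-off described above.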

Option B — disable autoRetry for draft methods

Drafts are cosmetic and fire-and-forget (see bot/src/stream-relay.ts:180 — failures are caught and discarded). A 429 on a draft has no recovery value: by the time autoRetry waits 3-10 s, the streaming text has already moved on. autoRetry on sendMessageDraft purely amplifies log noise.

Either lower AUTO_RETRY_OPTIONS.maxRetryAttempts globally (5 → 1), or selectively skip autoRetry for sendMessageDraft via a per-method filter.

Trade-off: a lower global retry limit affects all methods, so the final sendMessage would also retry less aggressively (currently up to 5×). A per-method skip is more targeted.
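The per-method filter from Option B can be expressed as a wrapper around the retry transformer. Calls to skipped methods bypass it and go straight to the underlying API call. The sketch uses simplified local types rather than grammY's own: grammY's Transformer is generic over Bot API methods and also receives an optional AbortSignal. skipTransformerFor and countingInner are hypothetical names, and countingInner stands in for the transformer that autoRetry(...) returns.

```typescript
// Simplified stand-ins for grammY's API-call and transformer shapes (assumption).
type ApiCall = (method: string, payload: unknown) => Promise<unknown>;
type Transformer = (prev: ApiCall, method: string, payload: unknown) => Promise<unknown>;

// Hypothetical helper: route calls for methods in `skip` straight to `prev`,
// and everything else through `inner` (e.g. the autoRetry transformer).
function skipTransformerFor(skip: ReadonlySet<string>, inner: Transformer): Transformer {
  return (prev, method, payload) =>
    skip.has(method) ? prev(method, payload) : inner(prev, method, payload);
}

// Stand-in for the retry transformer: counts how often it is entered, then forwards.
let innerCalls = 0;
const countingInner: Transformer = (prev, method, payload) => {
  innerCalls++; // incremented synchronously, before the returned promise settles
  return prev(method, payload);
};

// Fake wire call that always succeeds.
const wire: ApiCall = async () => ({ ok: true });

const draftAware = skipTransformerFor(new Set(["sendMessageDraft"]), countingInner);
void draftAware(wire, "sendMessageDraft", { text: "partial draft" }); // bypasses retries
void draftAware(wire, "sendMessage", { text: "final reply" });        // goes through retries

console.log(innerCalls); // 1
```

With the real library, this would presumably be installed as bot.api.config.use(skipTransformerFor(new Set(["sendMessageDraft"]), autoRetry(AUTO_RETRY_OPTIONS))) after adapting the helper to grammY's generic Transformer type. That wiring is untested here. Unlike lowering maxRetryAttempts globally, it leaves the final sendMessage with its full retry budget.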

Recommended order

  1. Wait for the in-flight observability PR (chat_id in error logs + new bot_telegram_api_calls_total counter labelled by binding) to land and gather one day of data.
  2. Compute the actual error-to-total ratio for sendMessageDraft from the new counter.
  3. Decide based on that ratio whether one option suffices or both are warranted.
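Step 2 is a ratio of two counter increases over the same window. This is a minimal sketch under two assumptions. First, bot_telegram_api_calls_total can be filtered by method; the issue only states that it is labelled by binding. Second, that counter counts every wire attempt, including autoRetry retries; if it counts logical calls instead, the ratio could exceed 1.

In PromQL the ratio would be roughly sum(increase(bot_telegram_api_errors_total{method="sendMessageDraft"}[1d])) / sum(increase(bot_telegram_api_calls_total{method="sendMessageDraft"}[1d])). Prometheus's increase() also handles counter resets, which the simple snapshot difference below only detects when a counter goes backwards. The DraftCounters type and draftErrorRatio function are hypothetical.

```typescript
// Hypothetical snapshot of two counters for method="sendMessageDraft".
interface DraftCounters {
  errors: number; // bot_telegram_api_errors_total{method="sendMessageDraft"}, summed over error codes
  calls: number;  // bot_telegram_api_calls_total, assumed filterable by method
}

// Error-to-total ratio over a window, from counter values at its start and end.
// Returns null when there was no traffic or a counter went backwards (e.g. restart).
function draftErrorRatio(start: DraftCounters, end: DraftCounters): number | null {
  const errors = end.errors - start.errors;
  const calls = end.calls - start.calls;
  if (calls <= 0 || errors < 0) return null;
  return errors / calls;
}

// Illustrative values only, not measurements.
console.log(draftErrorRatio({ errors: 0, calls: 0 }, { errors: 30, calls: 120 })); // 0.25
```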

Out of scope for this issue

  • The observability work (chat_id in logs + call counter) is being implemented separately and is not blocked by this issue.
  • The Prometheus alert expression at monitoring/prometheus/rules.yml:22-23 lives in a private monitoring repo and is not changed by this issue. It will likely be revised to use the new ratio once the counter is available.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
