Summary
During long streaming replies in DM bindings (kind: "dm"), the bot generates large bursts of bot_telegram_api_errors_total{method="sendMessageDraft", error_code="429"} and bot_telegram_api_errors_total{method="sendChatAction", error_code="429"} entries. The user-visible outcome is fine — the final sendMessage always delivers — but the log noise is heavy and the Prometheus TelegramAPIErrors alert fires repeatedly during normal usage.
Today (over one day of moderately heavy DM usage) the bot logged 428 rate-limit lines, vs. 28-31 per day in the preceding baseline week. The single chat-volume spike drove a ~14× increase in error counter values, with no actual delivery failures observed.
Evidence
Counter snapshot (anonymized):
bot_telegram_api_errors_total{method="sendMessageDraft",error_code="429"} 378
bot_telegram_api_errors_total{method="sendChatAction",error_code="429"} 50
Sample log lines from one burst:
2026-05-15T19:10:29.607Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
2026-05-15T19:10:31.706Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
2026-05-15T19:10:31.758Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
...
Hour-by-hour burst counts on a high-volume day:
| UTC hour | rate-limit lines |
| --- | --- |
| 09 | 17 |
| 12 | 67 |
| 15 | 19 |
| 16 | 71 |
| 17 | 145 |
| 18 | 43 |
| 19 | 66 |
These lines fall into 16 burst clusters. Each cluster aligns 1:1 with the end of a single assistant turn whose final reply exceeded ~2 KB. Burst size correlates positively with final-reply size:
| Final reply (chars) | Avg rate-limit lines per turn |
| --- | --- |
| < 2,000 | 1–4 |
| 2,000–2,800 | 17–24 |
| 2,800–3,500 | 20–24 |
| 3,500–4,000 | 36–43 |
| > 4,000 | 66 (max sample) |
Root cause
Three multiplying factors:
- DRAFT_DEBOUNCE_MS = 300 (bot/src/stream-relay.ts:124): while a reply streams, draft updates fire roughly every 300 ms, i.e. up to ~3.3 calls/sec per chat.
- Per-chat Telegram limit on the draft API: empirically, Telegram throttles sendMessageDraft to ~1 call/sec for the same chat_id, matching the documented general per-chat send rate (Bot API FAQ "Broadcasting to users"). The official sendMessageDraft method spec does not document a draft-specific exemption.
- AUTO_RETRY_OPTIONS.maxRetryAttempts = 5 (bot/src/telegram-bot.ts:497-501) combined with @grammyjs/auto-retry honoring each retry_after (3-10 s): every rate-limited logical sendMessageDraft call balloons into up to 5 additional retries, each logged.
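For a 2.5 KB reply streaming over ~10-25 s, the arithmetic is roughly 30-80 draft attempts × ~80% rejection rate × up to 5 retries each. Taken literally that product is 120-320 lines, so it is a ceiling rather than a prediction: the bursts observed for replies of this size are 17-24 lines, which means the effective multipliers are lower than these worst-case figures.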
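The three factors can be put into a small estimator to sanity-check the arithmetic. This is a minimal sketch: the function names are hypothetical, and it assumes one draft attempt per debounce interval and that every rejected call uses all of its retries, so the result is a worst-case ceiling and not a prediction of the observed counts.

```typescript
// Hypothetical helpers for the back-of-envelope estimate; not code from the bot.

/** Draft attempts during a stream, assuming one attempt per debounce interval. */
function attemptsFor(streamSeconds: number, debounceMs: number): number {
  return Math.floor((streamSeconds * 1000) / debounceMs);
}

/** Worst-case rate-limit log lines if every rejected call uses all its retries. */
function worstCaseLines(
  attempts: number,
  rejectionRate: number,
  maxRetryAttempts: number,
): number {
  const rejected = Math.round(attempts * rejectionRate);
  return rejected * maxRetryAttempts;
}

// Figures taken from the issue text: 300 ms debounce, ~80% rejection, 5 retries.
console.log(attemptsFor(10, 300), attemptsFor(25, 300)); // 33 83
console.log(worstCaseLines(30, 0.8, 5), worstCaseLines(80, 0.8, 5)); // 120 320
```

The ceiling lands well above the 17-24 lines seen for replies of this size, which suggests the real rejection rate or retry depth is lower than the worst case.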
What this is NOT
Verified during investigation:
- DM guard works. bot/src/telegram-adapter.ts:60-71 correctly short-circuits sendDraft when binding.kind !== "dm". No sendMessageDraft ever reaches the wire from non-DM bindings; all 378 sendMessageDraft rate-limit lines today were attributable to a single DM binding.
- Not an agent-side loop or tool storm. The largest tool-heavy turn observed (6× Bash plus Write/Read) produced zero rate-limit errors, because its output was tool-bound rather than streamed prose.
- Not retry-on-error in the agent. No turn-level error/retry patterns appear in the affected session JSONLs.
- Not delivery failure. Every affected turn delivered its final sendMessage successfully.
The problem is purely the cascading retry behavior on a cosmetic operation.
Proposals
Two independent options. Either alone would substantially reduce the cascade; together they would eliminate it.
Option A — raise draft debounce
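bot/src/stream-relay.ts:124: increase DRAFT_DEBOUNCE_MS from 300 to 750-1000 to bring the steady-state draft rate down toward Telegram's per-chat threshold. Only values of about 1000 ms or more actually stay at or under the ~1 call/sec limit; 750 ms still produces ~1.3 calls/sec, which would reduce rejections without eliminating them.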
Trade-off: slightly choppier draft updates in the user's chat (drafts refresh ~1×/sec instead of ~3×/sec). The final sendMessage is unaffected.
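A quick way to compare candidate debounce values is to convert each to a steady-state call rate. The sketch below assumes one draft call per debounce interval and uses the ~1 call/sec figure observed in this issue as the limit; that figure is empirical, not a documented Telegram constant, and the function names are hypothetical.

```typescript
// Empirical per-chat limit from this issue (an assumption, not a documented value).
const ASSUMED_PER_CHAT_LIMIT_PER_SEC = 1;

/** Steady-state draft calls per second for a given debounce interval. */
function draftCallsPerSecond(debounceMs: number): number {
  return 1000 / debounceMs;
}

/** True when the steady-state rate is at or below the assumed limit. */
function staysUnderLimit(
  debounceMs: number,
  limitPerSec: number = ASSUMED_PER_CHAT_LIMIT_PER_SEC,
): boolean {
  return draftCallsPerSecond(debounceMs) <= limitPerSec;
}

for (const ms of [300, 750, 1000]) {
  const rate = draftCallsPerSecond(ms).toFixed(2);
  console.log(`${ms} ms -> ${rate} calls/sec, within limit: ${staysUnderLimit(ms)}`);
}
```

Under these assumptions 300 ms and 750 ms both exceed the limit (3.33 and 1.33 calls/sec) and 1000 ms sits exactly on it, so the upper end of the proposed range is the safer choice.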
Option B — disable autoRetry for draft methods
Drafts are cosmetic and fire-and-forget (see bot/src/stream-relay.ts:180 — failures are caught and discarded). A 429 on a draft has no recovery value: by the time autoRetry waits 3-10 s, the streaming text has already moved on. autoRetry on sendMessageDraft purely amplifies log noise.
Either lower AUTO_RETRY_OPTIONS.maxRetryAttempts globally (5 → 1), or selectively skip autoRetry for sendMessageDraft via a per-method filter.
Trade-off: lower global retry limit affects all methods — final sendMessage would also retry less aggressively (currently up to 5×). A per-method skip is more targeted.
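The per-method skip could take the form of a thin wrapper around the retry transformer. The following is a sketch only: the types are simplified local stand-ins for the shape of a grammY API transformer (the real types are generic over methods and also pass an abort signal), and skipTransformerFor is a hypothetical helper, not an existing option of @grammyjs/auto-retry.

```typescript
// Simplified stand-ins for the API call and transformer shapes (assumptions).
type ApiResponse = { ok: boolean; error_code?: number };
type ApiCall = (method: string, payload: unknown) => Promise<ApiResponse>;
type Transformer = (
  prev: ApiCall,
  method: string,
  payload: unknown,
) => Promise<ApiResponse>;

/**
 * Hypothetical helper: applies `inner` to every method except those listed in
 * `skipped`, which go straight to the next call in the chain.
 */
function skipTransformerFor(
  skipped: readonly string[],
  inner: Transformer,
): Transformer {
  const skip = new Set(skipped);
  return (prev, method, payload) =>
    skip.has(method) ? prev(method, payload) : inner(prev, method, payload);
}
```

In the bot this wrapper would presumably be installed where the auto-retry transformer is registered today, with inner being the transformer returned by autoRetry(AUTO_RETRY_OPTIONS) and skipped set to ["sendMessageDraft"]. That wiring is an assumption, since the issue does not show the registration code. sendMessage and all other methods would keep the current retry behavior.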
Recommended order
- Wait for the in-flight observability PR (chat_id in error logs + new bot_telegram_api_calls_total counter labelled by binding) to land and gather one day of data.
- Compute the actual error-to-total ratio for sendMessageDraft from the new counter.
- Decide based on that ratio whether one option suffices or both are warranted.
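The ratio in the second step is a plain division once both counters exist. A minimal sketch follows, with a hypothetical MethodCounters shape and a made-up total of 1,000 calls used purely for illustration; only the 378 figure comes from this issue. Whether the error counter includes retried attempts while the call counter counts logical calls is not stated here and would need checking before the ratio is interpreted.

```typescript
// Hypothetical snapshot of the two counters for one method.
interface MethodCounters {
  errors429: number;  // bot_telegram_api_errors_total{error_code="429"}
  totalCalls: number; // bot_telegram_api_calls_total (not yet available)
}

/** Error-to-total ratio, or null when there was no traffic to divide by. */
function errorRatio(counters: MethodCounters): number | null {
  if (counters.totalCalls <= 0) return null;
  return counters.errors429 / counters.totalCalls;
}

// 378 is from this issue; 1000 is an invented total for illustration only.
const sample: MethodCounters = { errors429: 378, totalCalls: 1000 };
console.log(errorRatio(sample));
```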
Out of scope for this issue
- The observability work (chat_id in logs + call counter) is being implemented separately and is not blocked by this issue.
- The Prometheus alert expression at monitoring/prometheus/rules.yml:22-23 lives in a private monitoring repo and is not changed by this issue. It will likely be revised to use the new ratio once the counter is available.