sendMessageDraft 429 cascade during long streaming replies in DMs #117

## Summary

During long streaming replies in DM bindings (`kind: "dm"`), the bot records large bursts of 429 errors under `bot_telegram_api_errors_total{method="sendMessageDraft", error_code="429"}` and `bot_telegram_api_errors_total{method="sendChatAction", error_code="429"}`. The user-visible outcome is fine (the final `sendMessage` always delivers), but the log noise is heavy and the Prometheus `TelegramAPIErrors` alert fires repeatedly during normal usage.

Today (over one day of moderately heavy DM usage) the bot logged **428 rate-limit lines**, vs. 28-31 per day in the preceding baseline week. The single chat-volume spike drove a ~14× increase in error counter values, with no actual delivery failures observed.

## Evidence

Counter snapshot (anonymized):

```
bot_telegram_api_errors_total{method="sendMessageDraft",error_code="429"} 378
bot_telegram_api_errors_total{method="sendChatAction",error_code="429"} 50
```

Sample log lines from one burst:

```
2026-05-15T19:10:29.607Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
2026-05-15T19:10:31.706Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
2026-05-15T19:10:31.758Z WARN [telegram-api] Rate limited: method=sendMessageDraft retry_after=3
...
```

Hour-by-hour burst counts on a high-volume day:

| UTC hour | rate-limit lines |
|---|---|
| 09 | 17 |
| 12 | 67 |
| 15 | 19 |
| 16 | 71 |
| 17 | 145 |
| 18 | 43 |
| 19 | 66 |

These lines form 16 burst clusters. Each cluster aligns 1:1 with the end of a single assistant turn whose final reply exceeded ~2 KB. Burst size correlates positively with final-reply size:

| Final reply (chars) | Avg rate-limit lines per turn |
|---|---|
| < 2,000 | 1–4 |
| 2,000–2,800 | 17–24 |
| 2,800–3,500 | 20–24 |
| 3,500–4,000 | 36–43 |
| > 4,000 | 66 (max sample) |

## Root cause

Three multiplying factors:

1. **`DRAFT_DEBOUNCE_MS = 300`** (`bot/src/stream-relay.ts:124`) — while a reply streams, draft updates fire roughly every 300 ms = up to ~3.3 calls/sec per chat.
2. **Per-chat Telegram limit on draft API** — empirically Telegram throttles `sendMessageDraft` to ~1 call/sec for the same `chat_id`, matching the documented general per-chat send-rate (Bot API FAQ "Broadcasting to users"). The official `sendMessageDraft` method spec does not document a draft-specific exemption.
3. **`AUTO_RETRY_OPTIONS.maxRetryAttempts = 5`** (`bot/src/telegram-bot.ts:497-501`) combined with `@grammyjs/auto-retry` honoring each `retry_after` (3-10 s) — every rate-limited logical `sendMessageDraft` call balloons into up to 5 additional retries, each logged.

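For a 2.5 KB reply streaming over ~10-25 s, the worst case is roughly 30-80 draft attempts × ~80% rejection rate × up to 5 retries each, or about 120-320 logged lines. The observed bursts for replies of this size (17-24 lines per turn) sit well below that ceiling, so the figure is best read as an upper bound rather than a prediction.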
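The back-of-envelope multiplication above can be reproduced with a small helper. This is only a sketch: `estimateRateLimitLines` and the `BurstInputs` field names are hypothetical, and the formula assumes every rejected call exhausts all of its retries, so the result is a rough ceiling rather than a prediction. The numbers differ slightly from the rounded 30-80 attempts because the helper uses the exact 300 ms interval.

```typescript
// Hypothetical helper for the estimate in this issue; not part of the bot code.
interface BurstInputs {
  streamSeconds: number;    // how long the reply streams
  debounceMs: number;       // DRAFT_DEBOUNCE_MS
  rejectionRate: number;    // fraction of draft calls answered with 429 (assumed ~0.8)
  maxRetryAttempts: number; // AUTO_RETRY_OPTIONS.maxRetryAttempts
}

function estimateRateLimitLines(i: BurstInputs): number {
  // Draft attempts fired while the reply streams.
  const attempts = (i.streamSeconds * 1000) / i.debounceMs;
  // Attempts that Telegram rejects with 429.
  const rejected = attempts * i.rejectionRate;
  // Worst case: each rejected call is retried maxRetryAttempts times, each retry logged.
  return Math.round(rejected * i.maxRetryAttempts);
}

const shortStream = estimateRateLimitLines({
  streamSeconds: 10, debounceMs: 300, rejectionRate: 0.8, maxRetryAttempts: 5,
});
const longStream = estimateRateLimitLines({
  streamSeconds: 25, debounceMs: 300, rejectionRate: 0.8, maxRetryAttempts: 5,
});
console.log({ shortStream, longStream });
```

Changing `debounceMs` or `maxRetryAttempts` in the inputs shows how each proposal below shrinks the ceiling independently of the other.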

## What this is NOT

Verified during investigation:

- **DM guard works.** `bot/src/telegram-adapter.ts:60-71` correctly short-circuits `sendDraft` when `binding.kind !== "dm"`. No `sendMessageDraft` ever reaches the wire from non-DM bindings; all 378 `sendMessageDraft` rate-limit lines today were attributable to a single DM binding.
- **Not an agent-side loop or tool storm.** The largest tool-heavy turn observed (6× `Bash` plus `Write`/`Read`) produced *zero* rate-limit errors, because its output was tool-bound rather than streamed prose.
- **Not retry-on-error in the agent.** No turn-level error/retry patterns appear in the affected session JSONLs.
- **Not delivery failure.** Every affected turn delivered its final `sendMessage` successfully.

The problem is purely the cascading retry behavior on a cosmetic operation.

## Proposals

Two independent options. Either alone would substantially reduce the cascade; together they would eliminate it.

### Option A — raise draft debounce

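`bot/src/stream-relay.ts:124`: increase `DRAFT_DEBOUNCE_MS` from `300` to a value in the 750-1000 range so the steady-state draft rate approaches Telegram's per-chat threshold. Note that 750 ms still allows ~1.3 calls/sec, slightly above the ~1 call/sec observed limit; only values of about 1000 ms or more keep the steady-state rate at or under it.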

**Trade-off:** slightly choppier draft updates in the user's chat (drafts refresh ~1×/sec instead of ~3×/sec). The final `sendMessage` is unaffected.
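To make the effect of the interval concrete, here is a minimal sketch of a latest-wins gate that lets at most one draft through per interval. `DraftGate` and `countSends` are hypothetical names and this is not the code in `stream-relay.ts`; the simulated stream (one chunk every 100 ms for 10 s) is an assumption chosen only for illustration.

```typescript
// Hypothetical gate: passes a draft only if minIntervalMs has elapsed since the last one.
class DraftGate {
  private readonly minIntervalMs: number;
  private lastSentAt = Number.NEGATIVE_INFINITY;
  private pending: string | null = null;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  /** Returns the text to send now, or null if this update is held back. */
  offer(text: string, nowMs: number): string | null {
    if (nowMs - this.lastSentAt >= this.minIntervalMs) {
      this.lastSentAt = nowMs;
      this.pending = null;
      return text;
    }
    this.pending = text; // latest wins; older held drafts are dropped
    return null;
  }

  /** Returns the newest held-back text (if any) and clears it. */
  takePending(): string | null {
    const text = this.pending;
    this.pending = null;
    return text;
  }
}

// Counts drafts that would reach the wire for a synthetic 10 s stream, one chunk per 100 ms.
function countSends(minIntervalMs: number): number {
  const gate = new DraftGate(minIntervalMs);
  let sends = 0;
  for (let t = 0; t < 10_000; t += 100) {
    if (gate.offer(`chunk@${t}`, t) !== null) sends++;
  }
  return sends;
}

console.log(countSends(300), countSends(1000));
```

Under these assumptions the 300 ms setting lets 34 drafts through in 10 s, while 1000 ms lets 10 through, which matches the ~3×/sec versus ~1×/sec figures in the trade-off above.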

### Option B — disable autoRetry for draft methods

Drafts are cosmetic and fire-and-forget (see `bot/src/stream-relay.ts:180` — failures are caught and discarded). A 429 on a draft has no recovery value: by the time autoRetry waits 3-10 s, the streaming text has already moved on. autoRetry on `sendMessageDraft` purely amplifies log noise.

Either lower `AUTO_RETRY_OPTIONS.maxRetryAttempts` globally (5 → 1), or selectively skip autoRetry for `sendMessageDraft` via a per-method filter.

**Trade-off:** lowering the global retry limit affects all methods, so the final `sendMessage` would also retry less aggressively (currently up to 5×). A per-method skip is more targeted.
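A per-method skip could be sketched as a wrapper around the retry transformer. The `ApiCall` and `Transformer` types below are simplified local stand-ins so the example runs without dependencies; grammY's real `Transformer` type is generic and also carries an abort signal, which a real wrapper would need to forward. `skipFor` is a hypothetical helper, and the wiring shown in the trailing comment is an assumption that has not been checked against `bot/src/telegram-bot.ts`.

```typescript
// Simplified stand-ins for the API transformer shape (assumption, see lead-in).
type ApiCall = (method: string, payload: unknown) => Promise<unknown>;
type Transformer = (prev: ApiCall, method: string, payload: unknown) => Promise<unknown>;

/** Wraps `inner` so that the listed methods bypass it and call `prev` directly. */
function skipFor(methods: ReadonlySet<string>, inner: Transformer): Transformer {
  return (prev, method, payload) =>
    methods.has(method) ? prev(method, payload) : inner(prev, method, payload);
}

// Demo with a stand-in "retry" transformer that only counts how often it is entered.
let retryLayerEntered = 0;
const fakeRetry: Transformer = (prev, method, payload) => {
  retryLayerEntered++;
  return prev(method, payload);
};
const fakeApi: ApiCall = () => Promise.resolve({ ok: true });

const guarded = skipFor(new Set(["sendMessageDraft"]), fakeRetry);
void guarded(fakeApi, "sendMessageDraft", {}); // bypasses the retry layer
void guarded(fakeApi, "sendMessage", {});      // still goes through it
console.log(retryLayerEntered);

// Assumed wiring with the real plugin (unverified):
//   bot.api.config.use(skipFor(new Set(["sendMessageDraft"]), autoRetry(AUTO_RETRY_OPTIONS)));
```

With this shape the final `sendMessage` keeps its current retry budget, while a rejected draft fails once and is discarded by the existing catch in the relay.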

### Recommended order

1. Wait for the in-flight observability PR (chat_id in error logs + new `bot_telegram_api_calls_total` counter labelled by binding) to land and gather one day of data.
2. Compute the actual error-to-total ratio for `sendMessageDraft` from the new counter.
3. Decide based on that ratio whether one option suffices or both are warranted.
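Step 2 amounts to dividing counter deltas over the observation window. A minimal sketch, assuming both counters are cumulative and sampled at the start and end of the day; `CounterSample` and `errorRatio` are hypothetical names, and the label set of the new `bot_telegram_api_calls_total` counter is assumed. Whether that counter will count logical calls or individual wire attempts is not stated here, and the ratio can exceed 1 if errors are counted per retry while calls are counted per logical call.

```typescript
// Cumulative counter values for sendMessageDraft at one point in time (hypothetical shape).
interface CounterSample {
  errors: number; // bot_telegram_api_errors_total{method="sendMessageDraft",error_code="429"}
  calls: number;  // bot_telegram_api_calls_total for the same method (labels assumed)
}

/** Error-to-total ratio over a window, or null when the window has no usable data. */
function errorRatio(start: CounterSample, end: CounterSample): number | null {
  const errors = end.errors - start.errors;
  const calls = end.calls - start.calls;
  // A negative delta means a counter reset (e.g. a bot restart); zero calls means no traffic.
  if (errors < 0 || calls <= 0) return null;
  return errors / calls;
}
```

Returning `null` on a reset keeps a restart in the middle of the window from producing a misleading ratio; the window would need to be split at the restart instead.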

## Out of scope for this issue

- The observability work (chat_id in logs + call counter) is being implemented separately and is not blocked by this issue.
- The Prometheus alert expression at `monitoring/prometheus/rules.yml:22-23` lives in a private monitoring repo and is not changed by this issue. It will likely be revised to use the new ratio once the counter is available.

## References

- Migration that introduced this code path: #71
- Bot API method spec: https://core.telegram.org/bots/api#sendmessagedraft
- Bot API broadcast rate limits: https://core.telegram.org/bots/faq#my-bot-is-hitting-limits-how-do-i-avoid-this
