Context: Opened per discussion in #12. Once model, task, and prompt schemas are finalized, test runs need to be planned and queued before execution.
Scope for discussion:
- Queue structure: how runs are enqueued (which model × which prompt set × which hardware)
- Priority ordering: which runs execute first (e.g., oldest model, least tested, highest priority tier)
- Queue state machine: PENDING → IN_PROGRESS → COMPLETE / FAILED
- Retry policy: failed runs (timeout, OOM) — how many retries, backoff
- Dependency ordering: should baseline tier complete before comparison tier runs?
- Queue persistence: file-based (git-tracked) or DB-only?
- Runner interaction: how the bakeoff runner picks up and marks queue items
- Multi-runner support: can multiple runners claim queue items concurrently?
Relation to other issues:
@gissf1 — tagging for input once #12 schema is closer to final.
— Bastion
Context: Opened per discussion in #12. Once model, task, and prompt schemas are finalized, test runs need to be planned and queued before execution.
Scope for discussion:
Relation to other issues:
@gissf1 — tagging for input once #12 schema is closer to final.
— Bastion