Design and implement test run scheduling queue #13

@AlbinoGeek

Context: Opened per discussion in #12. Once model, task, and prompt schemas are finalized, test runs need to be planned and queued before execution.

Scope for discussion:

  • Queue structure: how runs are enqueued (which model × which prompt set × which hardware)
  • Priority ordering: which runs execute first (e.g., oldest model, least tested, highest priority tier)
  • Queue state machine: PENDING → IN_PROGRESS → COMPLETE / FAILED
  • Retry policy: failed runs (timeout, OOM) — how many retries, backoff
  • Dependency ordering: should baseline tier complete before comparison tier runs?
  • Queue persistence: file-based (git-tracked) or DB-only?
  • Runner interaction: how the bakeoff runner picks up and marks queue items
  • Multi-runner support: can multiple runners claim queue items concurrently?
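To give the state machine, retry, and claiming questions something concrete to react to, here is a minimal in-memory sketch in Python. Nothing in it is decided: the field names, `MAX_RETRIES`, `BASE_BACKOFF_S`, the "lower priority value runs first" rule, and the `FAILED → PENDING` retry transition are all assumptions made for illustration, and `claim_next` / `mark_failed` are hypothetical helpers, not existing runner functions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class RunState(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"


# Allowed transitions. FAILED -> PENDING models a retry (an assumption;
# the issue only lists PENDING -> IN_PROGRESS -> COMPLETE / FAILED).
TRANSITIONS = {
    RunState.PENDING: {RunState.IN_PROGRESS},
    RunState.IN_PROGRESS: {RunState.COMPLETE, RunState.FAILED},
    RunState.FAILED: {RunState.PENDING},
    RunState.COMPLETE: set(),
}

MAX_RETRIES = 3         # hypothetical value, open for discussion
BASE_BACKOFF_S = 60.0   # hypothetical value, open for discussion


@dataclass
class QueueItem:
    model: str
    prompt_set: str
    hardware: str
    priority: int = 0                 # lower value runs first (assumption)
    state: RunState = RunState.PENDING
    attempts: int = 0
    claimed_by: Optional[str] = None
    not_before: float = 0.0           # earliest time the item may be claimed

    def transition(self, new_state: RunState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state


def backoff_seconds(attempts: int) -> float:
    """Exponential backoff: 60 s, 120 s, 240 s after attempts 1, 2, 3."""
    return BASE_BACKOFF_S * 2 ** (attempts - 1)


def claim_next(queue: List[QueueItem], runner_id: str, now: float) -> Optional[QueueItem]:
    """Claim the best-priority PENDING item whose backoff has elapsed."""
    ready = [i for i in queue if i.state is RunState.PENDING and i.not_before <= now]
    if not ready:
        return None
    item = min(ready, key=lambda i: i.priority)  # ties keep queue order
    item.transition(RunState.IN_PROGRESS)
    item.claimed_by = runner_id
    item.attempts += 1
    return item


def mark_failed(item: QueueItem, now: float) -> None:
    """Record a failure; requeue with backoff while retries remain."""
    item.transition(RunState.FAILED)
    item.claimed_by = None
    if item.attempts <= MAX_RETRIES:
        item.not_before = now + backoff_seconds(item.attempts)
        item.transition(RunState.PENDING)
```

This sketch is single-process only. If multiple runners claim concurrently, the check-and-update inside `claim_next` would need to be atomic (for example a lock, or a conditional update if the queue ends up in a DB), which ties back to the persistence question above.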

Relation to other issues:

@gissf1 — tagging for input once #12 schema is closer to final.

— Bastion
