🤖 Kelos Strategist Agent @gjkim42
Area: New CRDs & API Extensions
Summary
Kelos's agent type system is hardcoded across 4+ source files: every new agent requires changes to the CRD enum, job builder, credential mapper, and output parser. This has already happened 5 times (claude-code → codex → gemini → opencode → cursor), each following the same mechanical pattern. Meanwhile, the AI coding agent landscape is rapidly expanding (Aider, SWE-agent, Goose, Amazon Q Developer, Continue, Windsurf, and many internal/proprietary agents). This proposal introduces an AgentType CRD that lets users register custom agent types declaratively, making Kelos truly agent-agnostic without requiring upstream releases for every new agent.
Problem
1. Adding a new agent type requires changes across 4+ files
Each new agent type touches the same set of files:
CRD enum validation (api/v1alpha1/task_types.go:89):
// +kubebuilder:validation:Enum=claude-code;codex;gemini;opencode;cursor
Type string `json:"type"`
Job builder switch (internal/controller/job_builder.go:111-125):
func (b *JobBuilder) Build(...) (*batchv1.Job, error) {
	switch task.Spec.Type {
	case AgentTypeClaudeCode:
		return b.buildAgentJob(task, workspace, agentConfig, b.ClaudeCodeImage, ...)
	case AgentTypeCodex:
		return b.buildAgentJob(task, workspace, agentConfig, b.CodexImage, ...)
	// ... one case per agent type
	default:
		return nil, fmt.Errorf("unsupported agent type: %s", task.Spec.Type)
	}
}
Credential env var mapping (internal/controller/job_builder.go:129-169):
func apiKeyEnvVar(agentType string) string {
	switch agentType {
	case AgentTypeCodex:
		return "CODEX_API_KEY"
	case AgentTypeGemini:
		return "GEMINI_API_KEY"
	// ... one case per agent type
	default:
		return "ANTHROPIC_API_KEY"
	}
}
Output usage parser (internal/capture/usage.go:33-46):
func ParseUsage(agentType, filePath string) map[string]string {
	switch agentType {
	case "claude-code":
		return parseClaudeCode(lines)
	case "codex":
		return parseCodex(lines)
	// ... one case per agent type
	default:
		return nil // Unknown types get NO token/cost tracking
	}
}
Plus: image constants, image flags in the controller binary, JobBuilder struct fields, and a new Dockerfile + entrypoint script per agent.
2. The existing escape hatch has real limitations
Kelos already supports custom images via spec.image override and credentials.type: none for BYO credentials. But this workaround has three concrete problems:
a) Must declare one of 5 types even for custom agents:
Users running Aider or an internal agent must pick claude-code or another built-in type, which is semantically wrong:
spec:
  type: claude-code # Actually running Aider, so this is misleading
  image: ghcr.io/myorg/kelos-aider:latest
  credentials:
    type: none
b) KELOS_AGENT_TYPE is set to the wrong value:
The job builder injects KELOS_AGENT_TYPE=claude-code into the container environment. This breaks kelos-capture: it tries to parse the output as Claude Code's JSON format, which won't match Aider's output. Result: no token usage or cost tracking for custom agents, even if the agent provides this data.
c) No way to register a default image globally:
With built-in types, the controller provides a default image (set via flags). Custom agents must set spec.image on every Task or TaskTemplate; there's no way to say "when type is aider, always use ghcr.io/myorg/kelos-aider:latest."
3. The growth trajectory demands extensibility
The AI coding agent space is expanding rapidly. Agents that exist today or are emerging:
- Aider: Open-source, supports any LLM backend
- SWE-agent: Research-grade from Princeton
- Amazon Q Developer: AWS-native
- Goose: Block's open-source agent
- Continue: Open-source IDE agent with CLI mode
- Windsurf: Codeium's agent
- Bolt: StackBlitz's agent
- Internal/proprietary agents: Many enterprises build their own
Requiring a Kelos upstream release for each new agent creates a bottleneck. The agent image interface (docs/agent-image-interface.md) is already well-defined: any image implementing it can work with Kelos. The type system is the only thing preventing truly pluggable agents.
Proposed Solution: AgentType CRD
New CRD: AgentType
apiVersion: kelos.dev/v1alpha1
kind: AgentType
metadata:
  name: aider
spec:
  # Default container image for this agent type.
  # Can still be overridden per-Task via spec.image.
  image: ghcr.io/myorg/kelos-aider:v0.82.0
  # Credential environment variable mappings.
  # Maps credential type → env var name used to inject the secret value.
  credentialEnvVars:
    api-key: OPENAI_API_KEY
    oauth: OPENAI_AUTH_TOKEN
  # Output format for kelos-capture token/cost extraction.
  # "generic" uses a configurable JSON path-based parser.
  # Omit for agents that emit KELOS_OUTPUTS markers directly.
  outputFormat:
    type: jsonl # jsonl (one JSON object per line) or none
    eventType: "result" # JSON objects with this "type" field value
    tokenPaths:
      inputTokens: "usage.input_tokens" # JSONPath within the event
      outputTokens: "usage.output_tokens"
      costUSD: "total_cost_usd" # Optional
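For illustration, the manifest above could map onto Go API types roughly as follows. This is a minimal sketch, not the proposed final schema: the type and field names (AgentTypeSpec, OutputFormat, TokenPaths, exampleSpec) are assumptions derived from the YAML keys, and the kubebuilder markers and object metadata that a real CRD type would carry are left out so the snippet compiles with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentTypeSpec mirrors the YAML fields of the proposed AgentType CRD.
// All names here are hypothetical and derived from the manifest keys.
type AgentTypeSpec struct {
	// Image is the default container image; a Task's spec.image may override it.
	Image string `json:"image"`
	// CredentialEnvVars maps a credential type (e.g. "api-key") to the
	// environment variable that receives the secret value.
	CredentialEnvVars map[string]string `json:"credentialEnvVars,omitempty"`
	// OutputFormat is nil for agents that emit KELOS_OUTPUTS markers directly.
	OutputFormat *OutputFormat `json:"outputFormat,omitempty"`
}

// OutputFormat tells kelos-capture how to find usage data in agent output.
type OutputFormat struct {
	Type       string     `json:"type"`
	EventType  string     `json:"eventType,omitempty"`
	TokenPaths TokenPaths `json:"tokenPaths"`
}

// TokenPaths holds dotted paths into a matching JSON event.
type TokenPaths struct {
	InputTokens  string `json:"inputTokens,omitempty"`
	OutputTokens string `json:"outputTokens,omitempty"`
	CostUSD      string `json:"costUSD,omitempty"`
}

// exampleSpec builds the same values as the "aider" manifest above.
func exampleSpec() AgentTypeSpec {
	return AgentTypeSpec{
		Image: "ghcr.io/myorg/kelos-aider:v0.82.0",
		CredentialEnvVars: map[string]string{
			"api-key": "OPENAI_API_KEY",
			"oauth":   "OPENAI_AUTH_TOKEN",
		},
		OutputFormat: &OutputFormat{
			Type:      "jsonl",
			EventType: "result",
			TokenPaths: TokenPaths{
				InputTokens:  "usage.input_tokens",
				OutputTokens: "usage.output_tokens",
				CostUSD:      "total_cost_usd",
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(exampleSpec(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Making OutputFormat a pointer keeps "omitted" distinguishable from "present but empty", which matches the manifest comment about agents that emit KELOS_OUTPUTS markers directly.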
Usage in Task/TaskSpawner
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: fix-bug-with-aider
spec:
  type: aider # Resolved against AgentType CRD
  prompt: "Fix the failing test in pkg/auth/handler_test.go"
  credentials:
    type: api-key
    secretRef:
      name: openai-key # Secret key name: OPENAI_API_KEY (from AgentType)
  workspaceRef:
    name: my-workspace
Complete example: Internal agent with custom output format
# Register the custom agent type
apiVersion: kelos.dev/v1alpha1
kind: AgentType
metadata:
  name: internal-coder
spec:
  image: registry.internal.co/ai/coder-agent:2.0.0
  credentialEnvVars:
    api-key: INTERNAL_API_KEY
  outputFormat:
    type: jsonl
    eventType: "completion"
    tokenPaths:
      inputTokens: "metrics.prompt_tokens"
      outputTokens: "metrics.completion_tokens"
      costUSD: "metrics.cost"
---
# TaskSpawner using the custom type
apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: internal-bug-fixer
spec:
  when:
    githubIssues:
      labels: [bug, ai-eligible]
  taskTemplate:
    type: internal-coder # References AgentType CRD
    credentials:
      type: api-key
      secretRef:
        name: internal-api-key
    workspaceRef:
      name: my-workspace
    branch: "kelos-{{.Number}}"
    promptTemplate: |
      Fix issue #{{.Number}}: {{.Title}}
      {{.Body}}
    ttlSecondsAfterFinished: 3600
Implementation Path
Phase 1: Relax type validation (minimal, backward-compatible)
- Remove the kubebuilder enum from TaskSpec.Type and TaskTemplate.Type. Replace it with a webhook validation that accepts built-in types + names matching existing AgentType resources.
- Fall through gracefully in job_builder.go:Build(): if type is not built-in, require spec.image to be set (return a clear error if missing), and set KELOS_AGENT_TYPE to the actual type string.
- Fall through gracefully in capture/usage.go:ParseUsage(): for unknown types, attempt a generic JSONL parser or return nil (same as today, but with the correct type name logged).
This alone unblocks custom agents with correct semantics: users declare their actual type name, set their image, and use credentials.type: none for credentials.
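The Phase 1 fall-through in the job builder might look something like the sketch below. It is an assumption-laden illustration rather than the actual job_builder.go code: resolveImage, builtinImages, and errImageRequired are hypothetical names, and the image references are placeholders for the defaults the controller receives via flags.

```go
package main

import (
	"errors"
	"fmt"
)

// builtinImages stands in for the controller's per-type default images.
// The references are placeholders, not the real Kelos defaults.
var builtinImages = map[string]string{
	"claude-code": "example.invalid/claude-code:latest",
	"codex":       "example.invalid/codex:latest",
	"gemini":      "example.invalid/gemini:latest",
	"opencode":    "example.invalid/opencode:latest",
	"cursor":      "example.invalid/cursor:latest",
}

var errImageRequired = errors.New("spec.image is required for non-built-in agent types")

// resolveImage returns the image to run and the value to inject as
// KELOS_AGENT_TYPE. An explicit spec.image always wins; built-in types fall
// back to their default image; any other type without an image is an error.
func resolveImage(agentType, specImage string) (image, envType string, err error) {
	if specImage != "" {
		return specImage, agentType, nil
	}
	if def, ok := builtinImages[agentType]; ok {
		return def, agentType, nil
	}
	return "", "", fmt.Errorf("agent type %q: %w", agentType, errImageRequired)
}

func main() {
	cases := []struct{ agentType, specImage string }{
		{"claude-code", ""},
		{"aider", "ghcr.io/myorg/kelos-aider:latest"},
		{"aider", ""},
	}
	for _, tc := range cases {
		image, envType, err := resolveImage(tc.agentType, tc.specImage)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("image=%s KELOS_AGENT_TYPE=%s\n", image, envType)
	}
}
```

The point of the sketch is that KELOS_AGENT_TYPE always carries the declared type name, so a custom agent is never mislabeled as claude-code.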
Phase 2: AgentType CRD (full solution)
- Add the AgentType CRD with image, credentialEnvVars, and outputFormat fields.
- Task controller resolves AgentType before building the Job: fetch the AgentType resource, use its image as default, map credential types to env var names, and configure the capture parser.
- Webhook validation checks that spec.type is either a built-in type or matches an existing AgentType resource name.
- Add AgentType support to the CLI via a kelos create agenttype command.
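The resolution step in Phase 2 can be sketched as a single lookup that treats built-in types and registered AgentType resources uniformly. Everything below is hypothetical: agentTypeSpec, resolved, and resolve are illustrative names, the custom map stands in for fetching AgentType resources from the API server, and only the credential env var names for codex, gemini, and the Anthropic default are taken from the existing apiKeyEnvVar switch.

```go
package main

import "fmt"

// agentTypeSpec is a trimmed, hypothetical view of an AgentType's spec.
type agentTypeSpec struct {
	Image             string
	CredentialEnvVars map[string]string
}

// resolved is what the job builder would need to construct the Job.
type resolved struct {
	Image            string
	CredentialEnvVar string // empty when credentials.type is "none"
}

// builtins expresses two of the hardcoded types as data. Images are placeholders.
var builtins = map[string]agentTypeSpec{
	"claude-code": {
		Image:             "example.invalid/claude-code:latest",
		CredentialEnvVars: map[string]string{"api-key": "ANTHROPIC_API_KEY"},
	},
	"codex": {
		Image:             "example.invalid/codex:latest",
		CredentialEnvVars: map[string]string{"api-key": "CODEX_API_KEY"},
	},
}

// resolve looks up the agent type (built-ins first, then registered custom
// types), applies the per-Task image override, and maps the credential type
// to an environment variable name.
func resolve(agentType, credType, specImage string, custom map[string]agentTypeSpec) (resolved, error) {
	spec, ok := builtins[agentType]
	if !ok {
		spec, ok = custom[agentType]
	}
	if !ok {
		return resolved{}, fmt.Errorf("agent type %q is not built-in and has no AgentType resource", agentType)
	}

	out := resolved{Image: spec.Image}
	if specImage != "" {
		out.Image = specImage // per-Task override wins over the default
	}
	if out.Image == "" {
		return resolved{}, fmt.Errorf("agent type %q has no image", agentType)
	}

	if credType != "none" {
		envVar, ok := spec.CredentialEnvVars[credType]
		if !ok {
			return resolved{}, fmt.Errorf("agent type %q does not map credential type %q", agentType, credType)
		}
		out.CredentialEnvVar = envVar
	}
	return out, nil
}

func main() {
	custom := map[string]agentTypeSpec{
		"aider": {
			Image:             "ghcr.io/myorg/kelos-aider:v0.82.0",
			CredentialEnvVars: map[string]string{"api-key": "OPENAI_API_KEY"},
		},
	}
	r, err := resolve("aider", "api-key", "", custom)
	if err != nil {
		panic(err)
	}
	fmt.Printf("image=%s env=%s\n", r.Image, r.CredentialEnvVar)
}
```

Checking built-ins before custom types is one possible design choice; it keeps a user-created AgentType named claude-code from silently changing the behavior of existing Tasks.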
Phase 3: Output format extensibility
- Generic JSONL parser in kelos-capture: configurable via KELOS_OUTPUT_FORMAT env var (set from AgentType.spec.outputFormat).
- Built-in parsers remain for the 5 first-class types (no regression).
- Custom parsers can be added by mounting a parser script in the agent image.
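A generic JSONL parser of the kind described in Phase 3 could be quite small. The sketch below assumes the output format has already been decoded from KELOS_OUTPUT_FORMAT (the encoding of that variable is not specified here), that token paths are simple dot-separated keys rather than full JSONPath, and that the parser receives the output as a slice of lines. The names outputFormat, lookupPath, and parseGenericJSONL are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// outputFormat is a hypothetical, already-decoded form of
// AgentType.spec.outputFormat.
type outputFormat struct {
	EventType  string            // only events whose "type" equals this are read
	TokenPaths map[string]string // result key -> dotted path within the event
}

// lookupPath walks a dotted path such as "usage.input_tokens" through
// nested JSON objects and reports whether every segment was found.
func lookupPath(obj map[string]any, path string) (any, bool) {
	var cur any = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// parseGenericJSONL extracts the configured values from matching events.
// Lines that are not JSON objects are skipped; if several events match,
// values from later events overwrite earlier ones.
func parseGenericJSONL(lines []string, f outputFormat) map[string]string {
	result := map[string]string{}
	for _, line := range lines {
		dec := json.NewDecoder(strings.NewReader(line))
		dec.UseNumber() // keep numbers as written instead of converting to float64
		var event map[string]any
		if err := dec.Decode(&event); err != nil {
			continue
		}
		if t, _ := event["type"].(string); t != f.EventType {
			continue
		}
		for name, path := range f.TokenPaths {
			if v, ok := lookupPath(event, path); ok {
				result[name] = fmt.Sprint(v)
			}
		}
	}
	return result
}

func main() {
	format := outputFormat{
		EventType: "result",
		TokenPaths: map[string]string{
			"inputTokens":  "usage.input_tokens",
			"outputTokens": "usage.output_tokens",
			"costUSD":      "total_cost_usd",
		},
	}
	lines := []string{
		`{"type":"progress","message":"working"}`,
		`plain log line`,
		`{"type":"result","usage":{"input_tokens":1200,"output_tokens":345},"total_cost_usd":0.0123}`,
	}
	fmt.Println(parseGenericJSONL(lines, format))
}
```

Decoding with UseNumber keeps large token counts in their original textual form, which avoids float formatting surprises when the values are written back out as strings.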
Backward Compatibility
- Existing tasks: All 5 built-in types continue to work exactly as today. They use hardcoded images, credential mappings, and output parsers. No migration needed.
- Existing TaskSpawners: No changes required. The type field continues to accept all current values.
- CRD upgrade: Phase 1 (removing the enum) is a CRD schema relaxation, not a breaking change. Kubernetes allows widening validation.
- AgentType is additive: It's a new CRD that doesn't modify existing resources.
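The compatibility guarantee for the type field can be stated as a small validation rule: the five current values are always accepted, and anything else must match a registered AgentType. The following is only a sketch of that rule, with validateAgentType and builtinTypes as hypothetical names and the registered set passed in as a plain map rather than looked up from the cluster.

```go
package main

import "fmt"

// builtinTypes lists the five values the current enum accepts.
var builtinTypes = map[string]bool{
	"claude-code": true,
	"codex":       true,
	"gemini":      true,
	"opencode":    true,
	"cursor":      true,
}

// validateAgentType accepts every built-in type unconditionally and any
// other name only if a matching AgentType is registered.
func validateAgentType(name string, registered map[string]bool) error {
	if builtinTypes[name] || registered[name] {
		return nil
	}
	return fmt.Errorf("unsupported agent type %q: not built-in and no matching AgentType", name)
}

func main() {
	registered := map[string]bool{"aider": true}
	for _, name := range []string{"cursor", "aider", "goose"} {
		if err := validateAgentType(name, registered); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", name)
	}
}
```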
Why this matters for adoption
- Enterprise teams with internal agents can adopt Kelos without forking it
- Open-source agent builders can provide Kelos-compatible images + AgentType manifests
- Reduces maintenance burden: the Kelos team doesn't need to add and maintain agent-specific code for every new agent
- Aligns with the agent image interface: the interface is already well-defined and agent-agnostic; the type system should match
Related
docs/agent-image-interface.md: Already defines the contract custom images must implement
internal/capture/usage.go: Per-agent output parsers that would benefit from extensibility
internal/controller/job_builder.go:173 comment: "new providers (e.g. Vertex) only need to add a case here" confirms the team expects more agent types