
Add Ollama Cloud support via managed sidecar container #57

@nezhar

Description


Problem

VibePod's LLM integration works by injecting ANTHROPIC_BASE_URL / CODEX_OSS_BASE_URL into agent containers, expecting an Anthropic- or OpenAI-compatible endpoint at that URL.

However, ollama.com's cloud API only exposes its native endpoint (/api/chat with Authorization: Bearer) — not the /v1/messages (Anthropic) or /v1/chat/completions (OpenAI) compatibility layers. Those are features of the local Ollama server process, not the hosted service.
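To make the mismatch concrete, here is a sketch of the two request shapes side by side (illustrative only; field names follow the public Ollama and OpenAI API docs, and the model name is just an example):

```python
# The hosted service only understands Ollama's native chat endpoint:
native_request = {
    "url": "https://ollama.com/api/chat",
    "headers": {"Authorization": "Bearer $OLLAMA_API_KEY"},
    "body": {
        "model": "gpt-oss:120b",
        "messages": [{"role": "user", "content": "hello"}],
        "stream": False,
    },
}

# The OpenAI-compatible path that a *local* Ollama server also serves,
# but that the hosted service does not expose:
compat_request = {
    "url": "http://localhost:11434/v1/chat/completions",
    "headers": {"Authorization": "Bearer ollama"},
    "body": {
        "model": "gpt-oss:120b",
        "messages": [{"role": "user", "content": "hello"}],
    },
}
```

The message payloads are nearly identical; it is the paths, auth conventions, and response/streaming formats that differ, which is what the agent containers trip over.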

This means users who want to use Ollama cloud models today must run a local Ollama instance, defeating the purpose of a cloud-backed workflow.

Proposed Solution: Ollama Sidecar Container

Add a managed ollama container to VibePod's stack (alongside proxy and datasette). The sidecar runs Ollama locally but uses :cloud-suffixed model names to offload inference to ollama.com. Agent containers hit the sidecar's compatibility endpoints as normal.
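A minimal compose-style sketch of the sidecar wiring, for illustration (the service name and port mapping are assumptions; VibePod would generate the equivalent container config internally):

```yaml
# Hypothetical sketch only — not generated VibePod config.
services:
  vibepod-ollama:
    image: ollama/ollama:latest
    environment:
      - OLLAMA_API_KEY        # passed through from the host for cloud model auth
    ports:
      - "11434:11434"         # Ollama's default API port
```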

Config:

# ~/.config/vibepod/config.yaml
llm:
  enabled: true
  base_url: "http://vibepod-ollama:11434"
  api_key: "ollama"
  model: "gpt-oss:120b-cloud"

Usage:

# Sidecar starts automatically, OLLAMA_API_KEY is passed through
VP_LLM_ENABLED=true VP_LLM_MODEL=gpt-oss:120b-cloud vp run claude
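From the agent's side, a request to the sidecar is just a standard OpenAI-style chat call. A small stdlib-only sketch of what that request looks like (the `vibepod-ollama` hostname comes from the config above; a local Ollama server accepts any bearer token value):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request aimed at the sidecar."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # value is ignored by local Ollama
        },
    )

req = build_chat_request("http://vibepod-ollama:11434", "gpt-oss:120b-cloud", "ping")
# urllib.request.urlopen(req)  # would reach the sidecar if it is running
```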

This fits the existing image namespace pattern:

ollama -> ollama/ollama:latest

With an optional env override:

VP_IMAGE_OLLAMA=ollama/ollama:latest vp run claude

Why this approach

  • No changes needed to vibepod-proxy or agent containers
  • Both claude and codex agents work immediately (Anthropic + OpenAI compat layers are exposed on the sidecar)
  • Users get cloud inference without installing Ollama on the host
  • OLLAMA_API_KEY is the only credential needed
  • Lightweight — the sidecar itself does no local inference for :cloud models

Alternative Considered: Proxy Translation Layer

Extend vibepod-proxy with a mitmproxy addon that intercepts /v1/messages or /v1/chat/completions calls and rewrites them to Ollama's native https://ollama.com/api/chat format. More powerful (no sidecar needed, everything cloud-native), but significantly more complex to implement correctly, especially for streaming/SSE responses.
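For reference, the request-side half of that translation is roughly the following (a simplified sketch; `temperature` and `num_predict` are real Ollama options per its docs, but tool calls, response formats, and above all re-framing streaming SSE chunks are omitted, and those are the hard parts):

```python
def openai_to_ollama_native(openai_body: dict) -> dict:
    """Rewrite an OpenAI /v1/chat/completions body into Ollama's /api/chat shape.

    Deliberately incomplete: response translation and SSE re-framing
    are where the real complexity lives.
    """
    return {
        "model": openai_body["model"],
        "messages": openai_body["messages"],
        "stream": openai_body.get("stream", False),
        # OpenAI top-level sampling params move into Ollama's "options" object.
        "options": {
            k: v
            for k, v in {
                "temperature": openai_body.get("temperature"),
                "num_predict": openai_body.get("max_tokens"),
            }.items()
            if v is not None
        },
    }
```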

Could be pursued as a follow-up.

Implementation Notes

  • The sidecar needs OLLAMA_API_KEY injected so cloud model auth works
  • Use the host.docker.internal pattern already established in the LLM docs for Docker networking
  • Support vp run ollama for manual control, or auto-start the sidecar as a dependency when llm.enabled: true and base_url points to it
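The auto-start rule in the last bullet could be as simple as the following sketch (the sidecar hostname and config-dict shape are assumptions based on the config example above):

```python
from urllib.parse import urlparse

SIDECAR_HOST = "vibepod-ollama"  # hypothetical managed container name

def needs_ollama_sidecar(llm_config: dict) -> bool:
    """Decide whether `vp run` should start the managed sidecar first.

    Start it only when the LLM integration is enabled and base_url
    points at the sidecar's hostname, per the proposal above.
    """
    if not llm_config.get("enabled"):
        return False
    host = urlparse(llm_config.get("base_url", "")).hostname
    return host == SIDECAR_HOST
```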
