
Add Ollama Cloud support via managed sidecar container #57

@nezhar

Description


Problem

VibePod's LLM integration works by injecting ANTHROPIC_BASE_URL / CODEX_OSS_BASE_URL into agent containers, expecting an Anthropic- or OpenAI-compatible endpoint at that URL.

However, ollama.com's cloud API only exposes its native endpoint (/api/chat with Authorization: Bearer) — not the /v1/messages (Anthropic) or /v1/chat/completions (OpenAI) compatibility layers. Those are features of the local Ollama server process, not the hosted service.
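To make the mismatch concrete, here is a sketch of the two request shapes side by side (illustrative only; field names follow the public Ollama and OpenAI API docs, and the model name is just an example):

```python
# The hosted service only understands Ollama's native chat endpoint:
native_request = {
    "url": "https://ollama.com/api/chat",
    "headers": {"Authorization": "Bearer $OLLAMA_API_KEY"},
    "body": {
        "model": "gpt-oss:120b",
        "messages": [{"role": "user", "content": "hello"}],
        "stream": False,
    },
}

# The OpenAI-compatible path that a *local* Ollama server also serves,
# but that the hosted service does not expose:
compat_request = {
    "url": "http://localhost:11434/v1/chat/completions",
    "headers": {"Authorization": "Bearer ollama"},
    "body": {
        "model": "gpt-oss:120b",
        "messages": [{"role": "user", "content": "hello"}],
    },
}
```

The message payloads are nearly identical; it is the paths, auth conventions, and response/streaming formats that differ, which is what the agent containers trip over.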

This means users who want to use Ollama cloud models today must run a local Ollama instance, defeating the purpose of a cloud-backed workflow.

Proposed Solution: Ollama Sidecar Container

Add a managed ollama container to VibePod's stack (alongside proxy and datasette). The sidecar runs Ollama locally but uses :cloud-suffixed model names to offload inference to ollama.com. Agent containers hit the sidecar's compatibility endpoints as normal.
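A minimal compose-style sketch of the sidecar wiring, for illustration (the service name and port mapping are assumptions; VibePod would generate the equivalent container config internally):

```yaml
# Hypothetical sketch only — not generated VibePod config.
services:
  vibepod-ollama:
    image: ollama/ollama:latest
    environment:
      - OLLAMA_API_KEY        # passed through from the host for cloud model auth
    ports:
      - "11434:11434"         # Ollama's default API port
```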

Config:

# ~/.config/vibepod/config.yaml
llm:
  enabled: true
  base_url: "http://vibepod-ollama:11434"
  api_key: "ollama"
  model: "gpt-oss:120b-cloud"

Usage:

# Sidecar starts automatically, OLLAMA_API_KEY is passed through
VP_LLM_ENABLED=true VP_LLM_MODEL=gpt-oss:120b-cloud vp run claude
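From the agent's side, a request to the sidecar is just a standard OpenAI-style chat call. A small stdlib-only sketch of what that request looks like (the `vibepod-ollama` hostname comes from the config above; a local Ollama server accepts any bearer token value):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request aimed at the sidecar."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # value is ignored by local Ollama
        },
    )

req = build_chat_request("http://vibepod-ollama:11434", "gpt-oss:120b-cloud", "ping")
# urllib.request.urlopen(req)  # would reach the sidecar if it is running
```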

This fits the existing image namespace pattern:

ollama -> ollama/ollama:latest

With an optional env override:

VP_IMAGE_OLLAMA=ollama/ollama:latest vp run claude

Why this approach

  • No changes needed to vibepod-proxy or agent containers
  • Both claude and codex agents work immediately (Anthropic + OpenAI compat layers are exposed on the sidecar)
  • Users get cloud inference without installing Ollama on the host
  • OLLAMA_API_KEY is the only credential needed
  • Lightweight — the sidecar itself does no local inference for :cloud models

Alternative Considered: Proxy Translation Layer

Extend vibepod-proxy with a mitmproxy addon that intercepts /v1/messages or /v1/chat/completions calls and rewrites them to Ollama's native https://ollama.com/api/chat format. More powerful (no sidecar needed, everything cloud-native), but significantly more complex to implement correctly, especially for streaming/SSE responses.
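For reference, the request-side half of that translation is roughly the following (a simplified sketch; `temperature` and `num_predict` are real Ollama options per its docs, but tool calls, response formats, and above all re-framing streaming SSE chunks are omitted, and those are the hard parts):

```python
def openai_to_ollama_native(openai_body: dict) -> dict:
    """Rewrite an OpenAI /v1/chat/completions body into Ollama's /api/chat shape.

    Deliberately incomplete: response translation and SSE re-framing
    are where the real complexity lives.
    """
    return {
        "model": openai_body["model"],
        "messages": openai_body["messages"],
        "stream": openai_body.get("stream", False),
        # OpenAI top-level sampling params move into Ollama's "options" object.
        "options": {
            k: v
            for k, v in {
                "temperature": openai_body.get("temperature"),
                "num_predict": openai_body.get("max_tokens"),
            }.items()
            if v is not None
        },
    }
```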

Could be pursued as a follow-up.

Implementation Notes

  • The sidecar needs OLLAMA_API_KEY injected so cloud model auth works
  • Use the host.docker.internal pattern already established in the LLM docs for Docker networking
  • Support vp run ollama for manual control, or auto-start the sidecar as a dependency when llm.enabled: true and base_url points to it
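The auto-start rule in the last bullet could be as simple as the following sketch (the sidecar hostname and config-dict shape are assumptions based on the config example above):

```python
from urllib.parse import urlparse

SIDECAR_HOST = "vibepod-ollama"  # hypothetical managed container name

def needs_ollama_sidecar(llm_config: dict) -> bool:
    """Decide whether `vp run` should start the managed sidecar first.

    Start it only when the LLM integration is enabled and base_url
    points at the sidecar's hostname, per the proposal above.
    """
    if not llm_config.get("enabled"):
        return False
    host = urlparse(llm_config.get("base_url", "")).hostname
    return host == SIDECAR_HOST
```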
