
Getting Started

How configuration is loaded

Configuration loads in a strict three-layer pipeline. Each layer can only add to or override what came before it; nothing is re-read after the pipeline completes.

1. Code defaults  (built-in safe values)
        ↓ overlaid by
2. config/config.yaml  (optional, supports ${VAR:-default} expansion)
        ↓ overlaid by
3. Environment variables  (.env file loaded first, then the real process environment)
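
The ${VAR:-default} placeholders in layer 2 follow shell parameter-expansion semantics, so you can preview what a placeholder will resolve to straight from your shell before starting the gateway (a quick sanity check, assuming the loader expands exactly like the shell does):

unset OLLAMA_BASE_URL
echo "${OLLAMA_BASE_URL:-http://localhost:11434/v1}"   # prints the default
export OLLAMA_BASE_URL=http://custom-ollama:11434/v1
echo "${OLLAMA_BASE_URL:-http://localhost:11434/v1}"   # prints the exported value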

Provider credentials follow the same pipeline but go through an additional resolution step:

YAML providers: section
        ↓ env vars applied  (OPENAI_API_KEY, ANTHROPIC_BASE_URL, etc.)
        ↓ filtered          (drop entries with missing or unresolved credentials)
        ↓ resilience merged (per-provider overrides applied on top of global defaults)
→ fully resolved provider, ready to handle requests
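
Once the gateway is up, you can see which providers actually survived credential filtering by listing the registered models - a quick check, assuming the default localhost:8080 address used throughout the API examples below and that jq is installed:

curl -s http://localhost:8080/v1/models | jq -r '.data[].owned_by' | sort -u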

Where to put the config file

The loader checks these two paths in order; the first file found wins:

  1. config/config.yaml (recommended - keeps configuration in a dedicated directory rather than the working-directory root)
  2. config.yaml (working directory root)

If neither file exists, that is not an error - only code defaults and env vars apply.
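
A small shell sketch of the same lookup - run it from the gateway's working directory to see which file would win:

for f in config/config.yaml config.yaml; do
  if [ -f "$f" ]; then echo "loader would use: $f"; break; fi
done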


Scenarios

Scenario 1 - env-only, no YAML (simplest deployment)

No config/config.yaml exists. All provider credentials come from environment variables. Everything else runs on code defaults.

.env

GOMODEL_MASTER_KEY=super-secret

OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...
# Ollama uses http://localhost:11434/v1 by default - only set this if yours is elsewhere
# OLLAMA_BASE_URL=http://custom-ollama:11434/v1

What happens step by step:

  1. No YAML file is found - provider list starts empty.
  2. OPENAI_API_KEY and ANTHROPIC_API_KEY are discovered and turned into provider entries.
  3. Both pass credential filtering; Ollama is also included unconditionally (it needs no API key).
  4. All providers receive the built-in default resilience settings.

Effective resilience for every provider:

retry:
  max_retries:     3
  initial_backoff: 1s
  max_backoff:     30s
  backoff_factor:  2.0
  jitter_factor:   0.1
circuit_breaker:
  failure_threshold: 5   # open after 5 consecutive failures
  success_threshold: 2   # close again after 2 consecutive successes
  timeout:           30s # how long to stay open before probing

Scenario 2 - YAML with providers block and per-provider resilience tuning

You want tighter retry limits globally and a more aggressive circuit breaker for a specific noisy provider.

config/config.yaml

resilience:
  retry:
    max_retries: 2
    initial_backoff: 500ms
    max_backoff: 10s
    backoff_factor: 1.5
    jitter_factor: 0.05
  circuit_breaker:
    failure_threshold: 3
    success_threshold: 1
    timeout: 15s

providers:
  openai:
    type: openai
    api_key: ${OPENAI_API_KEY}
  anthropic:
    type: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    resilience:
      retry:
        max_retries: 5 # Anthropic supports long requests - allow more retries
  ollama:
    type: ollama
    base_url: ${OLLAMA_BASE_URL:-http://localhost:11434/v1}
    resilience:
      circuit_breaker:
        failure_threshold: 10 # local service - tolerate more transient failures
        timeout: 5s

Effective resilience per provider:

Provider    max_retries    failure_threshold    CB timeout
openai      2 (global)     3 (global)           15s (global)
anthropic   5 (override)   3 (global)           15s (global)
ollama      2 (global)     10 (override)        5s (override)

Only fields that are explicitly listed under a provider's resilience: block are overridden. Everything else silently inherits from the global section.


Scenario 3 - YAML for resilience only, providers from env

You want resilience settings in version-controlled config but do not want credentials committed to the repository.

config/config.yaml

resilience:
  retry:
    max_retries: 4
    initial_backoff: 2s
  circuit_breaker:
    failure_threshold: 5
    success_threshold: 2
    timeout: 60s

.env (not committed)

OPENAI_API_KEY=sk-proj-...
GROQ_API_KEY=gsk_...

What happens:

  1. YAML sets max_retries: 4 and initial_backoff: 2s. The other retry fields (max_backoff, backoff_factor, jitter_factor) are not listed, so they keep the code defaults.
  2. The providers: key is absent from the YAML - provider list starts empty.
  3. OPENAI_API_KEY and GROQ_API_KEY are discovered from the environment and turned into provider entries.
  4. Both providers inherit the YAML-sourced global resilience config; there are no per-provider overrides.
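
Because .env holds live credentials, make sure it stays untracked (assuming a git repository):

grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore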

Environment variable reference

All resilience settings can be overridden at runtime via env vars. Env vars always beat both code defaults and YAML values.

Variable                            Type      Default  Description
RETRY_MAX_RETRIES                   int       3        Maximum retry attempts per request
RETRY_INITIAL_BACKOFF               duration  1s       First retry wait (e.g. 500ms, 2s)
RETRY_MAX_BACKOFF                   duration  30s      Upper cap on retry wait
RETRY_BACKOFF_FACTOR                float     2.0      Exponential multiplier between retries
RETRY_JITTER_FACTOR                 float     0.1      Random jitter as a fraction of the backoff
CIRCUIT_BREAKER_FAILURE_THRESHOLD   int       5        Consecutive failures before opening
CIRCUIT_BREAKER_SUCCESS_THRESHOLD   int       2        Consecutive successes to close again
CIRCUIT_BREAKER_TIMEOUT             duration  30s      How long the circuit stays open
LOG_FORMAT                          string    (unset)  Auto-detects based on environment: colorized text on a TTY, JSON otherwise. Set to text to force human-readable output (no colors if not a TTY), or json to force structured JSON even on a TTY (recommended for production, CloudWatch, Datadog, GCP).
LOG_LEVEL                           string    info     Minimum runtime log level. Supported values are debug, info, warn, and error. Common aliases such as dbg, inf, warning, and err are also accepted.
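
As an illustration, this .env fragment reproduces Scenario 2's global resilience settings without any YAML (values copied from the variables in the table above; per-provider overrides still require YAML, as explained under Common gotchas):

RETRY_MAX_RETRIES=2
RETRY_INITIAL_BACKOFF=500ms
RETRY_MAX_BACKOFF=10s
RETRY_BACKOFF_FACTOR=1.5
RETRY_JITTER_FACTOR=0.05
CIRCUIT_BREAKER_FAILURE_THRESHOLD=3
CIRCUIT_BREAKER_SUCCESS_THRESHOLD=1
CIRCUIT_BREAKER_TIMEOUT=15s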

Provider credentials:

Variable              Provider
OPENAI_API_KEY        OpenAI
OPENAI_BASE_URL       OpenAI (custom endpoint)
ANTHROPIC_API_KEY     Anthropic
ANTHROPIC_BASE_URL    Anthropic (custom endpoint)
GEMINI_API_KEY        Google Gemini
GEMINI_BASE_URL       Gemini (custom endpoint)
OPENROUTER_API_KEY    OpenRouter (default base URL: https://openrouter.ai/api/v1)
OPENROUTER_BASE_URL   OpenRouter (custom endpoint override)
OPENROUTER_SITE_URL   OpenRouter attribution URL override (default: https://gomodel.enterpilot.io)
OPENROUTER_APP_NAME   OpenRouter attribution title override (default: GoModel)
ZAI_API_KEY           Z.ai (default base URL: https://api.z.ai/api/paas/v4)
ZAI_BASE_URL          Z.ai (custom endpoint, including GLM Coding Plan)
XAI_API_KEY           xAI / Grok
XAI_BASE_URL          xAI (custom endpoint)
GROQ_API_KEY          Groq
GROQ_BASE_URL         Groq (custom endpoint)
AZURE_API_KEY         Azure OpenAI
AZURE_BASE_URL        Azure OpenAI deployment base URL
AZURE_API_VERSION     Azure OpenAI API version override (default: 2024-10-21)
ORACLE_API_KEY        Oracle
ORACLE_BASE_URL       Oracle OpenAI-compatible base URL
ORACLE_MODELS         Oracle fallback model inventory (comma-separated, used when /models is unavailable)
OLLAMA_BASE_URL       Ollama (default: http://localhost:11434/v1)
VLLM_BASE_URL         vLLM OpenAI-compatible server (default: http://localhost:8000/v1)
VLLM_API_KEY          vLLM bearer token, only when upstream vLLM was started with --api-key

See .env.template for the full list of configurable environment variables.


Common gotchas

Unresolved ${VAR} placeholders drop the provider. If ${OPENAI_API_KEY} is in your YAML but the env var is not actually set, the literal string ${OPENAI_API_KEY} ends up as the API key value. The credential filter detects the ${ prefix and removes the provider. Always verify your env vars are exported before starting the process.
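
A preflight check in your start script fails fast instead of letting the provider be silently dropped (plain POSIX shell; adjust the variable names to the providers you actually use):

# Abort with a clear message if a YAML-referenced variable is unset or empty
: "${OPENAI_API_KEY:?not set - the openai provider would be dropped}"
: "${ANTHROPIC_API_KEY:?not set - the anthropic provider would be dropped}"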

Per-provider resilience can only come from YAML, not from env vars. The env var override walk skips maps. RETRY_MAX_RETRIES changes the global default for all providers but cannot target a single provider. Per-provider tuning requires a providers.<name>.resilience: block in config.yaml.

Env vars override YAML globals. Setting CIRCUIT_BREAKER_TIMEOUT=60s in the environment overrides whatever timeout is written in config.yaml, regardless of order or file contents.
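
For example (the binary name here is illustrative - use whatever command starts your gateway):

# config.yaml may say timeout: 15s; this run still gets 60s
CIRCUIT_BREAKER_TIMEOUT=60s ./gomodel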

Ollama is always active. Ollama requires no API key. Even with no YAML and no OLLAMA_BASE_URL set, an Ollama provider is registered pointing at http://localhost:11434/v1. If it is unreachable at startup, the gateway still starts and keeps retrying model discovery on the configured refresh interval.
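
If Ollama models never appear, first check that the default endpoint answers at all - Ollama's OpenAI-compatible API serves a model list here:

curl -s http://localhost:11434/v1/models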

Azure requires both key and base URL. AZURE_API_KEY alone is not enough for auto-discovery. Set AZURE_BASE_URL to the Azure deployment endpoint as well, otherwise the provider is ignored.

Oracle requires both key and base URL. ORACLE_API_KEY alone is not enough for auto-discovery. Set ORACLE_BASE_URL to the Oracle OpenAI-compatible endpoint, otherwise the provider is ignored. If your Oracle endpoint does not return a usable model list, set ORACLE_MODELS or configure providers.<name>.models in YAML to seed the router with explicit model IDs.
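
A sketch of a complete Oracle setup in .env - the endpoint and model IDs below are placeholders, substitute your own:

ORACLE_API_KEY=your-oracle-key
ORACLE_BASE_URL=https://your-oracle-endpoint.example.com/v1
# Seed the router explicitly when the endpoint's /models is not usable:
ORACLE_MODELS=model-one,model-two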

Azure ships with a pinned API version by default. If you do not set AZURE_API_VERSION, the gateway sends api-version=2024-10-21. Override it only when you need a different Azure API version.

OpenRouter gets GoModel attribution headers by default. When the openrouter provider is used, the gateway adds HTTP-Referer and X-OpenRouter-Title unless the request already provides them. Override the defaults with OPENROUTER_SITE_URL and OPENROUTER_APP_NAME.

Z.ai Coding Plan uses a different endpoint. ZAI_API_KEY uses https://api.z.ai/api/paas/v4 by default. For GLM Coding Plan, set ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4.

vLLM uses its /models endpoint. Set VLLM_BASE_URL to register a keyless vLLM server. Set VLLM_API_KEY only when upstream vLLM requires bearer authentication. GoModel uses the model IDs returned by vLLM, including slash-shaped Hugging Face IDs such as meta-llama/Llama-3.1-8B-Instruct.
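
For an authenticated vLLM upstream the token must match on both sides; --api-key is vLLM's own server flag, and my-token below is a placeholder:

# Upstream: vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key my-token
# Gateway side (.env):
VLLM_BASE_URL=http://localhost:8000/v1
VLLM_API_KEY=my-token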

Partial YAML fields leave the rest at defaults. YAML is unmarshalled onto the struct that was already populated by built-in defaults. Only fields that appear in the file are written. Omitting max_backoff from resilience.retry leaves it at 30s; you do not need to repeat defaults you are happy with.

Two YAML search paths, first wins. config/config.yaml is checked before config.yaml in the working directory. They are not merged - whichever is found first is the only one used.


API Examples

OpenAI Examples

Basic Chat Completion

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Chat Completion with Parameters

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a haiku about programming."}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

Chat Completion with Function Calling

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the weather in Warsaw?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "lookup_weather",
          "description": "Get the weather for a city.",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      }
    ],
    "tool_choice": {
      "type": "function",
      "function": {"name": "lookup_weather"}
    }
  }'

Streaming Response

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Tell me a short story."}
    ],
    "stream": true
  }'

Anthropic Examples

Basic Chat Completion

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Chat Completion with System Message

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "system", "content": "You are a creative writing assistant."},
      {"role": "user", "content": "Write a haiku about the ocean."}
    ],
    "temperature": 0.8,
    "max_tokens": 200
  }'

Streaming Response

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "claude-3-5-haiku-20241022",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "stream": true
  }'

Using Claude Opus (Most Capable Model)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-opus-20240229",
    "messages": [
      {"role": "user", "content": "Analyze the pros and cons of renewable energy."}
    ],
    "max_tokens": 1000
  }'

Google Gemini Examples

Basic Chat Completion

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash-preview",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Chat Completion with Parameters

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-1.5-pro",
    "messages": [
      {"role": "system", "content": "You are a knowledgeable science educator."},
      {"role": "user", "content": "Explain photosynthesis in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

Streaming Response

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "Write a short poem about AI."}
    ],
    "stream": true
  }'

xAI Examples

Basic Responses API Request

curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4-1-fast-non-reasoning",
    "input": "What is the capital of France?"
  }'

Responses API Request with Instructions

curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "grok-4-1-fast-non-reasoning",
    "input": "Write a haiku about programming.",
    "instructions": "You are a creative AI assistant who specializes in writing poetry.",
    "temperature": 0.8,
    "max_output_tokens": 200
  }'

Streaming Responses

curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "grok-4-1-fast-non-reasoning",
    "input": "Tell me a short story about AI.",
    "stream": true
  }'

Embeddings

Basic Embedding

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

Batch Embedding (multiple inputs)

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": ["First sentence", "Second sentence", "Third sentence"]
  }'

With Custom Dimensions

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": "Hello world",
    "dimensions": 512
  }'

Supported by: OpenAI, Gemini, Groq, Z.ai, xAI, Ollama. Anthropic does not support embeddings natively.

List Available Models

curl http://localhost:8080/v1/models

Example response:

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai",
      "created": 1234567890
    },
    {
      "id": "claude-3-5-sonnet-20241022",
      "object": "model",
      "owned_by": "anthropic",
      "created": 1234567890
    }
  ]
}

Health Check

curl http://localhost:8080/health

Client library examples

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # API key is configured on the server side
)

# Use OpenAI models
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Or use Anthropic models with the same interface
response = client.chat.completions.create(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Streaming works too
stream = client.chat.completions.create(
    model="claude-3-5-haiku-20241022",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

# Embeddings
embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world"
)
print(embedding.data[0].embedding[:5])  # first 5 dimensions

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "not-needed",
});

// Use any supported model - routing is automatic
const response = await client.chat.completions.create({
  model: "gemini-2.0-flash",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: "claude-3-5-haiku-20241022",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}

// Embeddings
const embedding = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "Hello world",
});
console.log(embedding.data[0].embedding.slice(0, 5)); // first 5 dimensions

Available models

OpenAI

  • gpt-4o - Most capable GPT-4 model
  • gpt-4o-mini - Fast and efficient GPT-4 model
  • gpt-4-turbo - Previous generation GPT-4 Turbo
  • gpt-3.5-turbo - Fast and cost-effective
  • o1-preview - Advanced reasoning model (preview)
  • o1-mini - Faster reasoning model

Anthropic

  • claude-3-5-sonnet-20241022 - Latest Sonnet (best balance of speed and capability)
  • claude-3-5-haiku-20241022 - Latest Haiku (fastest, most cost-effective)
  • claude-3-opus-20240229 - Most capable Claude model

Google Gemini

  • gemini-2.0-flash - Latest Flash model (fast and efficient)
  • gemini-1.5-pro - Most capable Gemini model (large context window)
  • gemini-1.5-flash - Previous generation Flash model

xAI

  • grok-4-1-fast-non-reasoning - Most capable Grok model

Groq

  • Models are fetched dynamically from the Groq API at startup

Ollama

  • Models are fetched dynamically from your local Ollama instance at startup

Tips

  1. Model routing: The gateway automatically routes requests to the correct provider based on the model name - no configuration needed. Just use any model name from the list above.
  2. API compatibility: The gateway exposes an OpenAI-compatible API. Existing OpenAI client libraries work unchanged for all providers.
  3. Streaming: All providers support streaming. SSE chunks are flushed incrementally, and streaming responses terminate with data: [DONE].
  4. System messages: Anthropic's system message format is handled automatically. Gemini uses Google's OpenAI-compatible endpoint, which also handles system messages natively.
  5. Max tokens: Anthropic requires max_tokens to be set. If not provided, the gateway defaults to 4096. OpenAI and Gemini treat it as optional.
  6. Responses API: The /v1/responses endpoint provides a unified interface across all providers. Providers that do not natively support the Responses API convert requests internally.
  7. Embeddings: The /v1/embeddings endpoint is supported by OpenAI, Gemini, Groq, Z.ai, xAI, and Ollama. Anthropic does not offer embeddings natively.