- Overview
- Quick Start
- Base URL
- Authentication
- Endpoints
- Supported Models
- Error Responses
- Rate Limiting
- Streaming
- Tool Calling
- Reasoning
- Image Support
- Debug Endpoints
- SDK Compatibility
- Limitations
- Best Practices
## Overview

The Gemini-CLI-Proxy provides OpenAI-compatible REST API endpoints for interacting with Google Gemini models. All endpoints follow OpenAI API v1 specifications for maximum compatibility.
## Quick Start

To quickly test the API, you can use curl to make a chat completion request. Replace `https://your-worker-domain` with your deployed Cloudflare Worker URL.
curl -X POST https://your-worker-domain/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gemini-2.5-pro",
"messages": [{"role": "user", "content": "Hello, how are you?"}]
}'

## Base URL

All endpoints are relative to your deployed Cloudflare Worker URL:

https://your-worker-domain
## Authentication

Include your API key in the `Authorization` header. Note that `YOUR_API_KEY` is a placeholder; the actual authentication relies on the Gemini OAuth credentials or AI Studio API keys configured in your deployment.
Authorization: Bearer YOUR_API_KEY
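As a best practice, read the key from the environment rather than hardcoding it. A minimal sketch of building the required headers; `GEMINI_PROXY_API_KEY` is an illustrative variable name, not one the proxy defines:

```javascript
// Build the headers every request to the proxy needs. The environment
// variable name here is a placeholder chosen for this example.
function buildAuthHeaders(apiKey = process.env.GEMINI_PROXY_API_KEY) {
  if (!apiKey) {
    throw new Error("Missing API key: set GEMINI_PROXY_API_KEY");
  }
  return {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
  };
}
```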
## Endpoints

### Service Information

Returns basic information about the service.
Response:
{
"name": "Gemini CLI OpenAI Worker",
"description": "OpenAI-compatible API for Google Gemini models via OAuth",
"version": "1.0.0",
"authentication": {
"required": true,
"type": "Bearer token in Authorization header"
},
"endpoints": {
"chat_completions": "/v1/chat/completions",
"models": "/v1/models",
"debug": {
"cache": "/v1/debug/cache",
"token_test": "/v1/token-test",
"full_test": "/v1/test"
}
},
"documentation": "https://github.com/soficis/gemini-cli-proxy"
}

### Health Check

Health check endpoint to verify service status.
Response:
{
"status": "ok",
"timestamp": "2025-09-11T06:00:00.000Z"
}

### List Models (`/v1/models`)

Lists all available Gemini models.
Response:
{
"object": "list",
"data": [
{
"id": "gemini-2.5-pro",
"object": "model",
"created": 1726070400,
"owned_by": "google"
},
{
"id": "gemini-2.5-flash",
"object": "model",
"created": 1726070400,
"owned_by": "google"
}
]
}

### Chat Completions (`/v1/chat/completions`)

Creates a chat completion using the specified Gemini model.
Request Body:
{
"model": "gemini-2.5-pro",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"stream": false,
"max_tokens": 1000,
"temperature": 0.7,
"top_p": 1.0,
"stop": null,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"seed": null,
"response_format": null,
"thinking_budget": 16384,
"reasoning_effort": "medium",
"tools": null,
"tool_choice": null
}

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The Gemini model to use |
| `messages` | array | Yes | Array of message objects |
| `stream` | boolean | No | Enable streaming response (default: true) |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `temperature` | number | No | Sampling temperature (0.0 to 2.0) |
| `top_p` | number | No | Nucleus sampling parameter |
| `stop` | string/array | No | Stop sequences |
| `presence_penalty` | number | No | Presence penalty |
| `frequency_penalty` | number | No | Frequency penalty |
| `seed` | integer | No | Random seed for reproducibility |
| `response_format` | object | No | Response format specification |
| `thinking_budget` | integer | No | Reasoning budget for thinking models |
| `reasoning_effort` | string | No | Reasoning effort level (`"low"`, `"medium"`, `"high"`, `"none"`) |
| `tools` | array | No | Function calling tools |
| `tool_choice` | string/object | No | Tool choice specification |
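A convenient client-side pattern is to send only the required fields plus explicitly chosen options, letting everything else fall back to the server-side defaults. A minimal sketch (the helper name and option filtering are this example's own, not part of the proxy):

```javascript
// Assemble a /v1/chat/completions request body. Only `model` and `messages`
// are required; optional parameters are copied through when provided.
function buildChatRequest(model, messages, options = {}) {
  if (!model || !Array.isArray(messages) || messages.length === 0) {
    throw new Error("model and a non-empty messages array are required");
  }
  const allowed = [
    "stream", "max_tokens", "temperature", "top_p", "stop",
    "presence_penalty", "frequency_penalty", "seed", "response_format",
    "thinking_budget", "reasoning_effort", "tools", "tool_choice",
  ];
  const body = { model, messages };
  for (const key of allowed) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return body;
}
```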
Response (Non-streaming):
{
"id": "chatcmpl-12345678",
"object": "chat.completion",
"created": 1726070400,
"model": "gemini-2.5-pro",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}

Response (Streaming):
data: {"id":"chatcmpl-12345678","object":"chat.completion.chunk","created":1726070400,"model":"gemini-2.5-pro","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-12345678","object":"chat.completion.chunk","created":1726070400,"model":"gemini-2.5-pro","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
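Clients reassemble the final assistant message by concatenating the `delta` fragments across chunks. A sketch that folds already-parsed chunk objects of the shape shown above into one message:

```javascript
// Fold a sequence of chat.completion.chunk objects into the final message.
// Role arrives in the first delta; content accumulates across deltas;
// finish_reason is non-null only on the last content chunk.
function accumulateChunks(chunks) {
  let role = "assistant";
  let content = "";
  let finishReason = null;
  for (const chunk of chunks) {
    const choice = chunk.choices && chunk.choices[0];
    if (!choice) continue;
    if (choice.delta && choice.delta.role) role = choice.delta.role;
    if (choice.delta && choice.delta.content) content += choice.delta.content;
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { role, content, finish_reason: finishReason };
}
```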
## Supported Models

| Model ID | Description | Vision Support | Reasoning Support |
|---|---|---|---|
| `gemini-2.5-pro` | Latest Pro model | ✅ | ✅ |
| `gemini-2.5-flash` | Fast Flash model | ✅ | ✅ |
| `gemini-2.0-flash-exp` | Experimental Flash | ✅ | ❌ |
| `gemini-1.5-pro` | Previous Pro model | ✅ | ❌ |
| `gemini-1.5-flash` | Previous Flash model | ✅ | ❌ |
## Error Responses

All errors follow OpenAI-compatible format:
{
"error": {
"message": "Error description",
"type": "error_type",
"code": "error_code"
}
}

HTTP status codes:

- `400` - Bad Request (invalid parameters)
- `401` - Unauthorized (invalid API key)
- `403` - Forbidden (insufficient permissions)
- `429` - Too Many Requests (rate limited)
- `500` - Internal Server Error
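Since the error envelope is consistent, a client can classify failures in one place. A sketch that treats 429 and 5xx as retryable (the function name and retry policy are this example's own, not prescribed by the proxy):

```javascript
// Map an HTTP status plus an OpenAI-style error body to a client decision.
// Only rate limits (429) and server errors (5xx) are worth retrying.
function classifyError(status, body) {
  const message = body && body.error ? body.error.message : "Unknown error";
  const retryable = status === 429 || status >= 500;
  return { status, message, retryable };
}
```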
## Rate Limiting

The service implements intelligent rate limiting:
- Automatic cooldown periods for rate-limited credentials
- Round-robin distribution across available credentials
- Failover to backup credentials when primary pool is exhausted
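Even with credential rotation on the proxy side, clients can still see 429 responses when the whole pool is exhausted, so client-side retries with exponential backoff are a sensible complement. A sketch, where `doRequest` is any caller-supplied function returning `{ status, body }`:

```javascript
// Retry a request with exponential backoff plus jitter on 429 responses.
// This is a generic client-side pattern, independent of the proxy's own
// credential rotation; delays double on each attempt.
async function withBackoff(doRequest, maxRetries = 3, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```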
## Streaming

Streaming responses use Server-Sent Events (SSE) format:

- Content-Type: `text/event-stream`
- Each line starts with `data: `
- Stream ends with `data: [DONE]`
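Putting those three rules together, a minimal parser for the raw SSE text looks like this (a sketch; production clients should also handle events split across network reads):

```javascript
// Extract the JSON payloads from raw SSE text. Each event line carries a
// "data: " prefix; the literal "[DONE]" sentinel marks the end of the stream.
function parseSSE(text) {
  const events = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    events.push(JSON.parse(payload));
  }
  return events;
}
```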
## Tool Calling

Supports function calling with the `tools` parameter:
{
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}

## Reasoning

Enhanced reasoning support for thinking models:

- `thinking_budget`: Controls reasoning token allocation
- `reasoning_effort`: Preset effort levels (`"low"`, `"medium"`, `"high"`)
- Automatic reasoning enablement for compatible models
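For example, a request that enables extended reasoning might look like the following (sending both parameters together is illustrative, not required, and the prompt is arbitrary):

```javascript
// A chat completion request body with reasoning enabled: thinking_budget
// caps the reasoning token allocation, reasoning_effort picks a preset level.
const reasoningRequest = {
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Prove that 17 is prime." }],
  thinking_budget: 16384,
  reasoning_effort: "high",
};
```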
## Image Support

Vision-capable models support image inputs:
{
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}

## Debug Endpoints

- `/v1/debug/cache` - Returns cache status information.
- `/v1/token-test` - Tests token validity.
- `/v1/test` - Runs full system test.
## SDK Compatibility

Compatible with the OpenAI SDK:
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'your-api-key',
baseURL: 'https://your-worker-domain'
});
const completion = await client.chat.completions.create({
model: 'gemini-2.5-pro',
messages: [{ role: 'user', content: 'Hello!' }]
});

## Limitations

- Maximum context length varies by model.
- Rate limits depend on credential pool configuration.
- Some advanced OpenAI features may not be fully supported.
- Streaming requires proper SSE handling in client applications.
## Best Practices

- Secure API Keys: Never hardcode API keys in your application. Use environment variables or a secure configuration management system.
- Error Handling: Implement robust error handling to gracefully manage API errors, rate limits, and network issues.
- Asynchronous Operations: Use asynchronous programming patterns to avoid blocking your application while waiting for API responses.
- Retry Mechanisms: Implement retry logic with exponential backoff for transient errors (e.g., rate limits, network timeouts).
- Monitor Usage: Regularly monitor your API usage to stay within rate limits and manage costs.
- Credential Rotation: For enhanced security, rotate your Gemini OAuth credentials and AI Studio API keys periodically.
- Optimize Prompts: Design concise and effective prompts to minimize token usage and improve response quality.
- Model Selection: Choose the appropriate model (`gemini-2.5-pro`, `gemini-2.5-flash`, etc.) based on your specific task requirements and cost considerations.