API Reference

Overview

The Gemini-CLI-Proxy provides OpenAI-compatible REST API endpoints for interacting with Google Gemini models. All endpoints follow OpenAI API v1 specifications for maximum compatibility.

Quick Start

To quickly test the API, you can use curl to make a chat completion request. Replace https://your-worker-domain with your deployed Cloudflare Worker URL.

curl -X POST https://your-worker-domain/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gemini-2.5-pro",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'

Base URL

https://your-worker-domain

Authentication

Include your API key in the Authorization header. Note that YOUR_API_KEY is a placeholder; the actual authentication relies on the Gemini OAuth credentials or AI Studio API keys configured in your deployment.

Authorization: Bearer YOUR_API_KEY

Endpoints

GET /

Returns basic information about the service.

Response:

{
  "name": "Gemini CLI OpenAI Worker",
  "description": "OpenAI-compatible API for Google Gemini models via OAuth",
  "version": "1.0.0",
  "authentication": {
    "required": true,
    "type": "Bearer token in Authorization header"
  },
  "endpoints": {
    "chat_completions": "/v1/chat/completions",
    "models": "/v1/models",
    "debug": {
      "cache": "/v1/debug/cache",
      "token_test": "/v1/token-test",
      "full_test": "/v1/test"
    }
  },
  "documentation": "https://github.com/soficis/gemini-cli-proxy"
}

GET /health

Health check endpoint to verify service status.

Response:

{
  "status": "ok",
  "timestamp": "2025-09-11T06:00:00.000Z"
}

GET /v1/models

Lists all available Gemini models.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "gemini-2.5-pro",
      "object": "model",
      "created": 1726070400,
      "owned_by": "google"
    },
    {
      "id": "gemini-2.5-flash",
      "object": "model",
      "created": 1726070400,
      "owned_by": "google"
    }
  ]
}

POST /v1/chat/completions

Creates a chat completion using the specified Gemini model.

Request Body:

{
  "model": "gemini-2.5-pro",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "stream": false,
  "max_tokens": 1000,
  "temperature": 0.7,
  "top_p": 1.0,
  "stop": null,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "seed": null,
  "response_format": null,
  "thinking_budget": 16384,
  "reasoning_effort": "medium",
  "tools": null,
  "tool_choice": null
}

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The Gemini model to use |
| messages | array | Yes | Array of message objects |
| stream | boolean | No | Enable streaming response (default: true) |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Sampling temperature (0.0 to 2.0) |
| top_p | number | No | Nucleus sampling parameter |
| stop | string/array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty |
| frequency_penalty | number | No | Frequency penalty |
| seed | integer | No | Random seed for reproducibility |
| response_format | object | No | Response format specification |
| thinking_budget | integer | No | Reasoning budget for thinking models |
| reasoning_effort | string | No | Reasoning effort level ("low", "medium", "high", "none") |
| tools | array | No | Function calling tools |
| tool_choice | string/object | No | Tool choice specification |
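As a sketch of how these parameters fit together, the following helper assembles a request body with the fields from the table above. `buildChatRequest` is a hypothetical convenience function for illustration, not part of the proxy itself:

```typescript
// Sketch: assemble a /v1/chat/completions request body.
// Parameter names mirror the table above; the helper itself is hypothetical.

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ChatRequestOptions {
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  thinking_budget?: number;
  reasoning_effort?: "low" | "medium" | "high" | "none";
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  options: ChatRequestOptions = {}
): Record<string, unknown> {
  // stream defaults to true on the server, so set it explicitly
  // when you want a single JSON response instead of SSE.
  return { model, messages, stream: false, ...options };
}

const body = buildChatRequest("gemini-2.5-pro", [
  { role: "user", content: "Hello, how are you?" },
]);
```

The resulting object can be passed directly as the JSON body of a POST to `/v1/chat/completions`.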

Response (Non-streaming):

{
  "id": "chatcmpl-12345678",
  "object": "chat.completion",
  "created": 1726070400,
  "model": "gemini-2.5-pro",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

Response (Streaming):

data: {"id":"chatcmpl-12345678","object":"chat.completion.chunk","created":1726070400,"model":"gemini-2.5-pro","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-12345678","object":"chat.completion.chunk","created":1726070400,"model":"gemini-2.5-pro","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]

Supported Models

| Model ID | Description | Vision Support | Reasoning Support |
|---|---|---|---|
| gemini-2.5-pro | Latest Pro model | | |
| gemini-2.5-flash | Fast Flash model | | |
| gemini-2.0-flash-exp | Experimental Flash | | |
| gemini-1.5-pro | Previous Pro model | | |
| gemini-1.5-flash | Previous Flash model | | |

Error Responses

All errors follow OpenAI-compatible format:

{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "code": "error_code"
  }
}

Common Error Codes

  • 400 - Bad Request (invalid parameters)
  • 401 - Unauthorized (invalid API key)
  • 403 - Forbidden (insufficient permissions)
  • 429 - Too Many Requests (rate limited)
  • 500 - Internal Server Error
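Since 429 and 5xx responses are typically transient, clients often retry them with exponential backoff (see Best Practices below). A minimal sketch, with illustrative thresholds rather than values mandated by the service:

```typescript
// Sketch: classify retryable statuses and compute a jittered backoff delay.
// The base delay and cap are illustrative choices, not service-defined.

function isRetryable(status: number): boolean {
  // 429 (rate limited) and 5xx server errors are transient;
  // 4xx client errors (bad request, bad key) are not worth retrying.
  return status === 429 || status >= 500;
}

function backoffMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  // Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)).
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}
```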

Rate Limiting

The service implements intelligent rate limiting:

  • Automatic cooldown periods for rate-limited credentials
  • Round-robin distribution across available credentials
  • Failover to backup credentials when primary pool is exhausted

Streaming

Streaming responses use Server-Sent Events (SSE) format:

  • Content-Type: text/event-stream
  • Each line starts with data:
  • Stream ends with data: [DONE]
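The rules above can be sketched as a small parser that accumulates the assistant's text from the `data:` lines shown in the streaming example earlier (the function name is my own, not part of any SDK):

```typescript
// Sketch: accumulate assistant text from SSE event lines.
// Each event line starts with "data: "; the stream ends with "data: [DONE]".

function collectStreamedContent(sseLines: string[]): string {
  let text = "";
  for (const line of sseLines) {
    if (!line.startsWith("data: ")) continue;  // skip blank/keep-alive lines
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;           // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    // Each chunk carries an incremental delta, OpenAI-style.
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```

Fed the two example chunks from the streaming response above, this yields "Hello!".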

Tool Calling

Supports function calling with tools parameter:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
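When the model responds with tool calls, the client is expected to execute the named function locally and send the result back as a "tool" message, OpenAI-style. A minimal dispatcher sketch, where `get_weather` is a stub matching the tool declared above:

```typescript
// Sketch: dispatch a tool call returned by the model to a local handler
// and wrap the result as a "tool" message. The handler is a stub.

type ToolCall = {
  id: string;
  function: { name: string; arguments: string }; // arguments is a JSON string
};

const handlers: Record<string, (args: any) => unknown> = {
  get_weather: ({ location }: { location: string }) =>
    ({ location, temperature_c: 21 }), // stubbed weather result
};

function runToolCall(call: ToolCall): { role: "tool"; tool_call_id: string; content: string } {
  const handler = handlers[call.function.name];
  if (!handler) throw new Error(`unknown tool: ${call.function.name}`);
  const result = handler(JSON.parse(call.function.arguments));
  // The stringified result goes back to the model as a "tool" message.
  return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
}
```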

Reasoning

Enhanced reasoning support for thinking models:

  • thinking_budget: Controls reasoning token allocation
  • reasoning_effort: Preset effort levels ("low", "medium", "high", "none")
  • Automatic reasoning enablement for compatible models

Image Support

Vision-capable models support image inputs:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
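For images that are not publicly hosted, a base64 data URL can be used in place of an HTTP URL, assuming the proxy accepts data URLs the way the OpenAI API does (`imagePart` is an illustrative helper):

```typescript
// Sketch: build an image_url content part from raw bytes via a base64
// data URL. Assumes the proxy accepts data URLs like the OpenAI API.

function imagePart(bytes: Uint8Array, mimeType: string) {
  const b64 = Buffer.from(bytes).toString("base64");
  return {
    type: "image_url",
    image_url: { url: `data:${mimeType};base64,${b64}` },
  };
}
```

The returned object slots into the `content` array alongside a `{"type": "text", ...}` part, as in the example above.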

Debug Endpoints

GET /v1/debug/cache

Returns cache status information.

GET /v1/token-test

Tests token validity.

GET /v1/test

Runs full system test.

SDK Compatibility

Compatible with OpenAI SDK:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'https://your-worker-domain'
});

const completion = await client.chat.completions.create({
  model: 'gemini-2.5-pro',
  messages: [{ role: 'user', content: 'Hello!' }]
});
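With `stream: true`, the SDK's `create` call returns an async iterable of chunks. A sketch of consuming it, demonstrated here with a stub stream so it runs without network access:

```typescript
// Sketch: concatenate deltas from a streamed completion. With the real SDK,
// `stream` would come from client.chat.completions.create({ ..., stream: true }).

type Chunk = { choices: { delta: { content?: string } }[] };

async function readStream(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Stub stream standing in for a live response.
async function* stubStream(): AsyncIterable<Chunk> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: { content: "!" } }] };
}
```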

Limitations

  • Maximum context length varies by model.
  • Rate limits depend on credential pool configuration.
  • Some advanced OpenAI features may not be fully supported.
  • Streaming requires proper SSE handling in client applications.

Best Practices

  • Secure API Keys: Never hardcode API keys in your application. Use environment variables or a secure configuration management system.
  • Error Handling: Implement robust error handling to gracefully manage API errors, rate limits, and network issues.
  • Asynchronous Operations: Use asynchronous programming patterns to avoid blocking your application while waiting for API responses.
  • Retry Mechanisms: Implement retry logic with exponential backoff for transient errors (e.g., rate limits, network timeouts).
  • Monitor Usage: Regularly monitor your API usage to stay within rate limits and manage costs.
  • Credential Rotation: For enhanced security, rotate your Gemini OAuth credentials and AI Studio API keys periodically.
  • Optimize Prompts: Design concise and effective prompts to minimize token usage and improve response quality.
  • Model Selection: Choose the appropriate model (gemini-2.5-pro, gemini-2.5-flash, etc.) based on your specific task requirements and cost considerations.