- Overview
- Quick Start
- Base URL
- Authentication
- Endpoints
- Supported Models
- Error Responses
- Rate Limiting
- Streaming
- Tool Calling
- Reasoning
- Image Support
- Debug Endpoints
- SDK Compatibility
- Limitations
- Best Practices
## Overview

The Gemini-CLI-Proxy provides OpenAI-compatible REST API endpoints for interacting with Google Gemini models. All endpoints follow OpenAI API v1 specifications for maximum compatibility.
## Quick Start

To quickly test the API, you can use curl to make a chat completion request. Replace `https://your-worker-domain` with your deployed Cloudflare Worker URL.
curl -X POST https://your-worker-domain/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gemini-2.5-pro",
"messages": [{"role": "user", "content": "Hello, how are you?"}]
}'

## Base URL

All endpoints are relative to your deployed Cloudflare Worker URL:

https://your-worker-domain
## Authentication

Include your API key in the `Authorization` header. Note that `YOUR_API_KEY` is a placeholder; the actual authentication relies on the Gemini OAuth credentials or AI Studio API keys configured in your deployment.
Authorization: Bearer YOUR_API_KEY
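As a best practice, read the key from the environment rather than hardcoding it. A minimal sketch of building the required headers; `GEMINI_PROXY_API_KEY` is an illustrative variable name, not one the proxy defines:

```javascript
// Build the headers every request to the proxy needs. The environment
// variable name here is a placeholder chosen for this example.
function buildAuthHeaders(apiKey = process.env.GEMINI_PROXY_API_KEY) {
  if (!apiKey) {
    throw new Error("Missing API key: set GEMINI_PROXY_API_KEY");
  }
  return {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
  };
}
```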
## Endpoints

### Service Information

Returns basic information about the service.
Response:
{
"name": "Gemini CLI OpenAI Worker",
"description": "OpenAI-compatible API for Google Gemini models via OAuth",
"version": "1.0.0",
"authentication": {
"required": true,
"type": "Bearer token in Authorization header"
},
"endpoints": {
"chat_completions": "/v1/chat/completions",
"models": "/v1/models",
"debug": {
"cache": "/v1/debug/cache",
"token_test": "/v1/token-test",
"full_test": "/v1/test"
}
},
"documentation": "https://github.com/soficis/gemini-cli-proxy"
}

### Health Check

Health check endpoint to verify service status.
Response:
{
"status": "ok",
"timestamp": "2025-09-11T06:00:00.000Z"
}

### List Models (`/v1/models`)

Lists all available Gemini models.
Response:
{
"object": "list",
"data": [
{
"id": "gemini-2.5-pro",
"object": "model",
"created": 1726070400,
"owned_by": "google"
},
{
"id": "gemini-2.5-flash",
"object": "model",
"created": 1726070400,
"owned_by": "google"
}
]
}

### Chat Completions (`/v1/chat/completions`)

Creates a chat completion using the specified Gemini model.
Request Body:
{
"model": "gemini-2.5-pro",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"stream": false,
"max_tokens": 1000,
"temperature": 0.7,
"top_p": 1.0,
"stop": null,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"seed": null,
"response_format": null,
"thinking_budget": 16384,
"reasoning_effort": "medium",
"tools": null,
"tool_choice": null
}

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The Gemini model to use |
| `messages` | array | Yes | Array of message objects |
| `stream` | boolean | No | Enable streaming response (default: true) |
| `max_tokens` | integer | No | Maximum tokens to generate |
| `temperature` | number | No | Sampling temperature (0.0 to 2.0) |
| `top_p` | number | No | Nucleus sampling parameter |
| `stop` | string/array | No | Stop sequences |
| `presence_penalty` | number | No | Presence penalty |
| `frequency_penalty` | number | No | Frequency penalty |
| `seed` | integer | No | Random seed for reproducibility |
| `response_format` | object | No | Response format specification |
| `thinking_budget` | integer | No | Reasoning budget for thinking models |
| `reasoning_effort` | string | No | Reasoning effort level (`"low"`, `"medium"`, `"high"`, `"none"`) |
| `tools` | array | No | Function calling tools |
| `tool_choice` | string/object | No | Tool choice specification |
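A convenient client-side pattern is to send only the required fields plus explicitly chosen options, letting everything else fall back to the server-side defaults. A minimal sketch (the helper name and option filtering are this example's own, not part of the proxy):

```javascript
// Assemble a /v1/chat/completions request body. Only `model` and `messages`
// are required; optional parameters are copied through when provided.
function buildChatRequest(model, messages, options = {}) {
  if (!model || !Array.isArray(messages) || messages.length === 0) {
    throw new Error("model and a non-empty messages array are required");
  }
  const allowed = [
    "stream", "max_tokens", "temperature", "top_p", "stop",
    "presence_penalty", "frequency_penalty", "seed", "response_format",
    "thinking_budget", "reasoning_effort", "tools", "tool_choice",
  ];
  const body = { model, messages };
  for (const key of allowed) {
    if (options[key] !== undefined) body[key] = options[key];
  }
  return body;
}
```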
Response (Non-streaming):
{
"id": "chatcmpl-12345678",
"object": "chat.completion",
"created": 1726070400,
"model": "gemini-2.5-pro",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}

Response (Streaming):
data: {"id":"chatcmpl-12345678","object":"chat.completion.chunk","created":1726070400,"model":"gemini-2.5-pro","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-12345678","object":"chat.completion.chunk","created":1726070400,"model":"gemini-2.5-pro","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
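Clients reassemble the final assistant message by concatenating the `delta` fragments across chunks. A sketch that folds already-parsed chunk objects of the shape shown above into one message:

```javascript
// Fold a sequence of chat.completion.chunk objects into the final message.
// Role arrives in the first delta; content accumulates across deltas;
// finish_reason is non-null only on the last content chunk.
function accumulateChunks(chunks) {
  let role = "assistant";
  let content = "";
  let finishReason = null;
  for (const chunk of chunks) {
    const choice = chunk.choices && chunk.choices[0];
    if (!choice) continue;
    if (choice.delta && choice.delta.role) role = choice.delta.role;
    if (choice.delta && choice.delta.content) content += choice.delta.content;
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { role, content, finish_reason: finishReason };
}
```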
## Supported Models

| Model ID | Description | Vision Support | Reasoning Support |
|---|---|---|---|
| `gemini-2.5-pro` | Latest Pro model | ✅ | ✅ |
| `gemini-2.5-flash` | Fast Flash model | ✅ | ✅ |
| `gemini-2.0-flash-exp` | Experimental Flash | ✅ | ❌ |
| `gemini-1.5-pro` | Previous Pro model | ✅ | ❌ |
| `gemini-1.5-flash` | Previous Flash model | ✅ | ❌ |
## Error Responses

All errors follow OpenAI-compatible format:
{
"error": {
"message": "Error description",
"type": "error_type",
"code": "error_code"
}
}

HTTP status codes:

- `400` - Bad Request (invalid parameters)
- `401` - Unauthorized (invalid API key)
- `403` - Forbidden (insufficient permissions)
- `429` - Too Many Requests (rate limited)
- `500` - Internal Server Error
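Since the error envelope is consistent, a client can classify failures in one place. A sketch that treats 429 and 5xx as retryable (the function name and retry policy are this example's own, not prescribed by the proxy):

```javascript
// Map an HTTP status plus an OpenAI-style error body to a client decision.
// Only rate limits (429) and server errors (5xx) are worth retrying.
function classifyError(status, body) {
  const message = body && body.error ? body.error.message : "Unknown error";
  const retryable = status === 429 || status >= 500;
  return { status, message, retryable };
}
```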
## Rate Limiting

The service implements intelligent rate limiting:
- Automatic cooldown periods for rate-limited credentials
- Round-robin distribution across available credentials
- Failover to backup credentials when primary pool is exhausted
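Even with credential rotation on the proxy side, clients can still see 429 responses when the whole pool is exhausted, so client-side retries with exponential backoff are a sensible complement. A sketch, where `doRequest` is any caller-supplied function returning `{ status, body }`:

```javascript
// Retry a request with exponential backoff plus jitter on 429 responses.
// This is a generic client-side pattern, independent of the proxy's own
// credential rotation; delays double on each attempt.
async function withBackoff(doRequest, maxRetries = 3, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```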
## Streaming

Streaming responses use Server-Sent Events (SSE) format:

- Content-Type: `text/event-stream`
- Each line starts with `data: `
- Stream ends with `data: [DONE]`
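Putting those three rules together, a minimal parser for the raw SSE text looks like this (a sketch; production clients should also handle events split across network reads):

```javascript
// Extract the JSON payloads from raw SSE text. Each event line carries a
// "data: " prefix; the literal "[DONE]" sentinel marks the end of the stream.
function parseSSE(text) {
  const events = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    events.push(JSON.parse(payload));
  }
  return events;
}
```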
## Tool Calling

Supports function calling with the `tools` parameter:
{
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}

## Reasoning

Enhanced reasoning support for thinking models:

- `thinking_budget`: Controls reasoning token allocation
- `reasoning_effort`: Preset effort levels (`"low"`, `"medium"`, `"high"`)
- Automatic reasoning enablement for compatible models
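For example, a request that enables extended reasoning might look like the following (sending both parameters together is illustrative, not required, and the prompt is arbitrary):

```javascript
// A chat completion request body with reasoning enabled: thinking_budget
// caps the reasoning token allocation, reasoning_effort picks a preset level.
const reasoningRequest = {
  model: "gemini-2.5-pro",
  messages: [{ role: "user", content: "Prove that 17 is prime." }],
  thinking_budget: 16384,
  reasoning_effort: "high",
};
```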
## Image Support

Vision-capable models support image inputs:
{
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
}

## Debug Endpoints

- `/v1/debug/cache` - Returns cache status information.
- `/v1/token-test` - Tests token validity.
- `/v1/test` - Runs full system test.
## SDK Compatibility

Compatible with the OpenAI SDK:
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'your-api-key',
baseURL: 'https://your-worker-domain'
});
const completion = await client.chat.completions.create({
model: 'gemini-2.5-pro',
messages: [{ role: 'user', content: 'Hello!' }]
});

## Limitations

- Maximum context length varies by model.
- Rate limits depend on credential pool configuration.
- Some advanced OpenAI features may not be fully supported.
- Streaming requires proper SSE handling in client applications.
## Best Practices

- Secure API Keys: Never hardcode API keys in your application. Use environment variables or a secure configuration management system.
- Error Handling: Implement robust error handling to gracefully manage API errors, rate limits, and network issues.
- Asynchronous Operations: Use asynchronous programming patterns to avoid blocking your application while waiting for API responses.
- Retry Mechanisms: Implement retry logic with exponential backoff for transient errors (e.g., rate limits, network timeouts).
- Monitor Usage: Regularly monitor your API usage to stay within rate limits and manage costs.
- Credential Rotation: For enhanced security, rotate your Gemini OAuth credentials and AI Studio API keys periodically.
- Optimize Prompts: Design concise and effective prompts to minimize token usage and improve response quality.
- Model Selection: Choose the appropriate model (`gemini-2.5-pro`, `gemini-2.5-flash`, etc.) based on your specific task requirements and cost considerations.