chore(pricing): Update vertex-ai pricing #550

Open
siddharthsambharia-portkey wants to merge 15 commits into main from pricing-update/vertex-ai

siddharthsambharia-portkey (Collaborator) commented Mar 17, 2026

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
| --- | --- |
| ➕ Models added | 0 |
| 🔄 Models updated (merged) | 28 |

🔄 Updated Models

  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash-preview-09-2025
  • gemini-2.5-flash-lite-preview-09-2025
  • gemini-2.0-flash-001
  • gemini-3-pro-preview
  • gemini-3-pro-image-preview
  • gemini-3-flash-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-image-preview
  • gemini-3.1-flash-lite-preview
  • veo-3.1-generate-001
  • veo-3.1-generate-preview
  • veo-3.1-fast-generate-preview
  • veo-3.0-generate-001
  • veo-3.0-fast-generate-001
  • veo-3.0-generate-preview
  • veo-3.0-fast-generate-preview
  • gemini-embedding-2-preview
  • multimodalembedding
  • gpt-oss-120b-maas
  • qwen3-235b-a22b-instruct-2507-maas
  • qwen3-next-80b-a3b-instruct-maas
  • qwen3-next-80b-a3b-thinking-maas
  • deepseek-r1-0528-maas
  • deepseek-v3.1-maas
  • deepseek-v3.2-maas

Model → Pricing Page Mapping

Google – Gemini (text/multimodal)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard: $1.25/$10, cache_write $0.25, cache_read $0.13, batch $0.625/$5, web_search $35/1k, enterprise $45/1k |
| gemini-2.5-flash | Google – Gemini 2.5 | API | Standard: $0.30/$2.50, cache_write $0.03, cache_read $0.03, batch $0.15/$1.25, image_token $30/1M |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | Standard: $0.10/$0.40, cache_write $0.01, cache_read $0.01, batch $0.05/$0.20 |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 | API | Row matched via strip -preview-09-2025 → gemini-2.5-flash; same pricing |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 | API | Row matched via strip -preview-09-2025 → gemini-2.5-flash-lite; same pricing |
| gemini-2.5-flash-image | Google – Gemini 2.5 | API | Image output model: $0.30/$2.50, image_token $30/1M, batch $0.15/$1.25 |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 | API – price not found | No pricing row found on page; added with price 0 |
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | Standard: $0.15/$0.60, batch $0.075/$0.30, image_token $30/1M, web_search $35/1k |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | Standard: $0.075/$0.30, batch $0.0375/$0.15 |
| gemini-3-pro-preview | Google – Gemini 3.0 | API | $2/$12, cache $0.36/$0.20, batch $1/$1, image_token $120/1M, web_search $14/1k |
| gemini-3-pro-image-preview | Google – Gemini 3.0 | API | Same row as Gemini 3 Pro (image variant) |
| gemini-3-flash-preview | Google – Gemini 3.0 | API | $0.50/$3, cache $0.09/$0.05, batch $0.25/$1.5, image_token $60/1M |
| gemini-3.1-pro-preview | Google – Gemini 3.1 | API | $2/$12, cache $0.36/$0.20, batch $1/$6, image_token $120/1M, web_search $14/1k |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 | API | $0.50/$3, cache $0.09/$0.05, batch $0.25/$1.5, image_token $60/1M |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 | API | $0.25/$1.5, cache $0.05/$0.03, batch $0.13/$0.75, web_search $14/1k |
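The "strip -preview-09-2025" matching noted for the two dated preview models could look roughly like the sketch below. The function name and the regular expression are assumptions made for illustration; the PR does not show how the Pricing Agent actually implements this step.

```python
import re

# ASSUMPTION: a dated preview suffix has the shape "-preview-MM-YYYY" at the
# end of the model ID. The pattern is illustrative, not taken from the agent.
_DATED_PREVIEW = re.compile(r"-preview-\d{2}-\d{4}$")


def strip_dated_preview(model_id: str) -> str:
    """Return the model ID with a trailing -preview-MM-YYYY suffix removed."""
    return _DATED_PREVIEW.sub("", model_id)


for model_id in (
    "gemini-2.5-flash-preview-09-2025",
    "gemini-2.5-flash-lite-preview-09-2025",
    "gemini-3-pro-preview",  # no date suffix, so it is left unchanged
):
    print(model_id, "->", strip_dated_preview(model_id))
```

Anchoring the pattern at the end of the string keeps undated preview IDs such as gemini-3-pro-preview intact, so they still match their own pricing rows.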

Google – Imagen

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4 Ultra | API | Row matched via lookup_variant imagen-4.0-ultra-generate; $0.06/image |
| imagen-4.0-generate-001 | Google – Imagen 4 | API | Row matched via lookup_variant imagen-4.0-generate; $0.04/image |
| imagen-4.0-fast-generate-001 | Google – Imagen 4 Fast | API | Row matched via lookup_variant imagen-4.0-fast-generate; $0.02/image |
| imagen-3.0-generate-002 | Google – Imagen 3 | API | Row matched via lookup_variant imagen-3.0-generate; $0.04/image |
| imagen-3.0-capability-001 | Google – Imagen 3 | API | Capability model → uses equivalent imagen-3.0-generate pricing; $0.04/image |
| imagen-3.0-capability-002 | Google – Imagen 3 | API | Capability model → uses equivalent imagen-3.0-generate pricing; $0.04/image |
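One way to read the lookup_variant matching in the Imagen rows is "drop the trailing three-digit version, then look up the per-image price". The sketch below follows that reading. The function bodies, the EQUIVALENT mapping, and the dictionary layout are hypothetical; only the model IDs and per-image prices come from the table.

```python
import re

# ASSUMPTION: the version suffix is a trailing "-NNN" (for example "-001").
_VERSION_SUFFIX = re.compile(r"-\d{3}$")

# Per-image prices in USD, copied from the Imagen table.
PER_IMAGE_USD = {
    "imagen-4.0-ultra-generate": 0.06,
    "imagen-4.0-generate": 0.04,
    "imagen-4.0-fast-generate": 0.02,
    "imagen-3.0-generate": 0.04,
}

# Capability models reuse the equivalent generate pricing (per the Notes column).
EQUIVALENT = {"imagen-3.0-capability": "imagen-3.0-generate"}


def lookup_variant(model_id: str) -> str:
    """Hypothetical helper: strip the trailing -NNN version from a model ID."""
    return _VERSION_SUFFIX.sub("", model_id)


def price_per_image(model_id: str) -> float:
    """Resolve a versioned Imagen model ID to its per-image price."""
    variant = lookup_variant(model_id)
    variant = EQUIVALENT.get(variant, variant)
    return PER_IMAGE_USD[variant]


print(price_per_image("imagen-4.0-ultra-generate-001"))
print(price_per_image("imagen-3.0-capability-002"))
```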

Google – Veo

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | $0.40/sec, default 8s/1 sample |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | $0.15/sec, default 8s/1 sample |
| veo-3.1-generate-preview | Google – Veo 3.1 | API | Preview alias → same pricing as veo-3.1-generate |
| veo-3.1-fast-generate-preview | Google – Veo 3.1 Fast | API | Preview alias → same pricing |
| veo-3.0-generate-001 | Google – Veo 3.0 | API | $0.40/sec; same rate as Veo 3.1 |
| veo-3.0-fast-generate-001 | Google – Veo 3.0 Fast | API | $0.15/sec |
| veo-3.0-generate-preview | Google – Veo 3.0 | API | Preview alias → same pricing |
| veo-3.0-fast-generate-preview | Google – Veo 3.0 Fast | API | Preview alias → same pricing |
| veo-2.0-generate-001 | Google – Veo 2.0 | API | $0.50/sec |
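For a sense of what the per-second Veo rates mean per request, the arithmetic can be sketched as rate × duration × samples, using the "default 8s/1 sample" noted in the table. The function name and signature are assumptions for illustration, and the formula itself is an assumed reading of how those defaults combine.

```python
from decimal import Decimal


def veo_request_cost(rate_per_sec: str, seconds: int = 8, samples: int = 1) -> Decimal:
    """Estimate the cost of one video request in USD.

    Defaults mirror the "default 8s/1 sample" note; Decimal avoids binary
    floating-point rounding in the result.
    """
    return Decimal(rate_per_sec) * seconds * samples


print(veo_request_cost("0.40"))             # veo-3.1-generate-001 at defaults
print(veo_request_cost("0.15"))             # veo-3.1-fast-generate-001 at defaults
print(veo_request_cost("0.50", samples=2))  # veo-2.0-generate-001, two samples
```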

Google – Embeddings

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-embedding-001 | Google – Embedding | API | $0.00015/1K tokens |
| gemini-embedding-2-preview | Google – Embedding | API | No dedicated row; using Gemini Embedding 001 pricing as same family |
| text-embedding-005 | Google – Embedding | API | $0.000025/1K tokens |
| text-multilingual-embedding-002 | Google – Embedding | API | $0.000025/1K tokens |
| text-embedding-large-exp-03-07 | Google – Embedding | API | Experimental; shares text-embedding-005 pricing $0.000025/1K |
| textembedding-gecko | Google – Embedding | API | Legacy; uses text-embedding pricing $0.000025/1K |
| multimodalembedding | Google – Embedding | API | Multimodal: per-image $0.00012, per-video $0.00016 |
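The embedding rows are quoted per 1K tokens, which makes them easy to misread next to other price figures. A small conversion sketch follows; the helper names and dictionary are hypothetical, and only the per-1K prices are taken from the table.

```python
from decimal import Decimal

# USD per 1K tokens, copied from the Embeddings table.
PER_1K_USD = {
    "gemini-embedding-001": Decimal("0.00015"),
    "text-embedding-005": Decimal("0.000025"),
    "text-multilingual-embedding-002": Decimal("0.000025"),
}


def per_million(per_1k: Decimal) -> Decimal:
    """Convert a per-1K-token price to a per-1M-token price."""
    return per_1k * 1000


def embedding_cost(model_id: str, tokens: int) -> Decimal:
    """Estimate the USD cost of embedding `tokens` tokens."""
    return PER_1K_USD[model_id] * tokens / 1000


print(per_million(PER_1K_USD["gemini-embedding-001"]))
print(embedding_cost("gemini-embedding-001", 2_000_000))
```

Under this conversion, $0.00015/1K corresponds to $0.15 per 1M tokens and $0.000025/1K to $0.025 per 1M tokens.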

Anthropic – Claude

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| claude-opus-4-6 | Anthropic – Claude | API | Stripped @default; $5/$25, cache_write(5m) $6.25, cache_read $0.5, batch $2.5/$12.5 |
| claude-sonnet-4-6 | Anthropic – Claude | API | Stripped @default; $3/$15, cache_write(5m) $3.75, cache_read $0.3, batch $1.5/$7.5 |
| claude-opus-4-5@20251101 | Anthropic – Claude | API | Pinned version; $5/$25, cache_write(5m) $6.25, cache_read $0.5 |
| claude-opus-4-1@20250805 | Anthropic – Claude | API | Pinned version; $15/$75, cache_write(5m) $18.75, cache_read $1.5 |
| claude-opus-4@20250514 | Anthropic – Claude | API | Pinned version; $15/$75, cache_write(5m) $18.75, cache_read $1.5 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude | API | Pinned version; $3/$15, cache_write(5m) $3.75, cache_read $0.3 |
| claude-sonnet-4@20250514 | Anthropic – Claude | API | Pinned version; $3/$15, cache_write(5m) $3.75, cache_read $0.3 |
| claude-haiku-4-5@20251001 | Anthropic – Claude | API | Pinned version; $1/$5, cache_write(5m) $1.25, cache_read $0.1 |
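The Claude rows treat the two kinds of "@" suffix differently: "@default" is stripped, while a pinned date such as "@20251101" is kept as part of the model ID. A minimal sketch of that normalization, assuming Python 3.9+ for str.removesuffix and using hypothetical function names:

```python
def normalize_claude_id(model_id: str) -> str:
    """Drop an "@default" suffix; leave pinned "@YYYYMMDD" versions intact."""
    return model_id.removesuffix("@default")


def is_pinned(model_id: str) -> bool:
    """Report whether the normalized ID still carries an "@" version pin."""
    return "@" in normalize_claude_id(model_id)


for raw in ("claude-opus-4-6@default", "claude-opus-4-5@20251101"):
    print(raw, "->", normalize_claude_id(raw), "| pinned:", is_pinned(raw))
```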

OpenAI

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gpt-oss-120b-maas | OpenAI | API | gpt-oss-120b row; $0.09/$0.36, cache_read $0.007, batch $0.045/$0.18 |
| clip-vit-base-patch32 | OpenAI | API – excluded | Non-generative (vision embedding) |
| openclip | OpenAI | API – excluded | Non-generative |
| whisper-large | OpenAI | API – excluded | Audio transcription, not generative inference |
| gpt-oss | OpenAI | API – excluded | Self-deploy only (has_deploy:true, no -maas suffix) |
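The "Self-deploy only (has_deploy:true, no -maas suffix)" rule can be expressed as a small predicate. The CatalogEntry shape below is a hypothetical stand-in for whatever record the Pricing Agent reads; only the has_deploy flag and the -maas suffix check come from the Notes column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record; the field names are assumptions."""
    model_id: str
    has_deploy: bool


def is_self_deploy_only(entry: CatalogEntry) -> bool:
    """True when the model is deployable but has no managed (-maas) variant ID."""
    return entry.has_deploy and not entry.model_id.endswith("-maas")


entries = [
    CatalogEntry("gpt-oss", has_deploy=True),
    CatalogEntry("gpt-oss-120b-maas", has_deploy=True),
]
for entry in entries:
    status = "excluded" if is_self_deploy_only(entry) else "priced"
    print(entry.model_id, "->", status)
```

This covers only the self-deploy exclusion. The other exclusion reasons in these tables (non-generative, OCR, safety/guard, explicit policy) would need their own checks.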

Meta – Llama

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 | API | Llama 4 Maverick row; $0.35/$1.15, batch $0.175/$0.575 |
| llama-3.3-70b-instruct-maas | Meta – Llama 3.3 | API | Llama 3.3 70B row; $0.72/$0.72, batch $0.36/$0.36 |
| faster-r-cnn | Meta | API – excluded | Non-generative CV (object detection) |
| retinanet | Meta | API – excluded | Non-generative CV |
| mask-r-cnn | Meta | API – excluded | Non-generative CV |
| segment-anything | Meta | API – excluded | Non-generative CV (segmentation) |
| xlm-roberta-large | Meta | API – excluded | Non-generative NLP |
| roberta-large | Meta | API – excluded | Non-generative NLP |
| codellama-7b-hf | Meta | API – excluded | Self-deploy only |
| llama2 | Meta | API – excluded | Self-deploy only |
| nllb | Meta | API – excluded | Non-generative (translation) |
| imagebind | Meta | API – excluded | Non-generative |
| llama-2-quantized | Meta | API – excluded | Self-deploy only |
| llama3 | Meta | API – excluded | Self-deploy only |
| llama-guard | Meta | API – excluded | Safety/guard model |
| llama4 | Meta | API – excluded | Self-deploy only |
| llama3_1 | Meta | API – excluded | Self-deploy only |
| prompt-guard | Meta | API – excluded | Safety/guard model |
| llama3-2 | Meta | API – excluded | Self-deploy only |
| llama3-3 | Meta | API – excluded | Self-deploy only |
| sam3 | Meta | API – excluded | Non-generative CV (segmentation) |

AI21

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| jamba-large-1.6 | AI21 | API – excluded | Self-deploy only (has_deploy:true, no -maas suffix); no MaaS pricing row |

Qwen

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| qwen3-235b-a22b-instruct-2507-maas | Qwen – Qwen3-235B | API | $0.22/$0.88, cache_read $0.11, batch $0.11/$0.44 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen – Qwen3-Coder-480B | API | $0.22/$1.80, cache_read $0.022, batch $0.11/$0.90 |
| qwen3-next-80b-a3b-instruct-maas | Qwen – Qwen3-Next-80B | API | $0.15/$1.20, cache_read $0.15, batch $0.15/$1.20 |
| qwen3-next-80b-a3b-thinking-maas | Qwen – Qwen3-Next-80B Thinking | API | $0.15/$1.20, cache_read $0.15, batch $0.15/$1.20 |
| qwq | Qwen | API – excluded | Self-deploy only |
| qwen3 | Qwen | API – excluded | Self-deploy only |
| qwen3-embedding | Qwen | API – excluded | Self-deploy only |
| qwen3-5 | Qwen | API – excluded | Self-deploy only |
| qwen2 | Qwen | API – excluded | Self-deploy only |
| qwen3-coder-next | Qwen | API – excluded | Self-deploy only |
| qwen3-coder | Qwen | API – excluded | Self-deploy only |
| qwen-image | Qwen | API – excluded | Explicit policy exclusion (image gen, not on Vertex AI pricing) |
| qwen3-next | Qwen | API – excluded | Self-deploy only |
| qwen3-vl | Qwen | API – excluded | Self-deploy only |

Mistral

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| mistral-small-2503 | Mistral – Mistral Small 3.1 | API | $0.10/$0.30 |
| mistral-medium-3 | Mistral – Mistral Medium 3 | API | $0.40/$2.00 |
| codestral-2 | Mistral – Codestral 2 | API | $0.30/$0.90 |
| mistral | Mistral | API – excluded | Self-deploy only (mistral-ai publisher) |
| mixtral | Mistral | API – excluded | Self-deploy only (mistral-ai publisher) |
| codestral-2501-self-deploy | Mistral | API – excluded | Self-deploy (name contains self-deploy) |
| mistral-ocr-2505 | Mistral | API – excluded | OCR model |
| ministral-3 | Mistral | API – excluded | Self-deploy only |
| mistral-large-3 | Mistral | API – excluded | Self-deploy only |

DeepSeek

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| deepseek-r1-0528-maas | DeepSeek – DeepSeek-R1 | API | $1.35/$5.40, cache_read $0.06, cache_write $0.675, batch $0.675/$2.70 |
| deepseek-v3.1-maas | DeepSeek – DeepSeek-V3.1 | API | $0.60/$1.70, cache_read $0.06, cache_write $0.30, batch $0.30/$0.85 |
| deepseek-v3.2-maas | DeepSeek – DeepSeek-V3.2 | API | $0.56/$1.68, cache_read $0.056, cache_write $0.28, batch $0.28/$0.84 |
| deepseek-r1 | DeepSeek | API – excluded | Self-deploy only |
| deepseek-v3 | DeepSeek | API – excluded | Self-deploy only |
| deepseek-ocr-2 | DeepSeek | API – excluded | OCR model |
| deepseek-v3-1 | DeepSeek | API – excluded | Self-deploy only |
| deepseek-v3-2 | DeepSeek | API – excluded | Self-deploy only |
| deepseek-ocr | DeepSeek | API – excluded | OCR model |
| deepseek-ocr-maas | DeepSeek | API – excluded | OCR model |
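To illustrate how the input/output and batch figures in the DeepSeek rows might be used, here is a rough cost sketch. The PR does not state the token unit for these rates; the code assumes USD per 1M tokens, and the function names and RATES layout are hypothetical.

```python
from decimal import Decimal

# (input, output) rates copied from the DeepSeek table above.
RATES = {
    "deepseek-r1-0528-maas": {"standard": ("1.35", "5.40"), "batch": ("0.675", "2.70")},
    "deepseek-v3.1-maas": {"standard": ("0.60", "1.70"), "batch": ("0.30", "0.85")},
    "deepseek-v3.2-maas": {"standard": ("0.56", "1.68"), "batch": ("0.28", "0.84")},
}

# ASSUMPTION: rates are quoted per 1M tokens; the PR does not state the unit.
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> Decimal:
    """Estimate request cost in USD for the given tier ("standard" or "batch")."""
    in_rate, out_rate = (Decimal(r) for r in RATES[model_id][tier])
    return (in_rate * input_tokens + out_rate * output_tokens) / TOKENS_PER_UNIT


def batch_is_half_of_standard(model_id: str) -> bool:
    """Sanity check: is each batch rate exactly half of the standard rate?"""
    standard = [Decimal(r) for r in RATES[model_id]["standard"]]
    batch = [Decimal(r) for r in RATES[model_id]["batch"]]
    return all(b * 2 == s for s, b in zip(standard, batch))


print(estimate_cost("deepseek-v3.2-maas", 1_000_000, 500_000))
print(estimate_cost("deepseek-v3.2-maas", 1_000_000, 500_000, tier="batch"))
print({model_id: batch_is_half_of_standard(model_id) for model_id in RATES})
```

A check like batch_is_half_of_standard could flag rows where the batch figure does not follow the same ratio as the rest of the table, which is useful when reviewing generated pricing data.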

Kimi / Moonshot

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| kimi-k2-thinking-maas | Kimi – Kimi-K2-Thinking | API | $0.60/$2.50, cache_read $0.06 |
| kimi-k2-5 | Kimi | API – excluded | Self-deploy only |
| kimi-k2 | Kimi | API – excluded | Self-deploy only |

MiniMax

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| minimax-m2-maas | MiniMax – MiniMax-M2 | API | $0.30/$1.20, cache_read $0.03 |
| minimax-m2 | MiniMax | API – excluded | Self-deploy only |

ZAI-org / GLM

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| glm-4.7-maas | ZAI-org – GLM-4.7 | API | $0.60/$2.20 |
| glm-5-maas | ZAI-org – GLM-5 | API | $1.00/$3.20 |
| glm-4.7 | ZAI-org | API – excluded | Self-deploy only |
| glm-5 | ZAI-org | API – excluded | Self-deploy only |
| glm-ocr | ZAI-org | API – excluded | OCR model |
| glm-4.5 | ZAI-org | API – excluded | Self-deploy only |
| glm-image | ZAI-org | API – excluded | Explicit policy exclusion (image gen, not on Vertex AI pricing) |

Generated by Pricing Agent on 2026-03-24
