chore(pricing): Update vertex-ai pricing by siddharthsambharia-portkey · Pull Request #550 · Portkey-AI/models

siddharthsambharia-portkey · 2026-03-17T12:15:04Z

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

Change Type	Count
➕ Models added	0
🔄 Models updated (merged)	28

🔄 Updated Models

gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.5-flash-preview-09-2025
gemini-2.5-flash-lite-preview-09-2025
gemini-2.0-flash-001
gemini-3-pro-preview
gemini-3-pro-image-preview
gemini-3-flash-preview
gemini-3.1-pro-preview
gemini-3.1-flash-image-preview
gemini-3.1-flash-lite-preview
veo-3.1-generate-001
veo-3.1-generate-preview
veo-3.1-fast-generate-preview
veo-3.0-generate-001
veo-3.0-fast-generate-001
veo-3.0-generate-preview
veo-3.0-fast-generate-preview
gemini-embedding-2-preview
multimodalembedding
gpt-oss-120b-maas
qwen3-235b-a22b-instruct-2507-maas
qwen3-next-80b-a3b-instruct-maas
qwen3-next-80b-a3b-thinking-maas
deepseek-r1-0528-maas
deepseek-v3.1-maas
deepseek-v3.2-maas

Model → Pricing Page Mapping

Google – Gemini (text/multimodal)

Model ID	Publisher / Section	Source	Notes
`gemini-2.5-pro`	Google – Gemini 2.5	API	Standard: $1.25/$10, cache_write $0.25, cache_read $0.13, batch $0.625/$5, web_search $35/1k, enterprise $45/1k
`gemini-2.5-flash`	Google – Gemini 2.5	API	Standard: $0.30/$2.50, cache_write $0.03, cache_read $0.03, batch $0.15/$1.25, image_token $30/1M
`gemini-2.5-flash-lite`	Google – Gemini 2.5	API	Standard: $0.10/$0.40, cache_write $0.01, cache_read $0.01, batch $0.05/$0.20
`gemini-2.5-flash-preview-09-2025`	Google – Gemini 2.5	API	Row matched via strip -preview-09-2025 → gemini-2.5-flash; same pricing
`gemini-2.5-flash-lite-preview-09-2025`	Google – Gemini 2.5	API	Row matched via strip -preview-09-2025 → gemini-2.5-flash-lite; same pricing
`gemini-2.5-flash-image`	Google – Gemini 2.5	API	Image output model: $0.30/$2.50, image_token $30/1M, batch $0.15/$1.25
`gemini-2.5-computer-use-preview-10-2025`	Google – Gemini 2.5	API – price not found	No pricing row found on page; added with price 0
`gemini-2.0-flash-001`	Google – Gemini 2.0	API	Standard: $0.15/$0.60, batch $0.075/$0.30, image_token $30/1M, web_search $35/1k
`gemini-2.0-flash-lite-001`	Google – Gemini 2.0	API	Standard: $0.075/$0.30, batch $0.0375/$0.15
`gemini-3-pro-preview`	Google – Gemini 3.0	API	$2/$12, cache $0.36/$0.20, batch $1/$6, image_token $120/1M, web_search $14/1k
`gemini-3-pro-image-preview`	Google – Gemini 3.0	API	Same row as Gemini 3 Pro (image variant)
`gemini-3-flash-preview`	Google – Gemini 3.0	API	$0.50/$3, cache $0.09/$0.05, batch $0.25/$1.5, image_token $60/1M
`gemini-3.1-pro-preview`	Google – Gemini 3.1	API	$2/$12, cache $0.36/$0.20, batch $1/$6, image_token $120/1M, web_search $14/1k
`gemini-3.1-flash-image-preview`	Google – Gemini 3.1	API	$0.50/$3, cache $0.09/$0.05, batch $0.25/$1.5, image_token $60/1M
`gemini-3.1-flash-lite-preview`	Google – Gemini 3.1	API	$0.25/$1.5, cache $0.05/$0.03, batch $0.13/$0.75, web_search $14/1k
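
The "Row matched via strip" notes above describe falling back from a dated preview ID to its base model row. A minimal sketch of that idea, assuming the suffix always has the shape `-preview-MM-YYYY`; the function name `base_pricing_id` is hypothetical and is not taken from the Pricing Agent's code:

```python
import re

# Assumed suffix shape: "-preview-MM-YYYY" at the very end of the model ID.
_DATED_PREVIEW = re.compile(r"-preview-\d{2}-\d{4}$")


def base_pricing_id(model_id: str) -> str:
    """Return the model ID with a trailing dated preview suffix removed."""
    return _DATED_PREVIEW.sub("", model_id)


for mid in (
    "gemini-2.5-flash-preview-09-2025",
    "gemini-2.5-flash-lite-preview-09-2025",
    "gemini-3-pro-preview",  # undated preview: left unchanged
):
    print(mid, "->", base_pricing_id(mid))
```

Undated preview IDs such as `gemini-3-pro-preview` are left untouched, which is consistent with those models having their own rows in the table.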

Google – Imagen

Model ID	Publisher / Section	Source	Notes
`imagen-4.0-ultra-generate-001`	Google – Imagen 4 Ultra	API	Row matched via lookup_variant imagen-4.0-ultra-generate; $0.06/image
`imagen-4.0-generate-001`	Google – Imagen 4	API	Row matched via lookup_variant imagen-4.0-generate; $0.04/image
`imagen-4.0-fast-generate-001`	Google – Imagen 4 Fast	API	Row matched via lookup_variant imagen-4.0-fast-generate; $0.02/image
`imagen-3.0-generate-002`	Google – Imagen 3	API	Row matched via lookup_variant imagen-3.0-generate; $0.04/image
`imagen-3.0-capability-001`	Google – Imagen 3	API	Capability model → uses equivalent imagen-3.0-generate pricing; $0.04/image
`imagen-3.0-capability-002`	Google – Imagen 3	API	Capability model → uses equivalent imagen-3.0-generate pricing; $0.04/image
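
The `lookup_variant` matching named in the notes could look roughly like the sketch below. It assumes the variant is the model ID without its trailing three-digit version, and that capability models reuse the `generate` row, as the notes state. The dictionary layout and `price_per_image` helper are hypothetical; only the prices come from the table above.

```python
import re

# Per-image prices copied from the Imagen table, keyed by the unversioned
# row name. The key format is an assumption made for this sketch.
IMAGEN_PRICE_PER_IMAGE = {
    "imagen-4.0-ultra-generate": 0.06,
    "imagen-4.0-generate": 0.04,
    "imagen-4.0-fast-generate": 0.02,
    "imagen-3.0-generate": 0.04,
}


def lookup_variant(model_id: str) -> str:
    """Drop a trailing three-digit version; map capability models to generate."""
    base = re.sub(r"-\d{3}$", "", model_id)
    return base.replace("-capability", "-generate")


def price_per_image(model_id: str) -> float:
    """Look up the per-image price for a versioned Imagen model ID."""
    return IMAGEN_PRICE_PER_IMAGE[lookup_variant(model_id)]


print(price_per_image("imagen-4.0-ultra-generate-001"))
print(price_per_image("imagen-3.0-capability-002"))
```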

Google – Veo

Model ID	Publisher / Section	Source	Notes
`veo-3.1-generate-001`	Google – Veo 3.1	API	$0.40/sec, default 8s/1 sample
`veo-3.1-fast-generate-001`	Google – Veo 3.1 Fast	API	$0.15/sec, default 8s/1 sample
`veo-3.1-generate-preview`	Google – Veo 3.1	API	Preview alias → same pricing as veo-3.1-generate
`veo-3.1-fast-generate-preview`	Google – Veo 3.1 Fast	API	Preview alias → same pricing
`veo-3.0-generate-001`	Google – Veo 3.0	API	$0.40/sec; same rate as Veo 3.1
`veo-3.0-fast-generate-001`	Google – Veo 3.0 Fast	API	$0.15/sec
`veo-3.0-generate-preview`	Google – Veo 3.0	API	Preview alias → same pricing
`veo-3.0-fast-generate-preview`	Google – Veo 3.0 Fast	API	Preview alias → same pricing
`veo-2.0-generate-001`	Google – Veo 2.0	API	$0.50/sec
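
Because Veo is priced per second of video, the cost of one request depends on duration and sample count. As a worked example using the table's default of 8 seconds and 1 sample (the `veo_cost` helper is hypothetical, and the multiplication by sample count is an assumption about how the default is applied):

```python
from decimal import Decimal


def veo_cost(rate_per_second: str, seconds: int = 8, samples: int = 1) -> Decimal:
    """Cost in USD of one request: rate x duration x number of samples."""
    return Decimal(rate_per_second) * seconds * samples


print(veo_cost("0.40"))             # veo-3.1-generate-001 at the defaults
print(veo_cost("0.15"))             # veo-3.1-fast-generate-001 at the defaults
print(veo_cost("0.50", seconds=5))  # veo-2.0-generate-001, 5-second clip
```

At the defaults this gives $3.20 for the standard Veo 3.x models and $1.20 for the Fast variants.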

Google – Embeddings

Model ID	Publisher / Section	Source	Notes
`gemini-embedding-001`	Google – Embedding	API	$0.00015/1K tokens
`gemini-embedding-2-preview`	Google – Embedding	API	No dedicated row; using Gemini Embedding 001 pricing as same family
`text-embedding-005`	Google – Embedding	API	$0.000025/1K tokens
`text-multilingual-embedding-002`	Google – Embedding	API	$0.000025/1K tokens
`text-embedding-large-exp-03-07`	Google – Embedding	API	Experimental; shares text-embedding-005 pricing $0.000025/1K
`textembedding-gecko`	Google – Embedding	API	Legacy; uses text-embedding pricing $0.000025/1K
`multimodalembedding`	Google – Embedding	API	Multimodal: per-image $0.00012, per-video $0.00016
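
The embedding rows are quoted per 1K tokens, while the `image_token` figures elsewhere in this description are quoted per 1M. A small conversion sketch makes the units easier to compare; both helper names are hypothetical:

```python
from decimal import Decimal


def per_1k_to_per_1m(price_per_1k: str) -> Decimal:
    """Convert a per-1K-token price to the equivalent per-1M-token price."""
    return Decimal(price_per_1k) * 1000


def embedding_cost(price_per_1k: str, tokens: int) -> Decimal:
    """Cost in USD of embedding `tokens` tokens at a per-1K-token price."""
    return Decimal(price_per_1k) * tokens / 1000


print(per_1k_to_per_1m("0.00015"))           # gemini-embedding-001
print(per_1k_to_per_1m("0.000025"))          # text-embedding-005
print(embedding_cost("0.00015", 2_000_000))  # 2M tokens on gemini-embedding-001
```

So `gemini-embedding-001` works out to $0.15 per 1M tokens and `text-embedding-005` to $0.025 per 1M tokens.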

Anthropic – Claude

Model ID	Publisher / Section	Source	Notes
`claude-opus-4-6`	Anthropic – Claude	API	Stripped @default; $5/$25, cache_write(5m) $6.25, cache_read $0.5, batch $2.5/$12.5
`claude-sonnet-4-6`	Anthropic – Claude	API	Stripped @default; $3/$15, cache_write(5m) $3.75, cache_read $0.3, batch $1.5/$7.5
`claude-opus-4-5@20251101`	Anthropic – Claude	API	Pinned version; $5/$25, cache_write(5m) $6.25, cache_read $0.5
`claude-opus-4-1@20250805`	Anthropic – Claude	API	Pinned version; $15/$75, cache_write(5m) $18.75, cache_read $1.5
`claude-opus-4@20250514`	Anthropic – Claude	API	Pinned version; $15/$75, cache_write(5m) $18.75, cache_read $1.5
`claude-sonnet-4-5@20250929`	Anthropic – Claude	API	Pinned version; $3/$15, cache_write(5m) $3.75, cache_read $0.3
`claude-sonnet-4@20250514`	Anthropic – Claude	API	Pinned version; $3/$15, cache_write(5m) $3.75, cache_read $0.3
`claude-haiku-4-5@20251001`	Anthropic – Claude	API	Pinned version; $1/$5, cache_write(5m) $1.25, cache_read $0.1
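
The Claude notes distinguish IDs where `@default` was stripped from IDs that keep a pinned `@YYYYMMDD` version. A minimal sketch of that normalization, assuming the input form `name@default` (inferred from the "Stripped @default" note, not shown in the table); `normalize_claude_id` is a hypothetical name:

```python
def normalize_claude_id(model_id: str) -> str:
    """Strip an "@default" suffix; keep pinned versions such as "@20251101"."""
    name, _, version = model_id.partition("@")
    if version == "default":
        return name
    return model_id


print(normalize_claude_id("claude-opus-4-6@default"))
print(normalize_claude_id("claude-opus-4-5@20251101"))
```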

OpenAI

Model ID	Publisher / Section	Source	Notes
`gpt-oss-120b-maas`	OpenAI	API	gpt-oss-120b row; $0.09/$0.36, cache_read $0.007, batch $0.045/$0.18
`clip-vit-base-patch32`	OpenAI	API – excluded	Non-generative (vision embedding)
`openclip`	OpenAI	API – excluded	Non-generative
`whisper-large`	OpenAI	API – excluded	Audio transcription, not generative inference
`gpt-oss`	OpenAI	API – excluded	Self-deploy only (has_deploy:true, no -maas suffix)

Meta – Llama

Model ID	Publisher / Section	Source	Notes
`llama-4-maverick-17b-128e-instruct-maas`	Meta – Llama 4	API	Llama 4 Maverick row; $0.35/$1.15, batch $0.175/$0.575
`llama-3.3-70b-instruct-maas`	Meta – Llama 3.3	API	Llama 3.3 70B row; $0.72/$0.72, batch $0.36/$0.36
`faster-r-cnn`	Meta	API – excluded	Non-generative CV (object detection)
`retinanet`	Meta	API – excluded	Non-generative CV
`mask-r-cnn`	Meta	API – excluded	Non-generative CV
`segment-anything`	Meta	API – excluded	Non-generative CV (segmentation)
`xlm-roberta-large`	Meta	API – excluded	Non-generative NLP
`roberta-large`	Meta	API – excluded	Non-generative NLP
`codellama-7b-hf`	Meta	API – excluded	Self-deploy only
`llama2`	Meta	API – excluded	Self-deploy only
`nllb`	Meta	API – excluded	Non-generative (translation)
`imagebind`	Meta	API – excluded	Non-generative
`llama-2-quantized`	Meta	API – excluded	Self-deploy only
`llama3`	Meta	API – excluded	Self-deploy only
`llama-guard`	Meta	API – excluded	Safety/guard model
`llama4`	Meta	API – excluded	Self-deploy only
`llama3_1`	Meta	API – excluded	Self-deploy only
`prompt-guard`	Meta	API – excluded	Safety/guard model
`llama3-2`	Meta	API – excluded	Self-deploy only
`llama3-3`	Meta	API – excluded	Self-deploy only
`sam3`	Meta	API – excluded	Non-generative CV (segmentation)
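
The "Self-deploy only (has_deploy:true, no -maas suffix)" rule used in the OpenAI and AI21 tables can be sketched as a predicate. This covers only that one exclusion reason; the other reasons (non-generative, OCR, safety/guard) depend on model metadata not shown here. The `has_deploy` parameter mirrors the flag named in the notes, but its exact source and the function name are assumptions:

```python
def is_self_deploy_only(model_id: str, has_deploy: bool) -> bool:
    """True when a model is deployable by the user and has no MaaS variant suffix."""
    return has_deploy and not model_id.endswith("-maas")


# Flags for gpt-oss and jamba-large-1.6 are stated in the notes (has_deploy:true).
print(is_self_deploy_only("gpt-oss", has_deploy=True))
print(is_self_deploy_only("jamba-large-1.6", has_deploy=True))
# A "-maas" ID is never excluded by this rule, whatever the flag value.
print(is_self_deploy_only("gpt-oss-120b-maas", has_deploy=True))
```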

AI21

Model ID	Publisher / Section	Source	Notes
`jamba-large-1.6`	AI21	API – excluded	Self-deploy only (has_deploy:true, no -maas suffix); no MaaS pricing row

Qwen

Model ID	Publisher / Section	Source	Notes
`qwen3-235b-a22b-instruct-2507-maas`	Qwen – Qwen3-235B	API	$0.22/$0.88, cache_read $0.11, batch $0.11/$0.44
`qwen3-coder-480b-a35b-instruct-maas`	Qwen – Qwen3-Coder-480B	API	$0.22/$1.80, cache_read $0.022, batch $0.11/$0.90
`qwen3-next-80b-a3b-instruct-maas`	Qwen – Qwen3-Next-80B	API	$0.15/$1.20, cache_read $0.15, batch $0.15/$1.20
`qwen3-next-80b-a3b-thinking-maas`	Qwen – Qwen3-Next-80B Thinking	API	$0.15/$1.20, cache_read $0.15, batch $0.15/$1.20
`qwq`	Qwen	API – excluded	Self-deploy only
`qwen3`	Qwen	API – excluded	Self-deploy only
`qwen3-embedding`	Qwen	API – excluded	Self-deploy only
`qwen3-5`	Qwen	API – excluded	Self-deploy only
`qwen2`	Qwen	API – excluded	Self-deploy only
`qwen3-coder-next`	Qwen	API – excluded	Self-deploy only
`qwen3-coder`	Qwen	API – excluded	Self-deploy only
`qwen-image`	Qwen	API – excluded	Explicit policy exclusion (image gen, not on Vertex AI pricing)
`qwen3-next`	Qwen	API – excluded	Self-deploy only
`qwen3-vl`	Qwen	API – excluded	Self-deploy only
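
In most rows of this description the batch price is half the standard price. A reviewer could use a check like the one below to spot rows that deviate, such as the two `qwen3-next` rows, where batch equals standard. Whether those equal rates are intended is not stated here; the sketch only flags them. The tuple layout and `batch_is_half` name are hypothetical, and the prices are copied from the Qwen table above:

```python
from decimal import Decimal

# (model_id, standard input, standard output, batch input, batch output)
QWEN_ROWS = [
    ("qwen3-235b-a22b-instruct-2507-maas", "0.22", "0.88", "0.11", "0.44"),
    ("qwen3-coder-480b-a35b-instruct-maas", "0.22", "1.80", "0.11", "0.90"),
    ("qwen3-next-80b-a3b-instruct-maas", "0.15", "1.20", "0.15", "1.20"),
    ("qwen3-next-80b-a3b-thinking-maas", "0.15", "1.20", "0.15", "1.20"),
]


def batch_is_half(std_in: str, std_out: str, batch_in: str, batch_out: str) -> bool:
    """True when both batch prices are exactly half the standard prices."""
    return (
        Decimal(std_in) / 2 == Decimal(batch_in)
        and Decimal(std_out) / 2 == Decimal(batch_out)
    )


# Collect rows whose batch price is not half of standard, for manual review.
needs_review = [row[0] for row in QWEN_ROWS if not batch_is_half(*row[1:])]
print(needs_review)
```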

Mistral

Model ID	Publisher / Section	Source	Notes
`mistral-small-2503`	Mistral – Mistral Small 3.1	API	$0.10/$0.30
`mistral-medium-3`	Mistral – Mistral Medium 3	API	$0.40/$2.00
`codestral-2`	Mistral – Codestral 2	API	$0.30/$0.90
`mistral`	Mistral	API – excluded	Self-deploy only (mistral-ai publisher)
`mixtral`	Mistral	API – excluded	Self-deploy only (mistral-ai publisher)
`codestral-2501-self-deploy`	Mistral	API – excluded	Self-deploy (name contains self-deploy)
`mistral-ocr-2505`	Mistral	API – excluded	OCR model
`ministral-3`	Mistral	API – excluded	Self-deploy only
`mistral-large-3`	Mistral	API – excluded	Self-deploy only

DeepSeek

Model ID	Publisher / Section	Source	Notes
`deepseek-r1-0528-maas`	DeepSeek – DeepSeek-R1	API	$1.35/$5.40, cache_read $0.06, cache_write $0.675, batch $0.675/$2.70
`deepseek-v3.1-maas`	DeepSeek – DeepSeek-V3.1	API	$0.60/$1.70, cache_read $0.06, cache_write $0.30, batch $0.30/$0.85
`deepseek-v3.2-maas`	DeepSeek – DeepSeek-V3.2	API	$0.56/$1.68, cache_read $0.056, cache_write $0.28, batch $0.28/$0.84
`deepseek-r1`	DeepSeek	API – excluded	Self-deploy only
`deepseek-v3`	DeepSeek	API – excluded	Self-deploy only
`deepseek-ocr-2`	DeepSeek	API – excluded	OCR model
`deepseek-v3-1`	DeepSeek	API – excluded	Self-deploy only
`deepseek-v3-2`	DeepSeek	API – excluded	Self-deploy only
`deepseek-ocr`	DeepSeek	API – excluded	OCR model
`deepseek-ocr-maas`	DeepSeek	API – excluded	OCR model

Kimi / Moonshot

Model ID	Publisher / Section	Source	Notes
`kimi-k2-thinking-maas`	Kimi – Kimi-K2-Thinking	API	$0.60/$2.50, cache_read $0.06
`kimi-k2-5`	Kimi	API – excluded	Self-deploy only
`kimi-k2`	Kimi	API – excluded	Self-deploy only

MiniMax

Model ID	Publisher / Section	Source	Notes
`minimax-m2-maas`	MiniMax – MiniMax-M2	API	$0.30/$1.20, cache_read $0.03
`minimax-m2`	MiniMax	API – excluded	Self-deploy only

ZAI-org / GLM

Model ID	Publisher / Section	Source	Notes
`glm-4.7-maas`	ZAI-org – GLM-4.7	API	$0.60/$2.20
`glm-5-maas`	ZAI-org – GLM-5	API	$1.00/$3.20
`glm-4.7`	ZAI-org	API – excluded	Self-deploy only
`glm-5`	ZAI-org	API – excluded	Self-deploy only
`glm-ocr`	ZAI-org	API – excluded	OCR model
`glm-4.5`	ZAI-org	API – excluded	Self-deploy only
`glm-image`	ZAI-org	API – excluded	Explicit policy exclusion (image gen, not on Vertex AI pricing)

Generated by Pricing Agent on 2026-03-24

siddharthsambharia-portkey added 15 commits March 17, 2026 17:45

a1a3f5f chore(pricing): Update vertex-ai pricing
53b3f5d chore(pricing): Update vertex-ai pricing
52dbf8e chore(pricing): Update vertex-ai pricing
f19c6a3 chore(pricing): Update vertex-ai pricing
a6e1035 chore(pricing): Update vertex-ai pricing
91c6f2a chore(pricing): Update vertex-ai pricing
d32f719 chore(pricing): Update vertex-ai pricing
6a7c7e8 chore(pricing): Update vertex-ai pricing
916ddaf chore(pricing): Update vertex-ai pricing
fa02c68 chore(pricing): Update vertex-ai pricing
7320d33 chore(pricing): Update vertex-ai pricing
3604db1 chore(pricing): Update vertex-ai pricing
d31b801 chore(pricing): Update vertex-ai pricing
a267566 chore(pricing): Update vertex-ai pricing
04933eb chore(pricing): Update vertex-ai pricing

siddharthsambharia-portkey wants to merge 15 commits into main from pricing-update/vertex-ai
