diff --git a/skills/together-batch-inference/references/api-reference.md b/skills/together-batch-inference/references/api-reference.md
index 65f509b..5b240c1 100644
--- a/skills/together-batch-inference/references/api-reference.md
+++ b/skills/together-batch-inference/references/api-reference.md
@@ -240,8 +240,17 @@ curl -X GET "https://api.together.xyz/v1/batches" \
 - `meta-llama/Llama-3.3-70B-Instruct-Turbo`
-
-All serverless models support batch processing — models not listed have no discount. The 50% discount does not apply to dedicated endpoint usage.
+Most serverless models support batch processing through the chat completions endpoint; models not listed above have no discount. The following serverless models are not currently available for batch and will fail if submitted:
+
+- `deepseek-ai/DeepSeek-R1`
+- `deepseek-ai/DeepSeek-V3.1`
+- `deepseek-ai/DeepSeek-V4-Pro`
+- `MiniMaxAI/MiniMax-M2.7`
+- `moonshotai/Kimi-K2.5`
+- `moonshotai/Kimi-K2.6`
+- `Qwen/Qwen3.5-397B-A17B`
+- `zai-org/GLM-5`
+- `zai-org/GLM-5.1`
 
 ## Rate Limits