Merged
37 changes: 0 additions & 37 deletions README.md
@@ -143,43 +143,6 @@ python skills/together-batch-inference/scripts/batch_workflow.py

Scripts use the **Together Python v2 SDK** (`together>=2.0.0`) with keyword-only arguments, updated method names, and current response shapes.
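The keyword-only call shape referred to above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the model ID is taken from the catalog later in this PR, and the SDK call itself is commented out so the snippet runs without the `together` package or an API key.

```python
# Sketch of the v2 keyword-only call shape; the request payload below is
# illustrative and the network call is commented out on purpose.
request = dict(
    model="openai/gpt-oss-120b",  # any serverless chat model ID
    messages=[{"role": "user", "content": "Say hello"}],
    max_tokens=64,
)

# from together import Together
# client = Together()  # reads TOGETHER_API_KEY from the environment
# response = client.chat.completions.create(**request)  # keyword-only in v2
# print(response.choices[0].message.content)

print(sorted(request))  # → ['max_tokens', 'messages', 'model']
```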

## Skill Structure

```
togetherai-skills/
├── quality/
│ └── trigger-evals/ # Skill trigger test sets
├── scripts/ # Repo tooling, generators, validators
└── skills/
└── together-<product>/
├── SKILL.md # Core instructions (always loaded on trigger)
├── agents/
│ └── openai.yaml # OpenAI/Codex interface metadata
├── references/ # Detailed docs (loaded when needed)
│ ├── models.md # Supported models, IDs, context lengths
│ ├── api-reference.md
│ └── ...
└── scripts/ # Runnable Python examples (v2 SDK)
└── <workflow>.py
```

### How skills are loaded

1. **Metadata** (YAML frontmatter) — Always available to the agent (~100 words). Used to decide whether to load the skill.
2. **Body** (Markdown) — Loaded when the skill is triggered. It should stay lean and focus on routing, high-signal rules, and the next resource to open.
3. **References** — Loaded on demand when the agent needs deeper detail (model lists, full API specs).
4. **Scripts** — Available as runnable code that the agent can reference or execute directly.
5. **OpenAI metadata** — `agents/openai.yaml` gives OpenAI/Codex surfaces a display name, short description, and default prompt.
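The progressive-disclosure order in steps 1 through 4 can be sketched as a small state function. The function and stage names here are hypothetical, purely to illustrate the loading order; they are not part of any skill runtime.

```python
# Hypothetical sketch of the progressive-disclosure order described above.
def loaded_stages(triggered: bool, needs_detail: bool) -> list[str]:
    stages = ["metadata"]            # always available to the agent
    if triggered:
        stages.append("body")        # loaded when the skill triggers
        if needs_detail:
            # references and scripts are pulled in on demand
            stages += ["references", "scripts"]
    return stages

print(loaded_stages(True, False))  # → ['metadata', 'body']
```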

## Quality Guardrails

This repo now treats skills as agent artifacts rather than long tutorials:

- `SKILL.md` files are intentionally short and routing-oriented
- Long references include a `## Contents` section near the top
- Each skill has trigger eval examples in `quality/trigger-evals/`
- Multi-step Python workflows are validated for current v2 SDK usage and safer tempfile handling
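The safer-tempfile guardrail in the last bullet can be sketched as below. The helper name and JSONL payload are illustrative, not the repo's actual validation code.

```python
# Sketch of the safer-tempfile pattern: NamedTemporaryFile creates the file
# atomically with owner-only permissions, unlike the predictable paths
# returned by tempfile.mktemp.
import json
import os
import tempfile

def write_batch_input(requests: list[dict]) -> str:
    """Write requests to a JSONL temp file using a race-free creation API."""
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".jsonl", delete=False
    ) as f:
        for req in requests:
            f.write(json.dumps(req) + "\n")
        return f.name

path = write_batch_input([{"custom_id": "req-0"}])
print(path.endswith(".jsonl"))  # → True
os.unlink(path)  # caller cleans up, since delete=False keeps the file
```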

## SDK Compatibility

> **Version bump:** This repo now requires `together>=2.0.0`. If you are upgrading from v1, see the [migration guide](https://docs.together.ai/docs/v2-migration-guide) for breaking changes in method names, argument styles, and response shapes.
8 changes: 2 additions & 6 deletions skills/together-batch-inference/references/api-reference.md
@@ -238,14 +238,10 @@ curl -X GET "https://api.together.xyz/v1/batches" \

## Models with 50% Discount

- `Qwen/Qwen2.5-7B-Instruct-Turbo`
- `meta-llama/Llama-3.3-70B-Instruct-Turbo`
- `meta-llama/Llama-3-70b-chat-hf`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `zai-org/GLM-4.5-Air-FP8`
- `openai/whisper-large-v3`

All serverless models support batch processing — models not listed have no discount.

All serverless models support batch processing — models not listed have no discount. The 50% discount does not apply to dedicated endpoint usage.
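Building a batch input file is a matter of writing one JSON object per line. The builder below is illustrative: the per-line field names (`custom_id`, `method`, `url`, `body`) follow the OpenAI-style batch schema, which is an assumption here, so confirm the exact shape against this API reference before relying on it.

```python
import json

# Illustrative batch-input builder; field names are assumed to follow the
# OpenAI-style batch line schema and should be checked against the docs.
def batch_lines(prompts: list[str], model: str) -> list[str]:
    return [
        json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
        for i, prompt in enumerate(prompts)
    ]

for line in batch_lines(["Summarize doc A", "Summarize doc B"],
                        "zai-org/GLM-4.5-Air-FP8"):
    print(line)
```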

## Rate Limits

@@ -705,8 +705,8 @@ const final = await together.chat.completions.create({

## Supported Models

openai/gpt-oss-120b, openai/gpt-oss-20b, moonshotai/Kimi-K2.5, zai-org/GLM-5, zai-org/GLM-4.5-Air-FP8,
MiniMaxAI/MiniMax-M2.5, Qwen/Qwen3-Next-80B-A3B-Instruct, Qwen/Qwen3.5-397B-A17B,
Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3,
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-3.3-70B-Instruct-Turbo,
Qwen/Qwen2.5-7B-Instruct-Turbo, mistralai/Mistral-Small-24B-Instruct-2501
openai/gpt-oss-120b, openai/gpt-oss-20b, moonshotai/Kimi-K2.6, moonshotai/Kimi-K2.5,
zai-org/GLM-5.1, zai-org/GLM-5, MiniMaxAI/MiniMax-M2.7, Qwen/Qwen3.5-397B-A17B,
Qwen/Qwen3.5-9B, Qwen/Qwen3.6-Plus, Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8,
Qwen/Qwen3-235B-A22B-Instruct-2507-tput, deepseek-ai/DeepSeek-V4-Pro,
meta-llama/Llama-3.3-70B-Instruct-Turbo, Qwen/Qwen2.5-7B-Instruct-Turbo, google/gemma-4-31B-it
47 changes: 22 additions & 25 deletions skills/together-chat-completions/references/models.md
@@ -4,59 +4,56 @@

| Use Case | Model | API String | Alternatives |
|----------|-------|-----------|-------------|
| Chat (best) | Kimi K2.5 (instant) | `moonshotai/Kimi-K2.5` | `deepseek-ai/DeepSeek-V3.1`, `openai/gpt-oss-120b` |
| Reasoning | Kimi K2.5 (thinking) | `moonshotai/Kimi-K2.5` | `deepseek-ai/DeepSeek-R1` |
| Coding Agents | Kimi K2.5 (thinking) | `moonshotai/Kimi-K2.5` | `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8`, `deepseek-ai/DeepSeek-V3.1` |
| Small & Fast | GPT-OSS 20B | `openai/gpt-oss-20b` | `Qwen/Qwen2.5-7B-Instruct-Turbo` |
| Medium General | GPT-OSS 120B | `openai/gpt-oss-120b` | `zai-org/GLM-4.5-Air-FP8` |
| Function Calling | GLM-5 | `zai-org/GLM-5` | `moonshotai/Kimi-K2.5` |
| Vision | Kimi K2.5 | `moonshotai/Kimi-K2.5` | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` |
| Chat (best) | Kimi K2.6 | `moonshotai/Kimi-K2.6` | `MiniMaxAI/MiniMax-M2.7`, `openai/gpt-oss-120b` |
| Reasoning | DeepSeek-V4-Pro | `deepseek-ai/DeepSeek-V4-Pro` | `moonshotai/Kimi-K2.6`, `Qwen/Qwen3.6-Plus` |
| Coding Agents | Qwen3-Coder 480B | `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8` | `moonshotai/Kimi-K2.6`, `deepseek-ai/DeepSeek-V4-Pro` |
| Small & Fast | GPT-OSS 20B | `openai/gpt-oss-20b` | `Qwen/Qwen2.5-7B-Instruct-Turbo`, `google/gemma-3n-E4B-it` |
| Medium General | GPT-OSS 120B | `openai/gpt-oss-120b` | `zai-org/GLM-5` |
| Function Calling | GLM-5.1 | `zai-org/GLM-5.1` | `moonshotai/Kimi-K2.6`, `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8` |
| Vision | Qwen3.5 397B | `Qwen/Qwen3.5-397B-A17B` | `moonshotai/Kimi-K2.5`, `google/gemma-4-31B-it` |

## Full Chat Model Catalog

| Organization | Model | API String | Context | Quant |
|-------------|-------|-----------|---------|-------|
| Moonshot | Kimi K2.5 | `moonshotai/Kimi-K2.5` | 262,144 | INT4 |
| Qwen | Qwen3.5 397B | `Qwen/Qwen3.5-397B-A17B` | 262,144 | BF16 |
| Qwen | Qwen3.5 9B | `Qwen/Qwen3.5-9B` | 128,000 | BF16 |
| MiniMax | MiniMax M2.7 | `MiniMaxAI/MiniMax-M2.7` | 202,752 | FP4 |
| Qwen | Qwen3.5 397B A17B | `Qwen/Qwen3.5-397B-A17B` | 262,144 | BF16 |
| Qwen | Qwen3.6 Plus | `Qwen/Qwen3.6-Plus` | 1,000,000 | - |
| Qwen | Qwen3.5 9B | `Qwen/Qwen3.5-9B` | 262,144 | FP8 |
| Qwen | Qwen3-Coder 480B | `Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8` | 256,000 | FP8 |
| Qwen | Qwen3-Coder-Next | `Qwen/Qwen3-Coder-Next-FP8` | 262,144 | FP8 |
| Qwen | Qwen3 235B Instruct | `Qwen/Qwen3-235B-A22B-Instruct-2507-tput` | 262,144 | FP8 |
| Qwen | Qwen3-Next 80B Instruct | `Qwen/Qwen3-Next-80B-A3B-Instruct` | 262,144 | BF16 |
| MiniMax | MiniMax M2.5 | `MiniMaxAI/MiniMax-M2.5` | 228,700 | FP4 |
| DeepSeek | DeepSeek-V3.1 | `deepseek-ai/DeepSeek-V3.1` | 128,000 | FP8 |
| DeepSeek | DeepSeek-R1 | `deepseek-ai/DeepSeek-R1` | 163,839 | FP8 |
| Moonshot | Kimi K2.6 | `moonshotai/Kimi-K2.6` | 262,144 | FP4 |
| Moonshot | Kimi K2.5 | `moonshotai/Kimi-K2.5` | 262,144 | FP4 |
| DeepSeek | DeepSeek-V4-Pro | `deepseek-ai/DeepSeek-V4-Pro` | 512,000 | FP4 |
| OpenAI | GPT-OSS 120B | `openai/gpt-oss-120b` | 128,000 | MXFP4 |
| OpenAI | GPT-OSS 20B | `openai/gpt-oss-20b` | 128,000 | MXFP4 |
| Z.ai | GLM-5.1 | `zai-org/GLM-5.1` | 202,752 | FP4 |
| Z.ai | GLM-5 | `zai-org/GLM-5` | 202,752 | FP4 |
| Z.ai | GLM 4.7 | `zai-org/GLM-4.7` | 202,752 | FP8 |
| Z.ai | GLM 4.5 Air | `zai-org/GLM-4.5-Air-FP8` | 131,072 | FP8 |
| Meta | Llama 4 Maverick | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` | 1,048,576 | FP8 |
| Meta | Llama 3.3 70B Turbo | `meta-llama/Llama-3.3-70B-Instruct-Turbo` | 131,072 | FP8 |
| Deep Cogito | Cogito v2.1 671B | `deepcogito/cogito-v2-1-671b` | 32,768 | FP8 |
| Mistral | Mistral Small 24B | `mistralai/Mistral-Small-24B-Instruct-2501` | 32,768 | FP16 |
| Mistral | Mistral 7B v0.2 | `mistralai/Mistral-7B-Instruct-v0.2` | 32,768 | FP16 |
| Meta | Llama 3 8B Lite | `meta-llama/Meta-Llama-3-8B-Instruct-Lite` | 8,192 | - |
| Deep Cogito | Cogito v2.1 671B | `deepcogito/cogito-v2-1-671b` | 163,840 | - |
| Google | Gemma 4 31B IT | `google/gemma-4-31B-it` | 262,144 | FP8 |
| Google | Gemma 3N E4B | `google/gemma-3n-E4B-it` | 32,768 | FP8 |
| Liquid AI | LFM2-24B-A2B | `LiquidAI/LFM2-24B-A2B` | 32,768 | - |
| Qwen | Qwen 2.5 7B Turbo | `Qwen/Qwen2.5-7B-Instruct-Turbo` | 32,768 | FP8 |
| Essential AI | Rnj-1 Instruct | `essentialai/rnj-1-instruct` | 32,768 | BF16 |

## Vision Models

| Organization | Model | API String | Context |
|-------------|-------|-----------|---------|
| Qwen | Qwen3.5 397B A17B | `Qwen/Qwen3.5-397B-A17B` | 262,144 |
| Qwen | Qwen3.5 9B | `Qwen/Qwen3.5-9B` | 262,144 |
| Google | Gemma 4 31B IT | `google/gemma-4-31B-it` | 262,144 |
| Moonshot | Kimi K2.5 | `moonshotai/Kimi-K2.5` | 262,144 |
| Meta | Llama 4 Maverick | `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8` | 524,288 |
| Qwen | Qwen3-VL-8B | `Qwen/Qwen3-VL-8B-Instruct` | 262,100 |

## Moderation Models

| Model | API String | Context |
|-------|-----------|---------|
| Llama Guard 4 (12B) | `meta-llama/Llama-Guard-4-12B` | 1,048,576 |
| Virtue Guard | `VirtueAI/VirtueGuard-Text-Lite` | 32,768 |

## Quantization Types
- **FP16/BF16:** Full precision
- **FP8:** 8-bit floating point (Turbo models)
- **FP4/MXFP4:** 4-bit floating point
- **INT4:** 4-bit integer (Lite models)
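The practical effect of these types is weight-memory footprint: roughly 2 bytes per parameter at FP16/BF16, 1 at FP8, and 0.5 at FP4/MXFP4/INT4. The sketch below is a rough estimate only; it ignores activation memory, KV cache, and per-tensor scale overhead.

```python
# Approximate bytes per parameter for each quantization type; real
# deployments also spend memory on activations, KV cache, and scales.
BYTES_PER_PARAM = {
    "FP16": 2.0, "BF16": 2.0,
    "FP8": 1.0,
    "FP4": 0.5, "MXFP4": 0.5, "INT4": 0.5,
}

def weight_gb(n_params_billion: float, quant: str) -> float:
    """Estimated weight memory in GB for a model of the given size."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[quant] / 1e9

print(weight_gb(70, "FP8"))   # → 70.0 (a 70B model at FP8 is ~70 GB of weights)
print(weight_gb(70, "INT4"))  # → 35.0
```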