
feat: add Wafer AI as a first-party OpenAI-compatible provider #28637

Open
ianye23301 wants to merge 1 commit into BerriAI:litellm_oss_staging from ianye23301:ian/add-wafer-provider-oss

Conversation

@ianye23301

@ianye23301 ianye23301 commented May 22, 2026

Summary

Adds Wafer AI as an OpenAI-compatible provider following the documented JSON-only path.

Wafer is an OpenAI-compatible inference gateway that serves frontier open models via https://api.wafer.ai/v1 (bearer auth, standard OpenAI chat-completions shape, SSE streaming, tool/function calling). After this PR:

from litellm import completion
import os

os.environ["WAFER_API_KEY"] = "..."
resp = completion(
    model="wafer/GLM-5.1",
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
)

What changed

Four files, ~260 lines:

  1. litellm/llms/openai_like/providers.json — register wafer with base_url=https://api.wafer.ai/v1, api_key_env=WAFER_API_KEY, api_base_env=WAFER_API_BASE, and max_completion_tokens → max_tokens param mapping. No custom transformation needed.

  2. provider_endpoints_support.json — required by the code-quality doc-coverage check for every JSON-registered provider.

  3. model_prices_and_context_window.json + litellm/model_prices_and_context_window_backup.json — 7 chat models with per-token pricing, context windows, and capability flags:

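| Model | Context | $/M in | $/M out | Tools | Vision | Reasoning |
|---|---|---|---|---|---|---|
| wafer/GLM-5.1 | 128K | 1.50 | 4.50 | | | |
| wafer/Qwen3.5-397B-A17B | 128K | 0.60 | 3.60 | | | |
| wafer/Qwen3.6-35B-A3B | 32K | 0.19 | 1.25 | | | |
| wafer/deepseek-v4-flash | 128K | 0.18 | 0.35 | | | |
| wafer/deepseek-v4-pro | 128K | 2.18 | 4.35 | | | |
| wafer/qwen3.6-max-preview | 256K | 1.43 | 8.58 | | | |
| wafer/Kimi-K2.6 | 262K | 1.10 | 4.80 | | | |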

Cache-read prices included where applicable.
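
For illustration, the registration in item 1 and its parameter mapping can be sketched in Python. The field names (`base_url`, `api_key_env`, `api_base_env`, `param_mappings`) are the ones named in this description; the nesting of the entry and the helper `apply_param_mappings` are assumptions made for this sketch, not LiteLLM's actual loader.

```python
import json

# Hypothetical shape of the "wafer" entry. Field names come from the PR
# description; the nesting under a "wafer" key is an assumption.
WAFER_ENTRY = {
    "wafer": {
        "base_url": "https://api.wafer.ai/v1",
        "api_key_env": "WAFER_API_KEY",
        "api_base_env": "WAFER_API_BASE",
        "param_mappings": {"max_completion_tokens": "max_tokens"},
    }
}


def apply_param_mappings(params: dict, mappings: dict) -> dict:
    """Rename request params according to a provider's param_mappings.

    Hypothetical helper: params without a mapping pass through unchanged.
    """
    return {mappings.get(name, name): value for name, value in params.items()}


mapped = apply_param_mappings(
    {"max_completion_tokens": 256, "temperature": 0.2},
    WAFER_ENTRY["wafer"]["param_mappings"],
)
print(json.dumps(mapped))
```

Because the mapping is plain data, a declarative rename like this would need no provider-specific Python module, which is the point of the JSON-only path.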

Docs — companion PR at BerriAI/litellm-docs#205 per the AGENTS.md guidance.

Test plan

  • JSONProviderRegistry.get("wafer") returns the expected config (base_url, api_key_env, param_mappings)
  • litellm.get_llm_provider("wafer/GLM-5.1", api_key="...") resolves to (model="GLM-5.1", provider="wafer", base="https://api.wafer.ai/v1")
  • completion(model="wafer/GLM-5.1", ...) hits api.wafer.ai/v1/chat/completions with the bearer header; errors surface as WaferException
  • python3 tests/code_coverage_tests/check_provider_folders_documented.py → 19 openai_like providers, 150 endpoint-support entries, all documented
  • All 7 wafer/<model> entries are in litellm.model_cost with input_cost_per_token > 0 and output_cost_per_token > 0
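
The last check can be approximated offline. This is a minimal sketch, assuming the price map stores per-token costs derived from the per-million prices in the pricing table of this description. The `PRICES_PER_MILLION` dict and both helpers are hypothetical, and only two of the seven models are included.

```python
# Per-million-token prices (USD) for two models, copied from the PR's table.
PRICES_PER_MILLION = {
    "wafer/GLM-5.1": (1.50, 4.50),
    "wafer/deepseek-v4-flash": (0.18, 0.35),
}


def to_model_cost(prices: dict) -> dict:
    """Convert $/M-token prices into per-token cost entries."""
    return {
        model: {
            "input_cost_per_token": cost_in / 1_000_000,
            "output_cost_per_token": cost_out / 1_000_000,
        }
        for model, (cost_in, cost_out) in prices.items()
    }


def request_cost(entry: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one request, given its token counts."""
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


model_cost = to_model_cost(PRICES_PER_MILLION)

# Mirrors the test-plan check: every entry has strictly positive costs.
all_positive = all(
    e["input_cost_per_token"] > 0 and e["output_cost_per_token"] > 0
    for e in model_cost.values()
)

# 1,000 prompt tokens and 500 completion tokens on wafer/GLM-5.1.
cost = request_cost(model_cost["wafer/GLM-5.1"], 1_000, 500)
```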

🤖 Generated with Claude Code

@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@ianye23301
Author

CI summary after the rerun: 39 pass / 2 fail / 1 cancel.

The only red checks are test-server-root-path (/api/v1) and test-server-root-path (/llmproxy). Both fail at Docker build time, not in any application code:

RUN prisma generate --schema=./schema.prisma
  → npm install prisma@5.4.2
  → node: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory
  → exit code 127

This is an upstream issue with the Docker base image used by the Test Proxy SERVER_ROOT_PATH Routing workflow — libatomic.so.1 isn't present, so node can't run. The same failure is reproducing on every other recent PR I checked (fix/langfuse-closed-client-LIT-3221, fix/fireworks-schema-sanitize-nested-fields, fix/responses-adapter-cache-read-tokens, …), so it's a fleet-wide infra problem, not caused by this PR.

For reference, all Wafer-relevant checks are green: lint, code-quality, unit-test, validate-model-prices-json, llm-provider-tests, openai-anthropic-vertex-bedrock-tests, llm-handler-tests, core-utils, documentation, Analyze (python), Verify PR source branch, etc.

Happy to retrigger once the base image is fixed.

@greptile-apps
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR adds Wafer AI as a first-party OpenAI-compatible provider following the same pattern as the Featherless AI integration. The implementation is thorough and well-tested with 14 mock-only unit tests covering the full registration surface.

  • New WaferConfig(OpenAIGPTConfig) in litellm/llms/wafer/chat/transformation.py handles auth headers, base URL resolution, and max_completion_tokens → max_tokens aliasing; all provider-registry wiring in constants.py, __init__.py, utils.py, get_llm_provider_logic.py, and types/utils.py follows the established pattern.
  • Seven models added to model_prices_and_context_window.json and its backup with pricing, context windows, and capability flags.
  • A hardcoded wafer_models set in constants.py (bare model names without the wafer/ prefix) appears to be dead code — it is never imported or read outside that file; the authoritative set is populated at runtime from the JSON price map in __init__.py.

Confidence Score: 4/5

The change is additive and isolated; existing providers are unaffected.

The integration follows the Featherless AI template closely and all changed wiring paths are well-tested. The only notable gaps are a missing None guard when remapping max_completion_tokens (a null value would be forwarded to the upstream API) and a redundant hardcoded model list in constants.py that will silently drift from the JSON price map. Neither affects existing functionality.

litellm/constants.py (dead wafer_models set) and litellm/llms/wafer/chat/transformation.py (None handling in map_openai_params)

Important Files Changed

| Filename | Overview |
|---|---|
| litellm/llms/wafer/chat/transformation.py | New WaferConfig extending OpenAIGPTConfig; minor None-handling inconsistency in map_openai_params for max_completion_tokens |
| litellm/constants.py | Adds wafer to provider lists and a hardcoded wafer_models set that duplicates the JSON price map; follows featherless_ai pattern but is dead code |
| litellm/litellm_core_utils/get_llm_provider_logic.py | Adds endpoint-detection and provider-info branch for wafer; correctly respects explicitly passed api_key |
| litellm/utils.py | Adds env-var validation check and ProviderConfigManager mapping for wafer; follows established pattern |
| litellm/types/utils.py | Adds LlmProviders.WAFER enum entry; clean addition |
| litellm/__init__.py | Wires wafer_models set, add_known_models branch, model_list union, and models_by_provider entry |
| model_prices_and_context_window.json | Adds 7 wafer model entries with pricing, context windows, and capability flags |
| provider_endpoints_support.json | Adds wafer entry with responses:true; consistent with featherless_ai pattern |
| tests/test_litellm/llms/wafer/chat/test_wafer_chat_transformation.py | 14 mocked unit tests covering header injection, missing key, env-var precedence, param mapping, and registry wiring |

Reviews (1): Last reviewed commit: "feat: add Wafer AI as a first-party Open..."

Comment thread litellm/constants.py Outdated
Comment on lines +1011 to +1022
wafer_models: set = set(
    [
        "GLM-5.1",
        "Qwen3.5-397B-A17B",
        "Qwen3.6-35B-A3B",
        "deepseek-v4-flash",
        "deepseek-v4-pro",
        "qwen3.6-max-preview",
        "Kimi-K2.6",
    ]
)

Contributor


P2 Unreferenced hardcoded model list

wafer_models defined here (bare model names, no wafer/ prefix) is never imported or read anywhere in the codebase — grep finds no references outside this file. The authoritative model list already lives in model_prices_and_context_window.json, and litellm/__init__.py populates its own wafer_models set from that JSON at startup. This constants.py copy will silently drift whenever models are added or removed from the price map. The same issue exists in featherless_ai_models, but it's worth not propagating the pattern.
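
To illustrate why a second hardcoded list is unnecessary, here is a minimal sketch of deriving a provider's model set from a price-map-like dict. How `litellm/__init__.py` actually builds its set is not shown in this thread; selecting keys by their `wafer/` prefix, the function name `models_for_provider`, and the `other/some-model` entry are all assumptions for illustration.

```python
def models_for_provider(model_cost: dict, provider: str) -> set:
    """Collect bare model names for one provider from a price-map-like dict."""
    prefix = f"{provider}/"
    return {key[len(prefix):] for key in model_cost if key.startswith(prefix)}


# Tiny stand-in for model_prices_and_context_window.json; values are elided.
price_map = {
    "wafer/GLM-5.1": {},
    "wafer/Kimi-K2.6": {},
    "other/some-model": {},  # hypothetical non-Wafer entry
}

wafer_models = models_for_provider(price_map, "wafer")
```

With a single source of truth, adding or removing a model in the price map changes the derived set automatically, so there is nothing left to drift.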

Comment on lines +55 to +60
for param, value in non_default_params.items():
    if param == "max_completion_tokens":
        optional_params["max_tokens"] = value
    elif param in supported_openai_params:
        if value is not None:
            optional_params[param] = value
Contributor


P2 Missing None guard on the alias mapping branch

Every other param in the loop is protected by a None check, but the max_completion_tokens alias writes unconditionally to optional_params. Passing a None value for this param therefore forwards a null field to the upstream API, which can produce a validation error. Wrapping the assignment in a None check matches the convention used for all other params in this method.
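
One possible shape for the guarded loop, sketched as a standalone function so it can run without LiteLLM. The name `map_params` and its two-argument signature are simplifications assumed for this sketch; the real `map_openai_params` method takes more arguments.

```python
def map_params(non_default_params: dict, supported_openai_params: list) -> dict:
    """Map OpenAI-style params, dropping None values before any branch runs."""
    optional_params = {}
    for param, value in non_default_params.items():
        if value is None:
            continue  # covers both the alias and the passthrough branch
        if param == "max_completion_tokens":
            optional_params["max_tokens"] = value
        elif param in supported_openai_params:
            optional_params[param] = value
    return optional_params


dropped = map_params(
    {"max_completion_tokens": None, "temperature": 0.5}, ["temperature"]
)
aliased = map_params({"max_completion_tokens": 64}, ["temperature"])
```

Placing the check at the top of the loop means any future branch inherits the guard, rather than each branch having to remember it.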

@ianye23301
Author

Update after pushing 6a731f3:

CI tally on the new SHA: 39 pass / 2 fail.

The 2 remaining red checks are both unrelated to Wafer:

  1. test-server-root-path (/llmproxy) — Docker build fails at RUN prisma generate --schema=./schema.prisma with node: libatomic.so.1: cannot open shared object file. Same failure on every recent PR in the repo (fix/langfuse-closed-client-LIT-3221, fix/fireworks-schema-sanitize-nested-fields, fix/responses-adapter-cache-read-tokens, …). Base image needs libatomic installed.

  2. misc / Run tests — single test failure in tests/test_litellm/interactions/test_openapi_compliance.py::TestResponseCompliance::test_status_enum_values:

    assert ['in_progress', 'requires_action', 'completed', 'failed', 'cancelled', 'incomplete', 'budget_exceeded']
        == ['in_progress', 'requires_action', 'completed', 'failed', 'cancelled', 'incomplete']
    

    The OpenAPI schema gained budget_exceeded but this test wasn't updated. No Wafer touch.

My-side checks fixed by 6a731f3:

  • All Other Providers / Run tests — the wafer model-cost test now reads the bundled model_prices_and_context_window_backup.json instead of litellm.model_cost (which is fetched from main and lacks this PR's entries until merge).
  • codecov/patch — wafer module at 100% line coverage.
  • ✅ Endpoint-detection branch in get_llm_provider_logic.py now reachable (the previous "https://api.wafer.ai/v1" literal never matched the api.wafer.ai/v1 entry stored in openai_compatible_endpoints; tests now exercise both branches).
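
The mismatch in the last bullet can be reproduced in isolation. A minimal sketch, assuming the stored entries are scheme-less `host/path` strings as described; the helper `matches_known_endpoint` and its normalization are hypothetical and are not the change made in 6a731f3, which is not shown in this thread.

```python
from urllib.parse import urlparse

# Stand-in for the scheme-less entry described in the bullet above.
openai_compatible_endpoints = ["api.wafer.ai/v1"]


def matches_known_endpoint(api_base: str, endpoints: list) -> bool:
    """Compare host + path only, ignoring the scheme and any trailing slash."""
    # urlparse only fills netloc when the string has a scheme or starts with "//".
    parsed = urlparse(api_base if "://" in api_base else f"//{api_base}")
    normalized = f"{parsed.netloc}{parsed.path}".rstrip("/")
    return normalized in endpoints


# A full-URL literal never equals the scheme-less entry.
literal_match = "https://api.wafer.ai/v1" in openai_compatible_endpoints

# After normalization the same URL matches.
normalized_match = matches_known_endpoint(
    "https://api.wafer.ai/v1", openai_compatible_endpoints
)
```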

Wafer-relevant checks all green: lint, code-quality, unit-test, validate-model-prices-json, llm-provider-tests, openai-anthropic-vertex-bedrock-tests, llm-handler-tests, core-utils, Analyze (python), Verify PR source branch, codecov/patch, documentation, etc.

ianye23301 added a commit to ianye23301/litellm that referenced this pull request May 22, 2026
Two P2 review comments from Greptile on PR BerriAI#28637:

1. **Dead `wafer_models` set in `constants.py`** — the hardcoded set
   was never imported or referenced anywhere; the authoritative
   wafer-model set is populated at startup in ``litellm/__init__.py``
   from ``model_prices_and_context_window.json``. Removing the dead
   copy (which would silently drift from the JSON price map).
   (Note: the Featherless PR I templated from has the same dead code;
    not propagating that mistake here.)

2. **`max_completion_tokens` alias bypassed the `None` guard** in
   ``map_openai_params`` — every other supported param checked
   ``value is not None`` before forwarding, but the alias branch
   wrote unconditionally. A caller passing
   ``max_completion_tokens=None`` would therefore forward
   ``max_tokens: null`` to Wafer's upstream API, which rejects null
   max_tokens with a 400 validation error.

   Fixed by hoisting the ``value is None`` check to the top of the
   loop so it covers both the alias and passthrough branches. Added
   a dedicated test
   ``test_map_openai_params_max_completion_tokens_none_is_dropped``
   that locks down the behavior so a future refactor can't quietly
   regress it.

Verification:
- ``pytest tests/test_litellm/llms/wafer/ --cov=litellm.llms.wafer``
  → 19/19 pass, 100% line coverage on the wafer module
- ``ruff check`` + ``black --check`` clean
@CLAassistant

CLAassistant commented May 22, 2026

CLA assistant check
All committers have signed the CLA.

@ianye23301
Author

Addressed both Greptile P2 comments in 6a8e88d:

  1. Dead wafer_models set in constants.py — removed. The authoritative set is built at startup in litellm/__init__.py from model_prices_and_context_window.json; the constants copy was unreachable and would have silently drifted. (The Featherless template I followed has the same dead block — not propagating it here.)

  2. max_completion_tokens alias bypassed the None guard — fixed by hoisting if value is None: continue to the top of map_openai_params's loop so the alias and passthrough branches both honor it. Added a test test_map_openai_params_max_completion_tokens_none_is_dropped to lock the behavior down.

Local verification: 19/19 pass on tests/test_litellm/llms/wafer/, 100% line coverage on the wafer module, ruff check + black --check clean.


Re the still-red test-server-root-path / misc checks — confirmed both are upstream and unrelated to this PR:

  • test-server-root-path (/api/v1) and (/llmproxy): the libatomic.so.1: cannot open shared object file failure during prisma generate was repo-wide as of ~18:37 UTC, but I see runs against litellm_oss_staging started succeeding around 18:39 UTC (e.g. litellm_redis-circuit-breaker-fix, litellm_mc_purview_guardrails, litellm_shin_staging_05_22_2026). The new push should retrigger and pick that up.
  • misc / Run tests: failing on tests/test_litellm/interactions/test_openapi_compliance.py::TestResponseCompliance::test_status_enum_values — the OpenAPI schema gained a budget_exceeded enum but the test wasn't updated. Nothing Wafer-related touches that file.

I'll post the new CI tally once the rerun settles.

@ianye23301
Author

Final CI tally on SHA 6a8e88d: 39 pass / 2 fail / 1 cancel.

All Wafer-relevant checks pass: unit-test, lint, code-quality, validate-model-prices-json, documentation, Verify PR source branch, llm-provider-tests, openai-anthropic-vertex-bedrock-tests, llm-handler-tests, core-utils, Analyze (python), codecov/patch, All Other Providers / Run tests, and everything else.

The 2 still-red checks are both genuinely unrelated to this PR — I dug into both:

1. test-server-root-path (/api/v1) and (/llmproxy) — Dockerfile base-image issue

The build fails at RUN prisma generate --schema=./schema.prisma:

node: error while loading shared libraries: libatomic.so.1:
       cannot open shared object file: No such file or directory
subprocess.CalledProcessError: Command '['/root/.cache/prisma-python/nodeenv/bin/npm', 'install', 'prisma@5.4.2']'
       returned non-zero exit status 127.

docker/Dockerfile.non_root uses cgr.dev/chainguard/wolfi-base and installs nodejs via apk add but doesn't pull in libatomic. Prisma 5.4.2's bundled nodeenv binary links against it.

I confirmed by surveying ~30 recent runs of the Test Proxy SERVER_ROOT_PATH Routing workflow across the repo — failures and successes alternate based on whether the run pulls a warm prisma generate layer from the GHA build cache (type=gha). My PR's cache shard does not have a warm layer, so the build re-runs and hits the libatomic error. This is not something this PR can or should fix (would be out of scope for adding a provider). A one-liner adding libatomic to the apk install block in docker/Dockerfile.non_root would resolve it.

2. misc / Run tests — unrelated schema-drift test

tests/test_litellm/interactions/test_openapi_compliance.py::TestResponseCompliance::test_status_enum_values
  AssertionError:
    ['in_progress', 'requires_action', 'completed', 'failed', 'cancelled', 'incomplete', 'budget_exceeded']
    ==
    ['in_progress', 'requires_action', 'completed', 'failed', 'cancelled', 'incomplete']

The OpenAPI schema gained a budget_exceeded status value; this test wasn't updated. Touches no Wafer code paths. Failing fleet-wide.

Happy to retrigger CI once either is patched on litellm_oss_staging. Otherwise this PR is ready for review whenever maintainers are.

Per the maintainer feedback on BerriAI#28637 and the docs at
https://docs.litellm.ai/docs/contributing/adding_openai_compatible_providers,
this replaces the previous 11-file first-party wiring with the
documented JSON-only approach.

Four files changed:

- ``litellm/llms/openai_like/providers.json`` — register ``wafer``
  with ``base_url=https://api.wafer.ai/v1``, ``api_key_env=WAFER_API_KEY``,
  ``api_base_env=WAFER_API_BASE``, and ``max_completion_tokens`` →
  ``max_tokens`` param mapping. Wafer is an OpenAI-compatible
  inference gateway; no custom transformation required.

- ``provider_endpoints_support.json`` — required by the code-quality
  doc-coverage check for every entry in ``openai_like/providers.json``.

- ``model_prices_and_context_window.json`` (+ bundled backup) — pricing
  and capability metadata for the 7 current Wafer chat models: GLM-5.1,
  Qwen3.5-397B-A17B, Qwen3.6-35B-A3B, deepseek-v4-flash, deepseek-v4-pro,
  qwen3.6-max-preview, Kimi-K2.6. Per-token input / output / cache-read
  costs and capability flags (tools, vision, reasoning where applicable).

Verified locally:
- ``JSONProviderRegistry.get("wafer")`` returns the expected config.
- ``litellm.get_llm_provider("wafer/GLM-5.1", api_key="...")``
  resolves to ``provider=wafer, base=https://api.wafer.ai/v1``.
- ``completion(model="wafer/GLM-5.1", ...)`` hits
  ``api.wafer.ai/v1/chat/completions`` with the bearer header and
  surfaces upstream errors as ``WaferException``.
- ``tests/code_coverage_tests/check_provider_folders_documented.py``
  passes (19 openai_like providers, 150 entries in
  provider_endpoints_support.json).
- All 7 ``wafer/<model>`` entries are in ``litellm.model_cost`` with
  ``input_cost_per_token > 0`` and ``output_cost_per_token > 0``.
@ianye23301 ianye23301 force-pushed the ian/add-wafer-provider-oss branch from b52dc1b to 67b9d91 Compare May 26, 2026 19:09
@ianye23301
Author

Thanks for the pointer — fully reworked.

Force-pushed 67b9d91 which replaces the 11-file first-party wiring with the documented JSON-only path: a single entry in litellm/llms/openai_like/providers.json, plus the JSON keepers (provider_endpoints_support.json for the doc-coverage check and 7 pricing entries in model_prices_and_context_window.json + backup).

Net change: 4 files, 260 insertions (was 11 files, 680). No litellm/llms/wafer/ module, no enum/constants/lazy-imports/utils/init touches, no custom transformation, no Python tests — param_mappings.max_completion_tokens → max_tokens is declared inline in the JSON.

Locally verified:

  • JSONProviderRegistry.get("wafer") returns the right config
  • get_llm_provider("wafer/GLM-5.1") → provider=wafer, base=https://api.wafer.ai/v1
  • completion(model="wafer/GLM-5.1", …) actually hits api.wafer.ai/v1 (a real call against the production endpoint, which raises a WaferException)
  • tests/code_coverage_tests/check_provider_folders_documented.py passes (19 openai_like / 150 endpoint entries)
  • All 7 pricing entries load and have non-zero per-token costs
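
The resolution step in the second bullet can be pictured with a small standalone sketch. This is not `get_llm_provider` itself: the function `resolve`, the `DEFAULT_BASES` table, the rule that `WAFER_API_BASE` takes precedence over the default, and the localhost URL are assumptions made for illustration.

```python
import os

# Hypothetical lookup: provider -> (default base URL, env var that overrides it).
DEFAULT_BASES = {"wafer": ("https://api.wafer.ai/v1", "WAFER_API_BASE")}


def resolve(model: str, env=None) -> tuple:
    """Split 'provider/model' and pick the base URL, letting the env var win."""
    env = os.environ if env is None else env
    provider, _, name = model.partition("/")
    default_base, base_env = DEFAULT_BASES[provider]
    return name, provider, env.get(base_env) or default_base


# No override set: the default base URL is used.
default = resolve("wafer/GLM-5.1", env={})

# Override set: the env value replaces the default (assumed precedence).
overridden = resolve(
    "wafer/GLM-5.1", env={"WAFER_API_BASE": "http://localhost:8000/v1"}
)
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment.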

PR description updated to match. Ready for re-review.


2 participants