
feat(endpoints): Add OpenAI Responses API endpoint with fixes and integration tests #43

Merged
athewsey merged 10 commits into awslabs:main from acere:ResponseAPI
Apr 15, 2026

Conversation

@acere (Collaborator) commented Mar 25, 2026

Summary

Adds OpenAI Responses API endpoint support to LLMeter, with fixes to align with the actual API behavior.

Changes

Endpoint fixes (llmeter/endpoints/openai_response.py)

  • Rename max_tokens to max_output_tokens in create_payload (Response API parameter name)
  • Fix _parse_response to handle usage=None (Bedrock Mantle doesn't always return it) and use input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
  • Rewrite _parse_stream_response to process typed events (response.output_text.delta, response.completed) instead of the old chunk-with-output-array format
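The `usage=None` fallback described above can be sketched as follows. This is an illustrative helper, not LLMeter's actual `_parse_response` implementation; the `parse_usage` name and the bare-attribute access pattern are assumptions for demonstration.

```python
from types import SimpleNamespace


def parse_usage(usage):
    """Return (input_tokens, output_tokens), tolerating usage=None (as seen
    from Bedrock Mantle) and falling back to the Chat Completions-era
    prompt_tokens/completion_tokens field names."""
    if usage is None:
        return None, None
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", None)
    return input_tokens, output_tokens


print(parse_usage(None))  # (None, None)
print(parse_usage(SimpleNamespace(input_tokens=3, output_tokens=7)))  # (3, 7)
print(parse_usage(SimpleNamespace(prompt_tokens=3, completion_tokens=7)))  # (3, 7)
```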

Integration tests

  • Add tests/integ/test_response_endpoint.py — integration tests for ResponseEndpoint and ResponseStreamEndpoint wrappers against Bedrock Mantle
  • Fix tests/integ/test_response_bedrock.py to use ResponseUsage attribute names (input_tokens/output_tokens)

Unit test updates

  • Update all unit test mocks across 5 test files to use spec-based usage mocks (input_tokens/output_tokens) and event-based streaming mocks
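The "spec-based usage mock" pattern mentioned above might look like the sketch below. The `ResponseUsage` dataclass here is a local stand-in for the real OpenAI SDK type, purely for illustration; the point is that `spec=` makes attribute typos fail fast instead of silently returning a child `Mock`.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class ResponseUsage:  # hypothetical stand-in for the SDK's ResponseUsage type
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0


# spec= constrains the mock to ResponseUsage's attribute surface
usage = Mock(spec=ResponseUsage, input_tokens=12, output_tokens=34, total_tokens=46)

assert usage.input_tokens == 12
# Accessing a Chat Completions-era name is rejected with AttributeError:
try:
    usage.prompt_tokens
except AttributeError:
    print("prompt_tokens rejected")
```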

Example notebook

  • Add examples/LLMeter with OpenAI Response API on Bedrock.ipynb demonstrating non-streaming and streaming usage with Runner and plotting

Testing

  • All 527 unit tests pass
  • Ruff lint clean

@acere acere requested a review from athewsey March 25, 2026 01:50
@acere acere self-assigned this Mar 25, 2026
Comment thread llmeter/endpoints/__init__.py Outdated
Comment thread llmeter/endpoints/openai_response.py Outdated
Comment thread llmeter/endpoints/openai_response.py Outdated
Comment thread tests/integ/test_response_bedrock.py Outdated
@athewsey (Collaborator) commented Apr 6, 2026

Also, almost forgot: we should add the relevant module placeholder .md under the docs API reference.

acere and others added 7 commits April 15, 2026 19:16
… test suite

- Add ResponseEndpoint and ResponseStreamEndpoint classes for OpenAI Responses API support
- Implement non-streaming and streaming response handling with proper error management
- Add structured output support with response format validation and serialization
- Create comprehensive unit test suite covering response parsing, error handling, format validation, model parameters, payload parsing, properties, and serialization
- Add integration tests for Bedrock response endpoint functionality
- Export new response endpoint classes from endpoints module
- Update integration test configuration with response endpoint fixtures
- Rename max_tokens to max_output_tokens in create_payload (Response API
  parameter name)
- Fix _parse_response to handle usage=None (Bedrock Mantle) and use
  input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
- Rewrite _parse_stream_response to process typed events
  (response.output_text.delta, response.completed) instead of the old
  chunk-with-output-array format
- Fix test_response_bedrock.py to use ResponseUsage attribute names
  (input_tokens/output_tokens)
- Add integration tests for ResponseEndpoint and ResponseStreamEndpoint
- Add example notebook for Response API on Bedrock
- Update all unit test mocks to match new behavior
- Rename ResponseEndpoint -> OpenAIResponseEndpoint and
  ResponseStreamEndpoint -> OpenAIResponseStreamEndpoint for
  consistency with OpenAICompletionEndpoint naming convention
- Change logger.error() to logger.exception() for stack trace
  consistency with bedrock_invoke.py and litellm.py
- Rewrite test_response_bedrock.py to test LLMeter endpoint wrappers
  instead of raw OpenAI SDK
- Update serialization test assertions for new class names
- Update example notebook references
- Add docs/reference/endpoints/openai_response.md placeholder
- Add openai_response to mkdocs.yml nav under endpoints
- Update connect_endpoints user guide to mention Response API endpoints
- Type invoke() payload as CompletionCreateParams / ResponseCreateParams
- Type create_payload() return as SDK TypedDicts using cast()
- Replace jmespath with plain list comprehension in _parse_payload
- Rewrite stream parsers using typed ChatCompletionChunk / event types,
  removing all hasattr/getattr fallbacks and type: ignore comments
- Make OpenAIResponseStreamEndpoint inherit from OpenAIResponseEndpoint,
  deduplicating _parse_payload and create_payload
- Use collections.abc.Sequence instead of typing.Sequence
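The typed-event stream handling described in these commits can be sketched roughly as follows. The event classes below are minimal stand-ins for the OpenAI SDK's Responses streaming event types, and `collect_output_text` is an illustrative helper, not LLMeter's actual `_parse_stream_response`.

```python
from dataclasses import dataclass


@dataclass
class ResponseTextDeltaEvent:  # stand-in for the SDK's typed delta event
    type: str
    delta: str


@dataclass
class ResponseCompletedEvent:  # stand-in for the SDK's completion event
    type: str


def collect_output_text(events):
    """Accumulate text by dispatching on event.type, rather than indexing
    into a chunk's output array as the old parser did."""
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
        elif event.type == "response.completed":
            break
    return "".join(parts)


stream = [
    ResponseTextDeltaEvent("response.output_text.delta", "Hello, "),
    ResponseTextDeltaEvent("response.output_text.delta", "world!"),
    ResponseCompletedEvent("response.completed"),
]
print(collect_output_text(stream))  # Hello, world!
```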
Previous rename from Response{Stream}Endpoint to
OpenAIResponse{Stream}Endpoint had missed some corresponding test
class names and mentions in test docstrings.
Claude had originally written separate files for testing 1/ that
OpenAI works with Bedrock Mantle endpoints at all, and 2/ that the
LLMeter Endpoint worked with this combination. We'd already adjusted
the tests in 1/ since we only want to focus on LLMeter-specific
aspects, so one of these files was now redundant.
Use OpenAI SDK entities in payload generation and parsing. Clean up
typing, including severing responses endpoints from inheriting from
ChatCompletions endpoints. Update test stubbing of OpenAI SDK to
reflect this separate import pathway.
Tweak OpenAI Responses intro comments on user guide and example
notebook for clarity.
@athewsey (Collaborator) left a comment


Looking good now! Tests pass, docs build, and the example notebook runs through OK.

@athewsey athewsey merged commit 1ede455 into awslabs:main Apr 15, 2026