
feat(endpoints): Add OpenAI Responses API endpoint with fixes and integration tests #43

Merged
athewsey merged 10 commits into awslabs:main from acere:ResponseAPI
Apr 15, 2026

Conversation

@acere (Collaborator) commented Mar 25, 2026

Summary

Adds OpenAI Responses API endpoint support to LLMeter, with fixes to align with the actual API behavior.

Changes

Endpoint fixes (llmeter/endpoints/openai_response.py)

  • Rename max_tokens to max_output_tokens in create_payload (Response API parameter name)
  • Fix _parse_response to handle usage=None (Bedrock Mantle doesn't always return it) and use input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
  • Rewrite _parse_stream_response to process typed events (response.output_text.delta, response.completed) instead of the old chunk-with-output-array format
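The `usage=None` fallback described above can be sketched as follows. This is an illustrative helper, not LLMeter's actual `_parse_response` implementation; the `parse_usage` name and the bare-attribute access pattern are assumptions for demonstration.

```python
from types import SimpleNamespace


def parse_usage(usage):
    """Return (input_tokens, output_tokens), tolerating usage=None (as seen
    from Bedrock Mantle) and falling back to the Chat Completions-era
    prompt_tokens/completion_tokens field names."""
    if usage is None:
        return None, None
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", None)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", None)
    return input_tokens, output_tokens


print(parse_usage(None))  # (None, None)
print(parse_usage(SimpleNamespace(input_tokens=3, output_tokens=7)))  # (3, 7)
print(parse_usage(SimpleNamespace(prompt_tokens=3, completion_tokens=7)))  # (3, 7)
```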

Integration tests

  • Add tests/integ/test_response_endpoint.py — integration tests for ResponseEndpoint and ResponseStreamEndpoint wrappers against Bedrock Mantle
  • Fix tests/integ/test_response_bedrock.py to use ResponseUsage attribute names (input_tokens/output_tokens)

Unit test updates

  • Update all unit test mocks across 5 test files to use spec-based usage mocks (input_tokens/output_tokens) and event-based streaming mocks
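The "spec-based usage mock" pattern mentioned above might look like the sketch below. The `ResponseUsage` dataclass here is a local stand-in for the real OpenAI SDK type, purely for illustration; the point is that `spec=` makes attribute typos fail fast instead of silently returning a child `Mock`.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class ResponseUsage:  # hypothetical stand-in for the SDK's ResponseUsage type
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0


# spec= constrains the mock to ResponseUsage's attribute surface
usage = Mock(spec=ResponseUsage, input_tokens=12, output_tokens=34, total_tokens=46)

assert usage.input_tokens == 12
# Accessing a Chat Completions-era name is rejected with AttributeError:
try:
    usage.prompt_tokens
except AttributeError:
    print("prompt_tokens rejected")
```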

Example notebook

  • Add examples/LLMeter with OpenAI Response API on Bedrock.ipynb demonstrating non-streaming and streaming usage with Runner and plotting

Testing

  • All 527 unit tests pass
  • Ruff lint clean

@acere acere requested a review from athewsey March 25, 2026 01:50
@acere acere self-assigned this Mar 25, 2026
Comment thread llmeter/endpoints/__init__.py Outdated
Comment thread llmeter/endpoints/openai_response.py Outdated
Comment thread llmeter/endpoints/openai_response.py Outdated
Comment thread tests/integ/test_response_bedrock.py Outdated
@athewsey (Collaborator) commented Apr 6, 2026

Also, almost forgot: we should add the relevant module placeholder .md under the docs API reference.

acere and others added 7 commits April 15, 2026 19:16
… test suite

- Add ResponseEndpoint and ResponseStreamEndpoint classes for OpenAI Responses API support
- Implement non-streaming and streaming response handling with proper error management
- Add structured output support with response format validation and serialization
- Create comprehensive unit test suite covering response parsing, error handling, format validation, model parameters, payload parsing, properties, and serialization
- Add integration tests for Bedrock response endpoint functionality
- Export new response endpoint classes from endpoints module
- Update integration test configuration with response endpoint fixtures
- Rename max_tokens to max_output_tokens in create_payload (Response API
  parameter name)
- Fix _parse_response to handle usage=None (Bedrock Mantle) and use
  input_tokens/output_tokens with fallback to prompt_tokens/completion_tokens
- Rewrite _parse_stream_response to process typed events
  (response.output_text.delta, response.completed) instead of the old
  chunk-with-output-array format
- Fix test_response_bedrock.py to use ResponseUsage attribute names
  (input_tokens/output_tokens)
- Add integration tests for ResponseEndpoint and ResponseStreamEndpoint
- Add example notebook for Response API on Bedrock
- Update all unit test mocks to match new behavior
- Rename ResponseEndpoint -> OpenAIResponseEndpoint and
  ResponseStreamEndpoint -> OpenAIResponseStreamEndpoint for
  consistency with OpenAICompletionEndpoint naming convention
- Change logger.error() to logger.exception() for stack trace
  consistency with bedrock_invoke.py and litellm.py
- Rewrite test_response_bedrock.py to test LLMeter endpoint wrappers
  instead of raw OpenAI SDK
- Update serialization test assertions for new class names
- Update example notebook references
- Add docs/reference/endpoints/openai_response.md placeholder
- Add openai_response to mkdocs.yml nav under endpoints
- Update connect_endpoints user guide to mention Response API endpoints
- Type invoke() payload as CompletionCreateParams / ResponseCreateParams
- Type create_payload() return as SDK TypedDicts using cast()
- Replace jmespath with plain list comprehension in _parse_payload
- Rewrite stream parsers using typed ChatCompletionChunk / event types,
  removing all hasattr/getattr fallbacks and type: ignore comments
- Make OpenAIResponseStreamEndpoint inherit from OpenAIResponseEndpoint,
  deduplicating _parse_payload and create_payload
- Use collections.abc.Sequence instead of typing.Sequence
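The typed-event stream handling described in these commits can be sketched roughly as follows. The event classes below are minimal stand-ins for the OpenAI SDK's Responses streaming event types, and `collect_output_text` is an illustrative helper, not LLMeter's actual `_parse_stream_response`.

```python
from dataclasses import dataclass


@dataclass
class ResponseTextDeltaEvent:  # stand-in for the SDK's typed delta event
    type: str
    delta: str


@dataclass
class ResponseCompletedEvent:  # stand-in for the SDK's completion event
    type: str


def collect_output_text(events):
    """Accumulate text by dispatching on event.type, rather than indexing
    into a chunk's output array as the old parser did."""
    parts = []
    for event in events:
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
        elif event.type == "response.completed":
            break
    return "".join(parts)


stream = [
    ResponseTextDeltaEvent("response.output_text.delta", "Hello, "),
    ResponseTextDeltaEvent("response.output_text.delta", "world!"),
    ResponseCompletedEvent("response.completed"),
]
print(collect_output_text(stream))  # Hello, world!
```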
Previous rename from Response{Stream}Endpoint to
OpenAIResponse{Stream}Endpoint had missed some corresponding test
class names and mentions in test docstrings.
Claude had originally written separate files for testing 1/ that
OpenAI works with Bedrock Mantle endpoints at all, and 2/ that the
LLMeter Endpoint worked with this combination. We'd already adjusted
the tests in 1/ since we only want to focus on LLMeter-specific
aspects, so one of these files was now redundant.
Use OpenAI SDK entities in payload generation and parsing. Clean up
typing, including severing responses endpoints from inheriting from
ChatCompletions endpoints. Update test stubbing of OpenAI SDK to
reflect this separate import pathway.
Tweak OpenAI Responses intro comments on user guide and example
notebook for clarity.
@athewsey (Collaborator) left a comment


Looking good now! Tests pass, docs build, and the example notebook runs through OK.

@athewsey athewsey merged commit 1ede455 into awslabs:main Apr 15, 2026