Resolve TEE endpoint and TLS cert from on-chain registry #175
Conversation
Instead of blindly trusting the TLS certificate presented by the TEE server (TOFU), the SDK now queries the on-chain TEERegistry contract to discover active LLM proxy endpoints and their verified certificates.

Key changes:
- Add TEERegistry.abi and tee_registry.py to query the registry contract
- Replace the TOFU cert fetch in llm.py with the registry-verified DER cert
- Client.__init__ queries the registry by default; og_llm_server_url still works as an explicit override (falls back to system CA verification)
- Add DEFAULT_TEE_REGISTRY_ADDRESS and DEFAULT_TEE_REGISTRY_RPC_URL to defaults.py (OG EVM chain at http://13.59.43.94:8545)
- Surface tee_id, tee_endpoint, and tee_payment_address on every TextGenerationOutput and on the final StreamChunk so callers can audit which enclave served their request
- Print TEE node info (endpoint, TEE ID, payment address) in all three CLI print helpers (completion, chat, streaming)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
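The registry lookup described above can be sketched roughly as follows. The `getTEEsByType` method name and the tuple layout of its return value are assumptions inferred from this PR's review thread, not verified against the actual TEERegistry.abi:

```python
from dataclasses import dataclass


@dataclass
class TEEEndpoint:
    """Stand-in for the SDK's TEE endpoint record (field names from the PR description)."""
    tee_id: int
    endpoint: str
    payment_address: str
    tls_cert_der: bytes


def resolve_active_tee(contract, tee_type: str = "llm_proxy") -> TEEEndpoint:
    """Return the first active TEE of the given type from a registry contract.

    `contract` is any object exposing a web3.py-style
    `functions.getTEEsByType(...).call()` interface; in the SDK it would be
    built from TEERegistry.abi. The (id, endpoint, payer, cert, active)
    tuple shape here is an assumption for illustration.
    """
    entries = contract.functions.getTEEsByType(tee_type).call()
    for tee_id, endpoint, payment_address, cert_der, active in entries:
        if active:
            return TEEEndpoint(tee_id, endpoint, payment_address, cert_der)
    raise RuntimeError(f"no active TEE of type {tee_type!r} in registry")
```

Because the lookup is duck-typed over the contract object, unit tests can drive it with a stub instead of a live Web3 connection, which is exactly the problem the CI fix further down addresses.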
…d with stream=True
The TEE streaming endpoint returns an empty delta ("delta": {}) and no
tool call content in SSE events when tools is supplied — the server-side
streaming path simply does not emit tool call data.
Introduce _tee_llm_chat_tools_as_stream which transparently calls the
non-streaming /v1/chat/completions endpoint and wraps the complete
TextGenerationOutput as a single final StreamChunk (with tool_calls
populated in delta). chat() now routes stream=True + tools to this
method, preserving the streaming iterator interface for callers and
the CLI while returning correct results.
Also removes the temporary [SSE RAW] debug print added during diagnosis,
and fixes from_sse_data to accept "message" as a fallback for "delta"
when the proxy sends a non-streaming format in SSE events.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
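The fallback described in this commit amounts to a one-chunk generator over the complete result. The dict shapes below are simplified assumptions standing in for the SDK's real TextGenerationOutput and StreamChunk types:

```python
from typing import Any, Dict, Iterator


def tools_as_stream(complete_output: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Wrap a complete (non-streaming) chat result as a single-chunk stream.

    Mirrors the idea of _tee_llm_chat_tools_as_stream: call the non-streaming
    endpoint, then yield one final chunk whose delta carries the message
    content and tool_calls, so callers keep the streaming iterator interface.
    """
    choice = complete_output["choices"][0]
    message = choice.get("message", {})
    yield {
        "choices": [{
            "delta": {
                "content": message.get("content"),
                "tool_calls": message.get("tool_calls"),
            },
            "finish_reason": choice.get("finish_reason"),
        }],
        "usage": complete_output.get("usage"),
    }
```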
…EE_LLM references
Two issues caused CI failures:
1. The mock_web3 fixture didn't patch TEERegistry, so Client.__init__ tried
to instantiate a real TEERegistry (with a live Web3 connection) even in
unit tests. The mock_abi_files fallback returned {} for TEERegistry.abi
but web3.eth.contract() requires a list, causing ValueError. Fix: patch
src.opengradient.client.client.TEERegistry inside mock_web3 and return a
fake TEEEndpoint with a stub endpoint/tee_id/payment_address.
2. Three LLM tests referenced TEE_LLM.GPT_4O which no longer exists in the
enum. Updated to TEE_LLM.GPT_5.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
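Fix (1) can be sketched with a MagicMock-based stand-in; the attribute and method names below are assumptions based on the commit message, not the SDK's actual fixture code:

```python
from unittest.mock import MagicMock


def make_fake_registry() -> MagicMock:
    """Build a stand-in TEERegistry whose lookup returns a stub endpoint.

    In the real test suite this would be patched in for
    src.opengradient.client.client.TEERegistry inside the mock_web3 fixture,
    so Client.__init__ never opens a live Web3 connection.
    """
    fake_tee = MagicMock()
    fake_tee.endpoint = "https://tee.example:8443"
    fake_tee.tee_id = "tee-0"
    fake_tee.payment_address = "0x" + "0" * 40
    registry = MagicMock()
    # Hypothetical lookup method name for illustration only.
    registry.get_active_llm_endpoint.return_value = fake_tee
    return registry
```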
…ol calls consistently

Two bugs caused tool call results to be invisible in the CLI:
1. print_streaming_chat_result accessed chunk.choices[0] unconditionally. Usage-only SSE chunks carry an empty choices list, which caused an IndexError that was silently swallowed by the outer except block, truncating output before tool calls or finish_reason were printed. Fixed by guarding all choices[0] accesses with `if chunk.choices:`.
2. print_llm_chat_result (non-streaming) printed tool_calls as a raw Python dict repr. Updated to use the same formatted output as the streaming path: "Tool Calls: / Function: ... / Arguments: ...".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
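The guarded access described in bug (1) looks roughly like this; it is a simplified dict-based sketch returning a string, not the SDK's actual StreamChunk-printing helper:

```python
def format_stream_chunk(chunk: dict) -> str:
    """Render one SSE chunk, guarding against an empty choices list.

    Usage-only chunks carry `"choices": []`; indexing choices[0]
    unconditionally is exactly the IndexError described above.
    """
    out = []
    if chunk.get("choices"):  # the guard that fixes bug (1)
        choice = chunk["choices"][0]
        delta = choice.get("delta") or {}
        if delta.get("content"):
            out.append(delta["content"])
        for tc in delta.get("tool_calls") or []:
            fn = tc.get("function", {})
            out.append(
                f"Tool Calls:\n  Function: {fn.get('name')}\n"
                f"  Arguments: {fn.get('arguments')}"
            )
        if choice.get("finish_reason"):
            out.append(f"[finish_reason: {choice['finish_reason']}]")
    return "\n".join(out)
```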
…gger tool calls

With tool_choice="auto", models like GPT-5 require a system prompt instructing them to use tools; otherwise they respond with finish_reason "stop" and empty content. Updated chat-tool and chat-stream-tool to include a system message matching the pattern that works in the SDK.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…T-5 tool calls

The previous Tokyo/get_weather(location) payload still failed with finish_reason "stop" on GPT-5. Switch chat-tool and chat-stream-tool to get_current_weather(city, state, unit) with Dallas, Texas, which consistently triggers the tool_calls finish_reason.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
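A payload following the pattern these two commits converge on might look like the following; the exact JSON-schema details are assumptions reconstructed from the commit messages, not the Makefile's literal contents:

```python
# Tool definition in OpenAI function-calling format, using the
# get_current_weather(city, state, unit) shape the commit switched to.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "state": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "state"],
        },
    },
}

# System prompt nudging the model toward tools, plus the Dallas, Texas user turn.
MESSAGES = [
    {"role": "system", "content": "Use the supplied tools to answer the user."},
    {"role": "user", "content": "What is the weather in Dallas, Texas?"},
]
```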
Pull request overview
This PR replaces the Trust-On-First-Use (TOFU) TLS certificate approach for TEE server connections with a registry-verified certificate pinning model. Instead of blindly fetching and trusting the cert presented by the TEE server at connection time, the SDK now queries an on-chain TEERegistry contract to discover active LLM proxy endpoints and retrieve their verified TLS certificates. It also surfaces TEE node metadata (tee_id, tee_endpoint, tee_payment_address) on all response types and in the CLI for auditability.
Changes:
- Adds `tee_registry.py` + `TEERegistry.abi` to query the on-chain TEE registry for active LLM proxy endpoints and their verified DER TLS certificates
- Updates `Client.__init__` to auto-discover TEE endpoints via the registry by default; `og_llm_server_url` still works as an explicit override (falls back to system CA)
- Adds `tee_id`, `tee_endpoint`, and `tee_payment_address` fields to `TextGenerationOutput` and `StreamChunk`, and surfaces them in all three CLI print helpers
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/opengradient/client/tee_registry.py | New module: queries on-chain registry for TEE endpoints and builds pinned SSL contexts |
| src/opengradient/abi/TEERegistry.abi | New ABI file for the on-chain TEERegistry contract |
| src/opengradient/client/client.py | Client init resolves LLM endpoint + cert from registry or explicit override |
| src/opengradient/client/llm.py | Uses registry-verified cert for TLS; surfaces TEE metadata on responses; tool-call streaming fallback |
| src/opengradient/types.py | Adds tee_id/tee_endpoint/tee_payment_address to StreamChunk and TextGenerationOutput; fixes delta/message SSE fallback |
| src/opengradient/defaults.py | Adds DEFAULT_TEE_REGISTRY_ADDRESS and DEFAULT_TEE_REGISTRY_RPC_URL |
| src/opengradient/cli.py | Prints TEE node info in all three LLM CLI helpers; guards chunk.choices access |
| tests/client_test.py | Mocks TEERegistry in fixtures; updates model enum references to GPT_5 |
| pyproject.toml | Bumps version to 0.7.6 |
| Makefile | Updates example tool-call commands |
```python
cert_file = tempfile.NamedTemporaryFile(
    prefix="og_tee_tls_", suffix=".pem", delete=False, mode="w"
)
cert_file.write(pem)
cert_file.flush()
cert_file.close()

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_verify_locations(cert_file.name)
```
The build_ssl_context_from_der function creates a temporary file with delete=False and writes the PEM certificate to it. After calling ctx.load_verify_locations(cert_file.name), the file is never deleted. This is called during Client.__init__, meaning every time a new client is created with a registry-verified cert, a leaked .pem file accumulates in the system's temp directory.
To fix this, the temporary file should be deleted after load_verify_locations reads it. You can use os.unlink(cert_file.name) in a try/finally block after ctx.load_verify_locations, or use Python 3.12+'s delete_on_close=False pattern and manage cleanup explicitly. Alternatively, you can avoid touching the filesystem entirely by passing the PEM string directly via ssl.SSLContext.load_verify_locations(cadata=pem).
Suggested change:

```diff
-cert_file = tempfile.NamedTemporaryFile(
-    prefix="og_tee_tls_", suffix=".pem", delete=False, mode="w"
-)
-cert_file.write(pem)
-cert_file.flush()
-cert_file.close()
-ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
-ctx.load_verify_locations(cert_file.name)
+ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
+ctx.load_verify_locations(cadata=pem)
```
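The in-memory alternative can be sketched with the stdlib alone; `der_to_pem` and `pinned_context` are hypothetical helper names, not functions from this PR:

```python
import ssl


def der_to_pem(der: bytes) -> str:
    """Re-encode DER certificate bytes as a PEM string.

    ssl.DER_cert_to_PEM_cert is a pure base64 re-encoding; it does not
    validate that the bytes are a well-formed certificate.
    """
    return ssl.DER_cert_to_PEM_cert(der)


def pinned_context(der: bytes) -> ssl.SSLContext:
    """Build a client TLS context pinned to one registry cert, without
    ever writing a temp file to disk (so nothing can leak)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cadata=der_to_pem(der))
    return ctx
```

Using `cadata` sidesteps both the leak and any temp-dir permission issues, at the cost of the cert not being inspectable on disk after the fact.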
```python
# TEE Registry contract on the OG EVM chain — used to discover LLM proxy endpoints
# and fetch their registry-verified TLS certificates instead of blindly trusting TOFU.
DEFAULT_TEE_REGISTRY_ADDRESS = "0x3d641a2791533b4a0000345ea8d509d01e1ec301"
DEFAULT_TEE_REGISTRY_RPC_URL = "http://13.59.43.94:8545"
```
The default TEE registry RPC URL uses plain HTTP (http://13.59.43.94:8545) rather than HTTPS. The entire security model of this feature relies on trusting the TLS certificate fetched from the on-chain registry — but if the RPC connection itself is unencrypted, a network-level attacker (man-in-the-middle) can intercept the getTEEsByType/getTEE responses and substitute a malicious TEE endpoint and certificate before they reach the client. This completely undermines the registry-verified certificate pinning.
The RPC URL should be served over HTTPS with a valid certificate, or at a minimum the documentation should warn users that using a plain HTTP RPC URL voids the security guarantee.
Suggested change:

```diff
-DEFAULT_TEE_REGISTRY_RPC_URL = "http://13.59.43.94:8545"
+DEFAULT_TEE_REGISTRY_RPC_URL = "https://13.59.43.94:8545"
```
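A minimal guard along the lines of the warning suggested above could look like this; it is a hypothetical helper, not code from this PR:

```python
import warnings
from urllib.parse import urlparse


def check_rpc_url(rpc_url: str) -> str:
    """Warn when the TEE registry RPC URL is not served over HTTPS.

    A plain-HTTP RPC lets a man-in-the-middle forge getTEEsByType/getTEE
    responses, substituting a malicious endpoint and certificate and thereby
    voiding the registry-verified certificate pinning.
    """
    if urlparse(rpc_url).scheme != "https":
        warnings.warn(
            f"TEE registry RPC {rpc_url!r} is not HTTPS; registry responses "
            "can be tampered with in transit, voiding cert pinning."
        )
    return rpc_url
```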
use the DNS from docs for RPC instead of hardcoding IP + no SSL https://docs.opengradient.ai/learn/network/deployment.html#opengradient-testnet
I think this is a temporary IP that we're using. My understanding is that the smart contract and TEE registry logic hasn't been deployed to the devnet yet, let me check w/ Khalifa and Aniket
```python
og_llm_server_url = tee.endpoint
og_llm_streaming_server_url = og_llm_streaming_server_url or tee.endpoint
llm_tls_cert_der = tee.tls_cert_der
```
When the caller supplies a custom og_llm_streaming_server_url but leaves og_llm_server_url=None (triggering registry lookup), both the non-streaming and streaming TLS contexts are set to the registry-pinned certificate (ssl_ctx is shared, lines 88-90 of llm.py). However, og_llm_streaming_server_url keeps the caller-supplied value (line 137 of client.py), which may point to a different host that does not present the same certificate. This will cause TLS handshake failures when streaming.
The TLS context for the streaming client should use the system CA bundle (i.e., True) when og_llm_streaming_server_url was explicitly provided by the caller, not the registry-pinned cert.
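The selection rule proposed here can be written as a small helper; the function and parameter names are hypothetical, not taken from client.py or llm.py:

```python
import ssl
from typing import Optional, Union


def streaming_tls_context(
    caller_streaming_url: Optional[str],
    pinned_ctx: Optional[ssl.SSLContext],
) -> Union[ssl.SSLContext, bool]:
    """Choose the TLS verification source for the streaming HTTP client.

    A caller-supplied streaming URL may point at a host that does not present
    the registry cert, so it must verify against the system CA bundle
    (returning True, the conventional "use default verification" value)
    rather than the registry-pinned context.
    """
    if caller_streaming_url is not None:
        return True  # explicit override: system CAs, not the pinned cert
    return pinned_ctx if pinned_ctx is not None else True
```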
why do we still have streaming_server_url, doesn't it come from the TEE registry now?
adambalogh left a comment:

please address comments