
Resolve TEE endpoint and TLS cert from on-chain registry#175

Open
kylexqian wants to merge 7 commits into main from claude/lucid-khorana

Conversation

@kylexqian
Contributor

Instead of blindly trusting the TLS certificate presented by the TEE server (TOFU), the SDK now queries the on-chain TEERegistry contract to discover active LLM proxy endpoints and their verified certificates.

Key changes:

  • Add TEERegistry.abi and tee_registry.py to query the registry contract
  • Replace TOFU cert fetch in llm.py with registry-verified DER cert
  • Client.__init__ queries the registry by default; og_llm_server_url still works as an explicit override (falls back to system CA verification)
  • Add DEFAULT_TEE_REGISTRY_ADDRESS and DEFAULT_TEE_REGISTRY_RPC_URL to defaults.py (OG EVM chain at http://13.59.43.94:8545)
  • Surface tee_id, tee_endpoint, tee_payment_address on every TextGenerationOutput and on the final StreamChunk so callers can audit which enclave served their request
  • Print TEE node info (endpoint, TEE ID, payment address) in all three CLI print helpers (completion, chat, streaming)
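
As an illustrative sketch of the surfaced metadata (the field names come from the bullet above, but this minimal class shape is hypothetical, not the SDK's actual TextGenerationOutput):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextGenerationOutput:
    # Hypothetical minimal shape; the real SDK class carries more fields.
    text: str
    tee_id: Optional[str] = None
    tee_endpoint: Optional[str] = None
    tee_payment_address: Optional[str] = None

out = TextGenerationOutput(
    text="hello",
    tee_id="tee-123",
    tee_endpoint="https://proxy.example:443",
    tee_payment_address="0xabc",
)
# Callers can log which enclave served the request for auditing.
print(out.tee_id, out.tee_endpoint, out.tee_payment_address)
```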

kylexqian and others added 7 commits March 5, 2026 01:22
Instead of blindly trusting the TLS certificate presented by the TEE
server (TOFU), the SDK now queries the on-chain TEERegistry contract
to discover active LLM proxy endpoints and their verified certificates.

Key changes:
- Add TEERegistry.abi and tee_registry.py to query the registry contract
- Replace TOFU cert fetch in llm.py with registry-verified DER cert
- Client.init queries the registry by default; og_llm_server_url still
  works as an explicit override (falls back to system CA verification)
- Add DEFAULT_TEE_REGISTRY_ADDRESS and DEFAULT_TEE_REGISTRY_RPC_URL to
  defaults.py (OG EVM chain at http://13.59.43.94:8545)
- Surface tee_id, tee_endpoint, tee_payment_address on every
  TextGenerationOutput and on the final StreamChunk so callers can
  audit which enclave served their request
- Print TEE node info (endpoint, TEE ID, payment address) in all three
  CLI print helpers (completion, chat, streaming)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d with stream=True

The TEE streaming endpoint returns an empty delta ("delta": {}) and no
tool call content in SSE events when tools is supplied — the server-side
streaming path simply does not emit tool call data.

Introduce _tee_llm_chat_tools_as_stream which transparently calls the
non-streaming /v1/chat/completions endpoint and wraps the complete
TextGenerationOutput as a single final StreamChunk (with tool_calls
populated in delta). chat() now routes stream=True + tools to this
method, preserving the streaming iterator interface for callers and
the CLI while returning correct results.

Also removes the temporary [SSE RAW] debug print added during diagnosis,
and fixes from_sse_data to accept "message" as a fallback for "delta"
when the proxy sends a non-streaming format in SSE events.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
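
The delta/message fallback described above can be sketched like this (a simplified stand-in for from_sse_data; the real parsing lives in types.py):

```python
import json

def parse_sse_choice(raw: str) -> dict:
    """Parse one SSE data payload, accepting a streaming-style "delta"
    or falling back to a non-streaming-style "message" for the choice."""
    data = json.loads(raw)
    choice = data["choices"][0]
    # An empty delta ({}) is falsy, so a proxy that sends the
    # non-streaming format inside SSE events still parses correctly.
    return choice.get("delta") or choice.get("message") or {}

streaming = '{"choices": [{"delta": {"content": "hi"}}]}'
non_streaming = '{"choices": [{"message": {"content": "hi", "tool_calls": []}}]}'
print(parse_sse_choice(streaming))
print(parse_sse_choice(non_streaming))
```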
…EE_LLM references

Two issues caused CI failures:

1. The mock_web3 fixture didn't patch TEERegistry, so Client.__init__ tried
   to instantiate a real TEERegistry (with a live Web3 connection) even in
   unit tests. The mock_abi_files fallback returned {} for TEERegistry.abi
   but web3.eth.contract() requires a list, causing ValueError. Fix: patch
   src.opengradient.client.client.TEERegistry inside mock_web3 and return a
   fake TEEEndpoint with a stub endpoint/tee_id/payment_address.

2. Three LLM tests referenced TEE_LLM.GPT_4O which no longer exists in the
   enum. Updated to TEE_LLM.GPT_5.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ol calls consistently

Two bugs caused tool call results to be invisible in the CLI:

1. print_streaming_chat_result accessed chunk.choices[0] unconditionally.
   Usage-only SSE chunks carry an empty choices list, which caused an
   IndexError that was silently swallowed by the outer except block,
   truncating output before tool calls or finish_reason were printed.
   Fixed by guarding all choices[0] accesses with `if chunk.choices:`.

2. print_llm_chat_result (non-streaming) printed tool_calls as a raw
   Python dict repr. Updated to use the same formatted output as the
   streaming path: "Tool Calls: / Function: ... / Arguments: ...".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
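
The guard described in point 1 can be sketched as follows (simplified chunk objects, not the SDK's actual StreamChunk):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Chunk:
    choices: list = field(default_factory=list)
    usage: Optional[dict] = None

def first_content(chunk: Chunk) -> Optional[str]:
    # Usage-only SSE chunks carry an empty choices list, so guard
    # before indexing instead of letting an IndexError truncate output.
    if chunk.choices:
        return chunk.choices[0].get("delta", {}).get("content")
    return None

print(first_content(Chunk(choices=[{"delta": {"content": "hi"}}])))  # hi
print(first_content(Chunk(usage={"total_tokens": 10})))              # None
```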
…gger tool calls

With tool_choice="auto", models like GPT-5 require a system prompt
instructing them to use tools, otherwise they respond with finish_reason
"stop" and empty content. Updated chat-tool and chat-stream-tool to
include a system message matching the pattern that works in the SDK.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…T-5 tool calls

The previous Tokyo/get_weather(location) payload still failed with
finish_reason: stop on GPT-5. Switch chat-tool and chat-stream-tool
to use get_current_weather(city, state, unit) with Dallas, Texas
which consistently triggers tool_calls finish_reason.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
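
A request shaped like the one described above might look as follows (an illustrative OpenAI-style payload; the exact schema the Makefile targets send is not reproduced here):

```python
# Illustrative chat payload with a system message and the tool
# definition named in the commit message above.
payload = {
    "messages": [
        {"role": "system", "content": "Use the provided tools to answer."},
        {"role": "user", "content": "What is the weather in Dallas, Texas?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "state": {"type": "string"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["city", "state"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
print(payload["tools"][0]["function"]["name"])
```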
Copilot AI left a comment
Pull request overview

This PR replaces the Trust-On-First-Use (TOFU) TLS certificate approach for TEE server connections with a registry-verified certificate pinning model. Instead of blindly fetching and trusting the cert presented by the TEE server at connection time, the SDK now queries an on-chain TEERegistry contract to discover active LLM proxy endpoints and retrieve their verified TLS certificates. It also surfaces TEE node metadata (tee_id, tee_endpoint, tee_payment_address) on all response types and in the CLI for auditability.

Changes:

  • Adds tee_registry.py + TEERegistry.abi to query the on-chain TEE registry for active LLM proxy endpoints and their verified DER TLS certificates
  • Updates Client.__init__ to auto-discover TEE endpoints via the registry by default; og_llm_server_url still works as an explicit override (falls back to system CA)
  • Adds tee_id, tee_endpoint, and tee_payment_address fields to TextGenerationOutput and StreamChunk, and surfaces them in all three CLI print helpers

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/opengradient/client/tee_registry.py | New module: queries on-chain registry for TEE endpoints and builds pinned SSL contexts |
| src/opengradient/abi/TEERegistry.abi | New ABI file for the on-chain TEERegistry contract |
| src/opengradient/client/client.py | Client init resolves LLM endpoint+cert from registry or explicit override |
| src/opengradient/client/llm.py | Uses registry-verified cert for TLS; surfaces TEE metadata on responses; tool-call streaming fallback |
| src/opengradient/types.py | Adds tee_id/tee_endpoint/tee_payment_address to StreamChunk and TextGenerationOutput; fixes delta/message SSE fallback |
| src/opengradient/defaults.py | Adds DEFAULT_TEE_REGISTRY_ADDRESS and DEFAULT_TEE_REGISTRY_RPC_URL |
| src/opengradient/cli.py | Prints TEE node info in all three LLM CLI helpers; guards chunk.choices access |
| tests/client_test.py | Mocks TEERegistry in fixtures; updates model enum references to GPT_5 |
| pyproject.toml | Bumps version to 0.7.6 |
| Makefile | Updates example tool-call commands |


Comment on lines +141 to +149
```python
cert_file = tempfile.NamedTemporaryFile(
    prefix="og_tee_tls_", suffix=".pem", delete=False, mode="w"
)
cert_file.write(pem)
cert_file.flush()
cert_file.close()

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_verify_locations(cert_file.name)
```
Copilot AI Mar 6, 2026

The build_ssl_context_from_der function creates a temporary file with delete=False and writes the PEM certificate to it. After calling ctx.load_verify_locations(cert_file.name), the file is never deleted. This is called during Client.__init__, meaning every time a new client is created with a registry-verified cert, a leaked .pem file accumulates in the system's temp directory.

To fix this, the temporary file should be deleted after load_verify_locations reads it. You can use os.unlink(cert_file.name) in a try/finally block after ctx.load_verify_locations, or use Python 3.12+'s delete_on_close=False pattern and manage cleanup explicitly. Alternatively, you could pass the PEM string directly via ssl.SSLContext.load_verify_locations(cadata=pem) to avoid touching the filesystem entirely.

Suggested change

```diff
-cert_file = tempfile.NamedTemporaryFile(
-    prefix="og_tee_tls_", suffix=".pem", delete=False, mode="w"
-)
-cert_file.write(pem)
-cert_file.flush()
-cert_file.close()
-
-ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
-ctx.load_verify_locations(cert_file.name)
+ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
+ctx.load_verify_locations(cadata=pem)
```
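
A filesystem-free variant along the lines of this suggestion could be sketched as (assuming the DER certificate bytes come from the registry; load_verify_locations rejects malformed data with ssl.SSLError):

```python
import ssl

def build_ssl_context_from_der(der: bytes) -> ssl.SSLContext:
    # Convert the registry's DER bytes to PEM and load them directly
    # via cadata, avoiding the leaked temp file in the original code.
    pem = ssl.DER_cert_to_PEM_cert(der)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cadata=pem)
    return ctx

# Malformed DER is rejected at load time rather than silently trusted.
try:
    build_ssl_context_from_der(b"\x00not-a-cert")
except ssl.SSLError:
    print("rejected invalid certificate")
```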

```python
# TEE Registry contract on the OG EVM chain — used to discover LLM proxy endpoints
# and fetch their registry-verified TLS certificates instead of blindly trusting TOFU.
DEFAULT_TEE_REGISTRY_ADDRESS = "0x3d641a2791533b4a0000345ea8d509d01e1ec301"
DEFAULT_TEE_REGISTRY_RPC_URL = "http://13.59.43.94:8545"
```
Copilot AI Mar 6, 2026

The default TEE registry RPC URL uses plain HTTP (http://13.59.43.94:8545) rather than HTTPS. The entire security model of this feature relies on trusting the TLS certificate fetched from the on-chain registry — but if the RPC connection itself is unencrypted, a network-level attacker (man-in-the-middle) can intercept the getTEEsByType/getTEE responses and substitute a malicious TEE endpoint and certificate before they reach the client. This completely undermines the registry-verified certificate pinning.

The RPC URL should be served over HTTPS with a valid certificate, or at a minimum the documentation should warn users that using a plain HTTP RPC URL voids the security guarantee.

Suggested change

```diff
-DEFAULT_TEE_REGISTRY_RPC_URL = "http://13.59.43.94:8545"
+DEFAULT_TEE_REGISTRY_RPC_URL = "https://13.59.43.94:8545"
```

Contributor

Use the DNS name from the docs for the RPC endpoint instead of hardcoding an IP with no SSL: https://docs.opengradient.ai/learn/network/deployment.html#opengradient-testnet

Contributor Author

I think this is a temporary IP that we're using. My understanding is that the smart contract and TEE registry logic haven't been deployed to the devnet yet; let me check w/ Khalifa and Aniket.

Comment on lines +136 to +138
```python
og_llm_server_url = tee.endpoint
og_llm_streaming_server_url = og_llm_streaming_server_url or tee.endpoint
llm_tls_cert_der = tee.tls_cert_der
```
Copilot AI Mar 6, 2026

When the caller supplies a custom og_llm_streaming_server_url but leaves og_llm_server_url=None (triggering registry lookup), both the non-streaming and streaming TLS contexts are set to the registry-pinned certificate (ssl_ctx is shared, lines 88-90 of llm.py). However, og_llm_streaming_server_url keeps the caller-supplied value (line 137 of client.py), which may point to a different host that does not present the same certificate. This will cause TLS handshake failures when streaming.

The TLS context for the streaming client should use the system CA bundle (i.e., True) when og_llm_streaming_server_url was explicitly provided by the caller, not the registry-pinned cert.

Contributor

Why do we still have streaming_server_url? Doesn't it come from the TEE registry now?

@adambalogh (Contributor) left a comment

please address comments
