From 9ed45d373cf360eed8981a8c354d490cbe7c0518 Mon Sep 17 00:00:00 2001
From: Abhijeet Prasad <abhijeet@braintrustdata.com>
Date: Tue, 17 Mar 2026 17:12:16 -0400
Subject: [PATCH] feat: Add spec for prompts

---
 docs/research/prompts-api-research.md | 663 ++++++++++++++++++++++++++
 docs/telemetry/prompts-api.md         | 625 ++++++++++++++++++++++++
 2 files changed, 1288 insertions(+)
 create mode 100644 docs/research/prompts-api-research.md
 create mode 100644 docs/telemetry/prompts-api.md

diff --git a/docs/research/prompts-api-research.md b/docs/research/prompts-api-research.md
new file mode 100644
index 0000000..3981bc0
--- /dev/null
+++ b/docs/research/prompts-api-research.md
@@ -0,0 +1,663 @@
+# Prompts API Research
+
+This document is research for a future SDK spec for Braintrust prompts. It is not the spec itself.
+
+The target workflows are:
+
+1. Defining prompts via SDK APIs
+2. Loading prompts from Braintrust via SDK APIs
+3. Invoking prompts in deployed applications
+
+The eventual spec should follow the shape of [braintrust-spec PR #2](https://github.com/braintrustdata/braintrust-spec/pull/2): overview, public API, behavior, wire format, and examples.
+
+## Primary References
+
+- Spec template: https://github.com/braintrustdata/braintrust-spec/pull/2
+- User docs: https://www.braintrust.dev/docs/evaluate/write-prompts#sdk
+- Deploy docs: https://www.braintrust.dev/docs/deploy/prompts
+
+Local code and docs used for this research:
+
+- `braintrust/docs/evaluate/write-prompts.mdx`
+- `braintrust/docs/deploy/prompts.mdx`
+- `braintrust/docs/openapi.yaml`
+- `braintrust/tests/bt_services/functions.test.ts`
+- `braintrust/tests/bt_services/test_prompt_environment_integration.py`
+- `braintrust-sdk-javascript/js/src/framework2.ts`
+- `braintrust-sdk-javascript/js/src/prompt-schemas.ts`
+- `braintrust-sdk-javascript/js/src/logger.ts`
+- `braintrust-sdk-javascript/js/src/functions/invoke.ts`
+- `braintrust-sdk-python/py/src/braintrust/framework2.py`
+- `braintrust-sdk-python/py/src/braintrust/logger.py`
+- `braintrust-sdk-python/py/src/braintrust/functions/invoke.py`
+- `braintrust-sdk-ruby/lib/braintrust/prompt.rb`
+- `braintrust-sdk-ruby/lib/braintrust/api/functions.rb`
+- `braintrust-sdk-java/src/main/java/dev/braintrust/prompt/BraintrustPrompt.java`
+- `braintrust-sdk-java/src/main/java/dev/braintrust/prompt/BraintrustPromptLoader.java`
+- `braintrust-sdk-java/src/main/java/dev/braintrust/api/BraintrustApiClient.java`
+
+## Versions Reviewed
+
+This research was based on the following local SDK/backend checkouts:
+
+| Repo | Package / SDK version | Revision reviewed |
+|---|---|---|
+| `braintrust-sdk-javascript` | `3.4.0` from `js/package.json` | `js-sdk-v3.3.0-59-g704597f2-dirty` (`704597f2`) |
+| `braintrust-sdk-python` | `0.9.0` from `py/src/braintrust/version.py` | `py-sdk-v0.9.0-9-gf4b70dd4` (`f4b70dd4`) |
+| `braintrust-sdk-ruby` | `0.2.1` from `lib/braintrust/version.rb` | `v0.2.1-1-g74b588b` (`74b588b`) |
+| `braintrust-sdk-java` | git-derived version from `build.gradle` | `v0.2.9-5-gb06777a` (`b06777a`) |
+| `braintrust` backend/docs repo | workspace repo checkout | `v1.1.31-1149-g1d2aac55b9-dirty` (`1d2aac55b9`) |
+
+Notes:
+
+- The Java SDK computes its published version from git metadata at build time rather than storing a fixed version string in `gradle.properties`.
+- The JavaScript checkout was dirty when reviewed, so its exact local state may have included uncommitted changes beyond commit `704597f2`.
+
+## Executive Summary
+
+The current Braintrust prompt surface is split across three related but not identical concepts:
+
+1. Prompt objects stored in the control plane (`/v1/prompt`)
+2. Prompt definitions published as functions with `function_data.type = "prompt"`
+3. Prompt execution through the function invocation plane (`/function/invoke` or `/v1/function/{id}/invoke`)
+
+The SDKs do not currently expose the same prompt feature set:
+
+- JavaScript is the most complete implementation.
+- Python covers the same broad workflows but has notable parity gaps.
+- Ruby has a useful load/build implementation, but not the same high-level authoring or invoke surface.
+- Java has a minimal load/render surface and generic function invocation by ID, but not a prompt-centric API.
+
+There are also real docs/code mismatches today, especially around deployed prompt invocation.
+
+## Canonical Backend Model
+
+### Prompt object
+
+The control plane exposes prompt CRUD on `/v1/prompt` and `/v1/prompt/{prompt_id}`.
+
+Key backend schemas from `braintrust/docs/openapi.yaml`:
+
+- `CreatePrompt`
+- `PatchPrompt`
+- `Prompt`
+- `PromptData`
+- `PromptBlockData`
+
+Relevant fields on `PromptData`:
+
+- `prompt`
+- `options`
+- `parser`
+- `tool_functions`
+- `template_format`
+- `mcp`
+
+`PromptBlockData` is a tagged union:
+
+- chat prompt: `{ type: "chat", messages, tools? }`
+- completion prompt: `{ type: "completion", content }`
+
+### Prompt invocation
+
+Prompt execution is not a dedicated `/v1/prompt/.../invoke` API. It goes through function invocation.
+
+There are two relevant backend invocation paths:
+
+- Proxy path used by JS/Python SDKs: `/function/invoke`
+- REST path used by Java/Ruby low-level clients: `/v1/function/{function_id}/invoke`
+
+The backend `FunctionId` union supports these function identifiers:
+
+- `function_id`
+- `project_name + slug`
+- `global_function`
+- `prompt_session_id + prompt_session_function_id`
+- `inline_code`
+- `inline_function`
+- `inline_prompt`
+
+The backend `InvokeApi` request supports at least:
+
+- `input`
+- `expected`
+- `metadata`
+- `tags`
+- `messages`
+- `parent`
+- `stream`
+- `mode`
+- `strict`
+- `mcp_auth`
+- `overrides`
+- `version`
+
+Important implication: prompt execution is already modeled as a special case of function execution.
+
+## Workflow 1: Defining Prompts Via SDK APIs
+
+### JavaScript
+
+Current authoring API is `project.prompts.create(...)` in `framework2.ts`.
+
+Supported inputs:
+
+- `name`
+- `slug`
+- `description`
+- `id`
+- exactly one of `prompt` or `messages`
+- `model`
+- `params`
+- `tools`
+- `ifExists`
+- `metadata`
+- `tags`
+- `templateFormat`
+- `environments`
+- `noTrace` on the returned local prompt handle
+
+Behavior:
+
+- Raw tool JSON is serialized into `prompt_data.prompt.tools`
+- Tool/function references are serialized into `prompt_data.tool_functions`
+- `templateFormat` is persisted as top-level `prompt_data.template_format`
+- `environments` are emitted as top-level function definition environment assignments
+- Published function definition uses `function_data: { type: "prompt" }`
+
+Observations:
+
+- JS is the only SDK in this research set that already threads `environments` through prompt authoring.
+- JS is also the only SDK here that clearly threads `templateFormat` through prompt authoring.
+
+### Python
+
+Current authoring API is `project.prompts.create(...)` in `framework2.py`.
+
+Supported inputs:
+
+- `name`
+- `slug`
+- `description`
+- `id`
+- exactly one of `prompt` or `messages`
+- `model`
+- `params`
+- `tools`
+- `if_exists`
+- `metadata`
+- `tags`
+
+Behavior:
+
+- Raw tools are serialized into `prompt_data.prompt.tools`
+- Tool/function references are serialized into `prompt_data.tool_functions`
+- Published definition uses `function_data: { type: "prompt" }`
+
+Notable gaps relative to JS:
+
+- No `templateFormat` authoring surface
+- No `environments` authoring surface
+
+### Ruby
+
+There is no `project.prompts.create(...)` style high-level SDK builder in the repo.
+
+The closest current surface is low-level function creation through `API::Functions#create(...)`, typically with:
+
+- `function_data: { type: "prompt" }`
+- `prompt_data: {...}`
+
+This is workable, but it does not offer the same SDK ergonomics as JS/Python.
+
+### Java
+
+There is no prompt authoring builder in the Java SDK repo.
+
+Java currently has:
+
+- prompt loading
+- prompt rendering helpers
+- generic function invocation by function ID
+
+but not code-based prompt definition/publishing.
+
+### Research Conclusion For Authoring
+
+The eventual spec will need to choose between:
+
+- a minimal common denominator based on current cross-language support, or
+- a normative target where other SDKs converge toward the JS surface
+
+If the goal is a real cross-language prompt SDK spec, the JS authoring surface is the strongest candidate for the normative model.
+
+## Workflow 2: Loading Prompts From Braintrust Via SDK APIs
+
+### JavaScript
+
+Current entrypoint: `loadPrompt(...)` in `js/src/logger.ts`.
+
+Supported selectors:
+
+- `id`
+- `projectName + slug`
+- `projectId + slug`
+
+Supported modifiers:
+
+- `version`
+- `environment`
+- `defaults`
+- `noTrace`
+
+Behavior:
+
+- `version` and `environment` are mutually exclusive
+- loading by `id` uses `/v1/prompt/{id}`
+- loading by project/slug uses `/v1/prompt`
+- successful fetches are cached
+- cache fallback is skipped when `version` or `environment` is set
+
+Returned object:
+
+- `Prompt`
+
+Prompt capabilities:
+
+- `id`, `projectId`, `name`, `slug`, `version`, `options`, `templateFormat`
+- `build(...)`
+- `buildWithAttachments(...)`
+
+Build behavior:
+
+- supports both chat and completion prompts
+- defaults merge with prompt params and model
+- injects `span_info.metadata.prompt`
+- resolves `template_format`
+- supports `strict`
+- supports appending extra `messages`
+- deduplicates an extra system message if the saved prompt already has one
+- parses rendered `tools` JSON into typed tool objects
+
+JS is the most complete prompt loading/build implementation in the repos reviewed.
+
+### Python
+
+Current public entrypoint:
+
+- `load_prompt(...)`
+
+Supported selectors:
+
+- `id`
+- `project + slug`
+- `project_id + slug`
+
+Supported modifiers:
+
+- `version`
+- `environment`
+- `defaults`
+- `no_trace`
+
+Behavior:
+
+- rejects `version + environment`
+- loading by ID uses `/v1/prompt/{id}`
+- loading by project/slug uses `/v1/prompt`
+- successful fetches are cached
+- cache fallback is skipped when `version` or `environment` is set
+
+Returned object:
+
+- `Prompt`
+
+Prompt capabilities:
+
+- `id`, `name`, `slug`, `version`, `options`
+- `from_prompt_data(...)`
+- `build(**kwargs)`
+
+Build behavior:
+
+- always uses Mustache rendering
+- no explicit support for `template_format`
+- no attachment-hydration equivalent to JS `buildWithAttachments`
+- no `messages` append/merge surface on `build`
+- `strict` is passed as a keyword inside `build_args`, not as a separate method parameter
+
+Important doc/code mismatch:
+
+- The docs show Python examples like `prompt.build({"name": "Alice"}, strict=True)`, but the current method signature is `build(self, **build_args)`, so the real implementation is keyword-argument oriented.
+
+### Ruby
+
+Current entrypoint: `Braintrust::Prompt.load(project:, slug:, version: nil, defaults: {}, api: nil)`
+
+Selector support:
+
+- `project + slug`
+
+Modifier support:
+
+- `version`
+- `defaults`
+
+Behavior:
+
+- resolves prompt by listing functions, then fetching a single function
+- uses `/v1/function` and `/v1/function/{id}`, not `/v1/prompt`
+
+Returned object:
+
+- `Braintrust::Prompt`
+
+Prompt capabilities:
+
+- `id`, `name`, `slug`, `project_id`
+- `prompt`, `messages`, `tools`, `model`, `options`, `template_format`
+- `build(...)`
+
+Build behavior:
+
+- supports explicit hash or keyword args
+- supports `strict`
+- supports template formats:
+  - `mustache`
+  - `none`
+  - `nunjucks` is recognized but intentionally unsupported and raises an error
+- merges params into top-level output
+- parses tools JSON
+
+Ruby has a decent prompt load/build implementation, but it is not aligned with the JS/Python prompt loading path or selector set.
+
+### Java
+
+Current entrypoint: `BraintrustPromptLoader.load(...)`
+
+Selector support:
+
+- prompt slug
+- optional project name
+- optional version
+
+Modifier support:
+
+- defaults
+
+Behavior:
+
+- loads through `BraintrustApiClient.getPrompt(projectName, slug, version)`
+- current API client uses `/v1/prompt?project_name=...&slug=...&version=...`
+
+Returned object:
+
+- `BraintrustPrompt`
+
+Prompt capabilities:
+
+- `renderMessages(parameters)`
+- `getOptions()`
+
+Build-like behavior:
+
+- rendering and option merging are separate operations
+- Mustache only
+- no strict mode
+- no `template_format`
+- no load by ID
+- no environment selection
+- no cache layer
+
+Java currently provides a partial prompt loading story rather than a full prompt object API.
+
+### Research Conclusion For Loading
+
+A future spec will need to normalize at least:
+
+- selector inputs: `id`, `projectName/project`, `projectId`, `slug`
+- modifiers: `version`, `environment`, `defaults`, `noTrace`
+- build behavior
+- cache semantics
+- template-format semantics
+
+JS is the clearest normative target. Python is close but not fully aligned. Ruby and Java are materially narrower.
+
+## Workflow 3: Invoking Prompts
+
+### Backend capability
+
+The backend invocation plane is broader than the SDK public surfaces.
+
+Backend already supports prompt invocation by:
+
+- `function_id`
+- `project_name + slug`
+- `prompt_session_id + prompt_session_function_id`
+- `inline_prompt`
+
+and also supports:
+
+- `version`
+- `messages`
+- `strict`
+- `stream`
+- `mode`
+- `overrides`
+- `mcp_auth`
+
+### JavaScript
+
+Current public entrypoint: `invoke(...)` in `js/src/functions/invoke.ts`.
+
+Supported identifiers:
+
+- `function_id`
+- `projectName + slug`
+- `globalFunction`
+- `promptSessionId + promptSessionFunctionId`
+
+Supported execution args:
+
+- `input`
+- `messages`
+- `metadata`
+- `tags`
+- `parent`
+- `stream`
+- `mode`
+- `strict`
+- `version`
+- `projectId` via `x-bt-project-id`
+
+Notably absent from the public JS invoke surface:
+
+- `environment`
+- `inline_prompt`
+- `overrides`
+- `mcp_auth`
+
+### Python
+
+Current public entrypoint: `invoke(...)` in `py/src/braintrust/functions/invoke.py`.
+
+Supported identifiers:
+
+- `function_id`
+- `project_name + slug`
+- `global_function`
+- `prompt_session_id + prompt_session_function_id`
+
+Supported execution args:
+
+- `input`
+- `messages`
+- `metadata`
+- `tags`
+- `parent`
+- `stream`
+- `mode`
+- `strict`
+- `version`
+- `project_id` via `x-bt-project-id`
+
+Notably absent from the public Python invoke surface:
+
+- `environment`
+- `inline_prompt`
+- `overrides`
+- `mcp_auth`
+
+### Ruby
+
+Current low-level invoke surface is `API::Functions#invoke(id:, input:)`.
+
+There is no prompt-oriented top-level `invoke(project:, slug:, ...)` implementation in the repo matching the current deploy docs.
+
+Higher-level Ruby helpers exist for:
+
+- remote task wrappers
+- remote scorer wrappers
+
+but not a JS/Python-style prompt invoke API.
+
+### Java
+
+Current low-level invoke surface is `BraintrustApiClient.invokeFunction(functionId, request)`, which calls `/v1/function/{function_id}/invoke`.
+
+There is no prompt-specific `invokePrompt(projectName, slug, ...)` helper in the Java SDK.
+
+### Docs/Code Mismatches Around Invocation
+
+The docs currently describe a broader prompt invocation surface than the code reviewed here.
+
+Examples:
+
+- `braintrust/docs/deploy/prompts.mdx` documents `invoke(..., environment=...)`
+- current JS and Python `invoke(...)` implementations do not expose `environment`
+- `braintrust/docs/deploy/prompts.mdx` shows Ruby `Braintrust.invoke(...)`
+- the Ruby SDK repo reviewed here does not appear to implement that top-level invoke surface
+- `braintrust/docs/evaluate/write-prompts.mdx` shows `logger.invoke("summarizer", ...)`
+- the JS and Python SDK repos reviewed here do not expose a corresponding logger method in the main logger implementations
+
+These look like either:
+
+- docs ahead of SDK implementation, or
+- parallel code paths not present in the repos reviewed
+
+For a spec, these mismatches must be resolved explicitly.
+
+## Cross-Language Matrix
+
+| Capability | JS | Python | Ruby | Java |
+|---|---|---|---|---|
+| High-level `project.prompts.create(...)` | Yes | Yes | No | No |
+| Prompt authoring `templateFormat` | Yes | No | N/A | N/A |
+| Prompt authoring `environments` | Yes | No | No | No |
+| Load by prompt ID | Yes | Yes | No | No |
+| Load by project + slug | Yes | Yes | Yes | Yes |
+| Load by environment | Yes | Yes | No | No |
+| Cache loaded prompts | Yes | Yes | No obvious cache | No obvious cache |
+| Prompt object `build()` | Yes | Yes | Yes | No single build method |
+| `template_format = none` on build | Yes | No | Yes | No |
+| `template_format = nunjucks` | Yes, addon-based | No | Recognized but unsupported | No |
+| Attachment-aware prompt build | Yes | No | No | No |
+| Public prompt invoke by project + slug | Yes | Yes | No | No |
+| Public prompt invoke by ID | Yes | Yes | Low-level only | Low-level only |
+| Public invoke `environment` | No | No | No | No |
+| Public invoke `inline_prompt` | No | No | No | No |
+
+## Important Gaps To Decide In The Spec
+
+### 1. What is the normative invoke surface?
+
+The backend already supports:
+
+- slug-based prompt invocation
+- ID-based invocation
+- prompt-session invocation
+- inline prompt invocation
+
+The public SDKs only expose a subset of that.
+
+The spec should decide whether inline prompt invocation is in scope for the prompt spec or intentionally out of scope.
+
+### 2. Should `environment` be part of prompt invocation?
+
+The docs say yes. The current JS/Python invoke implementations say no.
+
+Because prompt loading already supports `environment`, a spec should decide whether invocation must support it directly or whether callers are expected to:
+
+1. `loadPrompt(environment=...)`
+2. then execute separately
+
+That second model is not how the docs currently describe deployed prompts.
+
+### 3. What is the canonical prompt-loading backend path?
+
+Current implementations differ:
+
+- JS/Python: `/v1/prompt`
+- Ruby: `/v1/function`
+- Java: `/v1/prompt`
+
+The spec should likely normalize on prompt objects rather than generic function lookup for prompt loading, while still allowing equivalent implementations if behavior matches.
+
+### 4. What is the canonical prompt object shape and build contract?
+
+Current implementations differ on:
+
+- strict mode
+- template formats
+- extra message merging
+- tool parsing
+- attachment hydration
+- whether build is one method or split across `renderMessages()` and `getOptions()`
+
+This is one of the most important parts of the eventual spec.
+
+### 5. What is the minimum cross-language authoring API?
+
+Current JS authoring is richer than Python, and Ruby/Java do not match at all.
+
+If the spec is meant to drive convergence, it should probably standardize:
+
+- prompt contents: `prompt` xor `messages`
+- `model`
+- `params`
+- `tools`
+- `metadata`
+- `tags`
+- `templateFormat`
+- `environments`
+
+## Suggested Direction For The Future Spec
+
+The eventual PR2-style spec should probably:
+
+1. Treat prompt authoring, prompt loading/building, and prompt invocation as three separate public API sections.
+2. Use the backend prompt data model as the wire-format source of truth.
+3. Make JS the behavioral reference point where current SDKs differ, unless backend behavior or docs clearly imply a different target.
+4. Call out doc/code gaps as explicit decisions rather than silently inheriting one side.
+5. Distinguish required cross-language semantics from optional language-specific ergonomics.
+
+## Concrete Findings To Carry Into Spec Drafting
+
+- Prompt storage and prompt execution are different backend surfaces and should be spec'd separately.
+- Prompt execution is function execution, so prompt invoke APIs should either wrap or mirror function invocation semantics.
+- `environment` is already first-class in prompt loading, and likely needs a clear spec position for invocation too.
+- Prompt authoring currently has the most complete shape in JavaScript.
+- Prompt loading/building currently has the most complete shape in JavaScript.
+- Python is close enough that a convergence spec is realistic, but it has several concrete parity gaps.
+- Ruby and Java currently look more like partial implementations than full prompt SDKs.
+
+## Proposed Next Step
+
+Use this research to draft a separate spec doc with these top-level sections:
+
+- Overview
+- Public API
+- Behavior
+- Wire Format
+- Examples
+- Open Questions / Non-goals
diff --git a/docs/telemetry/prompts-api.md b/docs/telemetry/prompts-api.md
new file mode 100644
index 0000000..1bb9ea0
--- /dev/null
+++ b/docs/telemetry/prompts-api.md
@@ -0,0 +1,625 @@
+# Prompts API
+
+This document specifies a cross-language SDK API for Braintrust prompts.
+
+It is informed by the research in `docs/research/prompts-api-research.md`, but this document is normative where the research note is descriptive.
+
+## Overview
+
+The Prompts API covers three distinct workflows:
+
+1. Defining and publishing prompts via SDK APIs
+2. Loading prompts from Braintrust and building them locally
+3. Invoking prompts in deployed applications
+
+The canonical public surface is a top-level `prompts` namespace.
+
+Prompt storage and prompt execution are separate backend concerns:
+
+- Prompt storage uses the prompt control plane.
+- Prompt execution uses the function invocation plane.
+
+The public SDK API MUST preserve that distinction even if an implementation shares lower-level helpers internally.
+
+## Public API
+
+### Top-level namespace
+
+The canonical public entrypoints are:
+
+```ts
+prompts.create(...)
+prompts.load(...)
+prompts.invoke(...)
+```
+
+Languages MAY provide additional ergonomic aliases, but those aliases are not the normative API.
+
+Examples of allowed aliases:
+
+- `projects.create(...).prompts.create(...)`
+- `loadPrompt(...)` as an alias for `prompts.load(...)`
+- `prompt.invoke(...)` on a loaded prompt object
+
+### Selectors
+
+A prompt is identified by exactly one of these selectors:
+
+```ts
+type PromptSelector =
+  | { id: string }
+  | { project: string; slug: string }
+  | { projectId: string; slug: string };
+```
+
+Version selection and environment selection are modifiers on a selector:
+
+```ts
+type PromptResolution = {
+  version?: string;
+  environment?: string;
+};
+```
+
+`version` and `environment` are mutually exclusive.
+
+### Authoring
+
+In this spec, `prompts.create(...)` creates a local draft prompt. It does not perform network I/O.
+
+```ts
+type PromptContent =
+  | {
+      type: "chat";
+      messages: MessageTemplate[];
+    }
+  | {
+      type: "completion";
+      content: string;
+    };
+
+type PromptTool = SavedToolRef | RawToolDefinition;
+
+type PromptCreateArgs = {
+  project?: string;
+  projectId?: string;
+  id?: string;
+  name: string;
+  slug?: string;
+  description?: string;
+  content: PromptContent;
+  model: string;
+  params?: ModelParams;
+  tools?: PromptTool[];
+  parser?: ParserSpec;
+  templateFormat?: "mustache" | "nunjucks" | "none";
+  metadata?: Record<string, unknown>;
+  tags?: string[];
+  environments?: string[];
+  noTrace?: boolean;
+};
+
+interface DraftPrompt {
+  readonly project?: string;
+  readonly projectId?: string;
+  readonly id?: string;
+  readonly name: string;
+  readonly slug: string;
+  readonly definition: PromptCreateArgs;
+
+  build(
+    input: Record<string, unknown>,
+    options?: PromptBuildOptions,
+  ): BuiltPrompt;
+
+  publish(options?: PromptPublishOptions): Promise<ResolvedPrompt>;
+}
+
+interface PromptPublishOptions {
+  ifExists?: "error" | "replace" | "ignore";
+}
+```
+
+`project` or `projectId` MUST be provided if the draft will be published.
+
+SDKs MAY provide language-specific sugar that accepts `prompt` xor `messages` instead of the canonical `content` tagged union, but the canonical API shape is `content`.
+
+### Loading
+
+`prompts.load(...)` resolves a stored prompt and returns a resolved prompt object.
+
+```ts
+type PromptLoadOptions = PromptSelector &
+  PromptResolution & {
+    defaults?: Record<string, unknown>;
+    noTrace?: boolean;
+    state?: unknown;
+  };
+
+interface ResolvedPrompt {
+  readonly id: string;
+  readonly projectId?: string;
+  readonly name: string;
+  readonly slug: string;
+  readonly version: string;
+  readonly templateFormat: "mustache" | "nunjucks" | "none" | null;
+  readonly promptData: PromptData;
+
+  build(
+    input: Record<string, unknown>,
+    options?: PromptBuildOptions,
+  ): BuiltPrompt;
+
+  invoke(
+    args: PromptInvokeArgs,
+  ): Promise<PromptInvokeResult | PromptInvokeStream>;
+}
+```
+
+### Building
+
+```ts
+type PromptBuildOptions = {
+  flavor?: "chat" | "completion";
+  strict?: boolean;
+  messages?: Message[];
+  templateFormat?: "mustache" | "nunjucks" | "none";
+};
+```
+
+`build(...)` is local compilation. It renders templates, merges defaults and stored parameters, and produces a provider-ready request payload.
+
+Attachment-aware build helpers are OPTIONAL language-specific ergonomics. The required cross-language API is `build(...)`.
+
+### Invocation
+
+`prompts.invoke(...)` is the canonical one-shot execution API.
+
+```ts
+type PromptInvokeArgs = {
+  input: Record<string, unknown>;
+  messages?: Message[];
+  metadata?: Record<string, unknown>;
+  tags?: string[];
+  parent?: ExportedParent;
+  stream?: boolean;
+  mode?: "auto" | "parallel";
+  strict?: boolean;
+  state?: unknown;
+};
+
+declare const prompts: {
+  create(args: PromptCreateArgs): DraftPrompt;
+  load(args: PromptLoadOptions): Promise<ResolvedPrompt>;
+  invoke(
+    selector: PromptSelector & PromptResolution,
+    args: PromptInvokeArgs,
+  ): Promise<PromptInvokeResult | PromptInvokeStream>;
+};
+```
+
+`prompts.invoke(...)` MUST accept the same selector forms as `prompts.load(...)`.
+
+The resolved prompt object MAY also expose `prompt.invoke(...)` as an ergonomic instance method. That method is equivalent to calling `prompts.invoke(...)` with the prompt's own concrete identity and version.
+
+## Behavior
+
+### 1. Namespace and import shape
+
+`prompts` is the canonical API namespace. In JavaScript, SDKs SHOULD expose it from the package entrypoint so it can be used as:
+
+```ts
+import { prompts } from "braintrust";
+```
+
+or:
+
+```ts
+import * as braintrust from "braintrust";
+
+braintrust.prompts.load(...);
+```
+
+Client-scoped APIs such as `bt.prompts` are allowed, but they are not required by this spec.
+
+### 2. Draft prompts vs resolved prompts
+
+Draft prompts and resolved prompts are different objects with different guarantees.
+
+- A draft prompt is local authoring state.
+- A resolved prompt is a stored prompt with a concrete identity and version.
+
+This is a normative API decision in this spec, not a claim about every current SDK surface.
+
+Current SDKs are mixed:
+
+- JavaScript already behaves like a local builder plus separate publish step.
+- Python appears close to the same model, based on the research notes.
+- Ruby and Java do not currently expose the same high-level prompt authoring API.
+
+Draft prompts MAY support `build(...)`.
+
+Draft prompts MUST NOT require backend support for inline prompt execution in order to conform to this spec.
+
+### 3. Publishing
+
+`draft.publish(...)` converts a draft prompt into a stored prompt that can later be loaded or invoked by selector.
+
+Publishing MUST preserve:
+
+- `name`
+- `slug`
+- `description`
+- prompt content
+- `model`
+- `params`
+- `parser`
+- raw tools
+- referenced tool functions
+- `templateFormat`
+- `metadata`
+- `tags`
+- `environments`
+
+After successful publish, the returned prompt object MUST include a concrete `id` and `version`.
+
+An implementation MAY persist prompts through a prompt CRUD endpoint, through function-definition publishing with `function_data.type = "prompt"`, or through another equivalent backend path, as long as observable behavior matches this spec.
+
+#### CLI push integration
+
+CLI push workflows are compatible with this model.
+
+In a CLI workflow such as:
+
+```bash
+npx braintrust push summarizer.ts
+```
+
+the SDK or CLI MAY evaluate the module in a discovery mode, collect draft prompts created by `prompts.create(...)`, and perform the network upload later as part of the push command.
+
+In that workflow:
+
+- `prompts.create(...)` still does not perform network I/O
+- the push command is responsible for publishing discovered drafts
+- the observable result MUST be equivalent to calling `draft.publish(...)` on each discovered draft
+
+This allows a language SDK to support both explicit runtime publishing and file-based declaration discovery for CLI workflows.
+
+### 4. Resolution
+
+Resolution rules are:
+
+1. If `id` is provided, it takes precedence over project and slug fields.
+2. Otherwise, exactly one of `project + slug` or `projectId + slug` MUST be provided.
+3. `version` and `environment` MUST NOT be provided together.
+4. If neither `version` nor `environment` is provided, the latest published version is resolved.
+
+`environment` is part of the normative prompt API for both loading and invocation.
+
+### 5. Build semantics
+
+`build(...)` MUST be deterministic with respect to:
+
+- the prompt definition
+- the resolved prompt version, if applicable
+- the provided input
+- the provided defaults
+- the provided build options
+
+Build parameter precedence, from lowest to highest (later entries override earlier ones), is:
+
+1. `defaults`
+2. stored prompt params
+3. stored prompt model
+
+Template format resolution precedence, from highest to lowest, is:
+
+1. `build(..., { templateFormat })`
+2. stored prompt `templateFormat`
+3. `"mustache"`
+
+`strict` applies to template rendering and parameter rendering.
+
+`build(...)` MUST support both chat and completion prompts.
+
+For chat prompts:
+
+- runtime `messages` are appended after stored prompt messages
+- runtime `messages` MUST NOT introduce an additional system message if the stored prompt already contains a system message
+
+For completion prompts:
+
+- runtime `messages` are invalid
+
+If a prompt references attachments and the SDK cannot render them in `build(...)`, the SDK MUST fail with a clear error unless it provides an explicit attachment-aware build helper.
+
+### 6. Trace metadata
+
+By default, `build(...)` SHOULD include prompt trace metadata in the built request payload when the language SDK supports tracing.
+
+That metadata MUST identify the prompt artifact being built, including:
+
+- prompt id, if known
+- project id, if known
+- version, if known
+- rendered variables
+
+If `noTrace` is set on the prompt handle, prompt trace metadata MUST be omitted.
+
+### 7. Invocation semantics
+
+`prompts.invoke(...)` is equivalent to:
+
+1. resolve the prompt selector using the provided `version` or `environment`
+2. execute that resolved prompt through the function invocation plane
+
+The public prompt API MUST remain prompt-centric even if the transport is implemented as generic function invocation.
+
+Invocation MUST support:
+
+- `input`
+- `messages`
+- `metadata`
+- `tags`
+- `parent`
+- `stream`
+- `mode`
+- `strict`
+- `version`
+- `environment`
+
+If `stream` is false or omitted, `prompts.invoke(...)` returns a final result.
+
+If `stream` is true, `prompts.invoke(...)` returns a stream object.
+
+The prompt API does not require callers to manually `load(...)` before `invoke(...)`.
+
+### 8. Caching
+
+SDKs MAY cache successfully loaded prompts.
+
+If an SDK caches prompt loads:
+
+- the cache key MUST include the selector and the concrete resolved version
+- `defaults` and `noTrace` MUST NOT change prompt identity
+- stale cache fallback MUST NOT be used when `version` or `environment` is specified
+
+The spec does not require a cache for prompt invocation.
+
+### 9. Errors
+
+SDKs MUST fail with clear errors for:
+
+- invalid selector combinations
+- `version` plus `environment`
+- missing required selector fields
+- prompt not found
+- multiple prompts resolved for a supposedly unique selector
+- `messages` used with a completion prompt
+- unsupported template formats
+- attempts to build attachments without an attachment-aware helper, if required
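+
+The selector-related errors can be checked before any network call. The sketch below is illustrative only: the `PromptSelectorInput` shape, the `validatePromptSelector` name, and the rule that a selector needs either `id` or both `project` and `slug` are assumptions, not requirements stated in this section.
+
+```ts
+// Hypothetical selector shape; the required-field rule below is an assumption.
+interface PromptSelectorInput {
+  id?: string;
+  project?: string;
+  slug?: string;
+  version?: string;
+  environment?: string;
+}
+
+// Throws a descriptive error for the selector problems that can be detected locally.
+function validatePromptSelector(selector: PromptSelectorInput): void {
+  if (selector.version !== undefined && selector.environment !== undefined) {
+    throw new Error("Prompt selector: specify either `version` or `environment`, not both.");
+  }
+  const hasId = selector.id !== undefined;
+  const hasProjectAndSlug =
+    selector.project !== undefined && selector.slug !== undefined;
+  if (!hasId && !hasProjectAndSlug) {
+    throw new Error("Prompt selector: provide `id`, or both `project` and `slug`.");
+  }
+}
+```
+
+Errors such as "prompt not found" or "multiple prompts resolved" depend on the backend response and would be raised later, after resolution.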
+
+## Wire Format
+
+### Prompt object
+
+The wire-format source of truth for prompt data is the backend prompt data model.
+
+Canonical prompt data fields are:
+
+- `prompt`
+- `options`
+- `parser`
+- `tool_functions`
+- `template_format`
+- `mcp`
+
+The prompt content block is a tagged union:
+
+```json
+{ "type": "chat", "messages": ["..."], "tools": "..." }
+```
+
+or:
+
+```json
+{ "type": "completion", "content": "..." }
+```
+
+The canonical mapping from public API to wire format is:
+
+- `content.type = "chat"` -> `prompt_data.prompt = { type: "chat", messages, tools? }`
+- `content.type = "completion"` -> `prompt_data.prompt = { type: "completion", content }`
+- `model` and `params` -> `prompt_data.options`
+- raw tool definitions -> `prompt_data.prompt.tools`
+- saved tool references -> `prompt_data.tool_functions`
+- `templateFormat` -> `prompt_data.template_format`
+- `parser` -> `prompt_data.parser`
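+
+The mapping above can be sketched as a small conversion function. The `PromptDefinition` shape is hypothetical: `rawTools` and `savedTools` are illustrative names for the two tool inputs, serializing raw tools to a JSON string is an assumption based on the string-valued `tools` placeholder in the chat block, and nesting `model` and `params` under same-named keys in `options` is likewise an assumption.
+
+```ts
+// Hypothetical public-API shape; `rawTools` and `savedTools` are illustrative names.
+type PromptContent =
+  | { type: "chat"; messages: unknown[] }
+  | { type: "completion"; content: string };
+
+interface PromptDefinition {
+  content: PromptContent;
+  model?: string;
+  params?: Record<string, unknown>;
+  rawTools?: unknown[];
+  savedTools?: unknown[];
+  templateFormat?: string;
+  parser?: unknown;
+}
+
+function toPromptData(def: PromptDefinition): Record<string, unknown> {
+  const content = def.content;
+  const prompt: Record<string, unknown> =
+    content.type === "chat"
+      ? { type: "chat", messages: content.messages }
+      : { type: "completion", content: content.content };
+  // Assumption: raw tool definitions travel as a JSON string on chat prompts.
+  if (content.type === "chat" && def.rawTools !== undefined) {
+    prompt.tools = JSON.stringify(def.rawTools);
+  }
+
+  const promptData: Record<string, unknown> = { prompt };
+  // Assumption: `options` nests `model` and `params` under those same keys.
+  const options: Record<string, unknown> = {};
+  if (def.model !== undefined) options.model = def.model;
+  if (def.params !== undefined) options.params = def.params;
+  if (Object.keys(options).length > 0) promptData.options = options;
+  if (def.savedTools !== undefined) promptData.tool_functions = def.savedTools;
+  if (def.templateFormat !== undefined) promptData.template_format = def.templateFormat;
+  if (def.parser !== undefined) promptData.parser = def.parser;
+  return promptData;
+}
+```
+
+Fields that are not set on the definition are left out of the result rather than emitted as `null`, which keeps the wire object minimal.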
+
+### Stored prompt transport
+
+The preferred storage surface is the prompt control plane:
+
+- `GET /v1/prompt`
+- `GET /v1/prompt/{prompt_id}`
+- corresponding create/update endpoints
+
+Implementations MAY use another equivalent backend transport if the resulting prompt object behavior matches this spec.
+
+### Published function representation
+
+When a prompt is published as a function definition, the canonical representation is:
+
+```json
+{
+  "project_id": "...",
+  "name": "...",
+  "slug": "...",
+  "description": "...",
+  "function_data": { "type": "prompt" },
+  "prompt_data": { "...": "..." },
+  "if_exists": "replace",
+  "tags": ["..."],
+  "metadata": { "...": "..." },
+  "environments": [{ "slug": "production" }]
+}
+```
+
+That representation is valid so long as it preserves the same public behavior as prompt CRUD.
+
+### Invocation transport
+
+There is no dedicated prompt invoke endpoint in this spec.
+
+Prompt execution is transported through the function invocation plane. Supported backend transports include:
+
+- `POST /function/invoke`
+- `POST /v1/function/{function_id}/invoke`
+
+The canonical invocation request shape includes:
+
+- prompt selector fields
+- `version` or `environment`
+- `input`
+- `messages`
+- `metadata`
+- `tags`
+- `parent`
+- `stream`
+- `mode`
+- `strict`
+
+## Examples
+
+### Define and publish a prompt
+
+```ts
+import { prompts } from "braintrust";
+
+const draft = prompts.create({
+  project: "support",
+  name: "Summarizer",
+  slug: "summarizer",
+  description: "Summarize a support conversation",
+  content: {
+    type: "chat",
+    messages: [
+      {
+        role: "system",
+        content: "You are a concise support assistant.",
+      },
+      {
+        role: "user",
+        content: "Summarize this conversation: {{conversation}}",
+      },
+    ],
+  },
+  model: "gpt-5",
+  params: {
+    temperature: 0.2,
+  },
+  templateFormat: "mustache",
+  environments: ["production"],
+});
+
+const prompt = await draft.publish({ ifExists: "replace" });
+```
+
+### Load and build a prompt
+
+```ts
+import { prompts } from "braintrust";
+
+const prompt = await prompts.load({
+  project: "support",
+  slug: "summarizer",
+  environment: "production",
+  defaults: {
+    locale: "en-US",
+  },
+});
+
+const built = prompt.build(
+  {
+    conversation: "Customer cannot log in.",
+  },
+  {
+    strict: true,
+  },
+);
+```
+
+### Invoke a prompt directly
+
+```ts
+import { prompts } from "braintrust";
+
+const result = await prompts.invoke(
+  {
+    project: "support",
+    slug: "summarizer",
+    environment: "production",
+  },
+  {
+    input: {
+      conversation: "Customer cannot log in.",
+    },
+    strict: true,
+  },
+);
+```
+
+### Compatibility aliases
+
+These are equivalent, if a language SDK chooses to provide the aliases:
+
+```ts
+prompts.load({ project: "support", slug: "summarizer" });
+```
+
+```ts
+loadPrompt({ projectName: "support", slug: "summarizer" });
+```
+
+and:
+
+```ts
+prompts.create({
+  project: "support",
+  name: "Summarizer",
+  content: { type: "completion", content: "Summarize: {{text}}" },
+  model: "gpt-5",
+});
+```
+
+```ts
+projects.create({ name: "support" }).prompts.create({
+  name: "Summarizer",
+  prompt: "Summarize: {{text}}",
+  model: "gpt-5",
+});
+```
+
+and:
+
+```ts
+draft.publish({ ifExists: "replace" });
+```
+
+## Open Questions And Non-goals
+
+### Non-goals
+
+The following are out of scope for Prompts API v1:
+
+- inline prompt invocation as a public prompt API
+- prompt-session-scoped function identifiers
+- arbitrary invoke-time prompt overrides
+- language-specific local code-function publishing semantics
+
+Those capabilities MAY exist in lower-level function APIs without being part of the prompt API.
+
+### Open questions
+
+These questions remain open for a future revision:
+
+- whether prompt CRUD should be specified as first-class public methods such as `prompts.get`, `prompts.update`, and `prompts.delete`, or whether `create` plus `publish` plus `load` is sufficient for SDK v1
+- whether attachment-aware build should eventually become required rather than optional