docs: draft SPEC-CORE-002 Code Mode Thoughtbox#177
docs: draft SPEC-CORE-002 Code Mode Thoughtbox#177glassBead-tc wants to merge 2 commits intofeature/rlm-samplingfrom
Conversation
Co-authored-by: glassBead-tc <102933849+glassBead-tc@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
This pull request has been ignored for the connected project Preview Branches by Supabase. |
Greptile SummaryThis PR introduces Key issues found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant LLM
participant HostSandbox as Host Sandbox (V8 isolate / Effect Task)
participant tb as tb.* Library
participant Backend as Thoughtbox Backend
participant Storage as Immutable Ledger
participant Projections as Layer D Projections (async)
LLM->>HostSandbox: search({ code: "..." })
HostSandbox-->>LLM: available operations / schema shapes
LLM->>HostSandbox: execute({ code: "tb.claim(...)" })
HostSandbox->>tb: evaluate JS/TS snippet (V8 isolate, 50ms budget)
tb-->>HostSandbox: memory objects (claim, plan, etc.)
HostSandbox->>HostSandbox: lower objects → Canonical IR (Layer B)
HostSandbox->>Backend: POST Canonical IR (TBX-C1 encoded, Layer C)
Backend->>Backend: decode TBX-C1 → Canonical IR
Backend->>Storage: write to immutable ledger
Backend-)Projections: update projections (async, Layer D)
Backend-->>HostSandbox: TBX-C1 columnar results
HostSandbox-->>LLM: decoded results or raw TBX-C1 (Context Isolation)
Prompt To Fix All With AIThis is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 4
Comment:
**Broken dependency reference**
The spec declares a hard dependency on `.specs/agentic-runbooks.md`, but this file does not exist anywhere in the repository. Implementers following this spec will not be able to verify the execution model that Layer A and Phase 5 explicitly rely on (`agentic-runbooks` Effect Workflow, `DurableDeferred`, MCP Task async execution).
This should either be resolved by adding the dependency spec, linking to an external reference, or inlining the relevant execution model contract inline here before this spec is acted on.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 56-59
Comment:
**`k` key is ambiguous across codec contexts**
The TBX-C1 spec assigns `k` as the short key for record-level `kind` (e.g., `"k": "cl"` → `"claim"` via the `d.k` dictionary). However, in the concrete encoding example on lines 135–137, `k` is also used inside `a.relations` items to mean the *relation type*:
```json
"relations": [{ "k": "sup", "to": "th:12" }]
```
Here `"k": "sup"` is expected to expand using the `d.rel` dictionary, not `d.k`. This creates a context-dependent ambiguity: the same key `k` has two different semantic meanings and two different expansion dictionaries depending on where it appears. The spec does not document this context-sensitive expansion rule anywhere, which will confuse codec implementers and could lead to incorrect decoder behavior.
Consider using a distinct key for relation type (e.g., `"t"` for type) to remove the ambiguity, and document the expansion rules explicitly.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 88-113
Comment:
**`CanonicalIR.kind` and `CanonicalIRAttrs.kind` are not type-linked**
The `CanonicalIR` interface has a top-level `kind: string` field alongside `attrs: CanonicalIRAttrs`. Since `CanonicalIRAttrs` is a discriminated union also keyed on `kind`, the two fields can silently diverge (e.g., `kind: "claim"` at the top level but `attrs: { kind: "plan", steps: [] }`). TypeScript will not catch this at compile time as written.
Consider using a generic or mapped type to enforce alignment:
```typescript
interface CanonicalIR<A extends CanonicalIRAttrs = CanonicalIRAttrs> {
id: string;
kind: A["kind"]; // constrained to the attrs discriminant
// ...
attrs: A;
}
```
Or at minimum, add a note that implementations must validate `kind === attrs.kind` at serialization/deserialization boundaries.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 83
Comment:
**`args: any` erases type safety for `tool_call`**
Using `any` for `tool_call.args` defeats the purpose of a schema-enforced `CanonicalIRAttrs` union. Since tool call arguments are arbitrary but structured, `Record<string, unknown>` or `JsonValue` (a recursive safe type) would preserve type safety without over-constraining the schema:
```typescript
| { kind: "tool_call"; toolName: string; args: Record<string, unknown> }
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 33-37
Comment:
**`tb.*` library covers only 4 of 11 Canonical IR record kinds**
The `tb` helpers listed here — `tb.claim()`, `tb.plan()`, `tb.question()`, `tb.evidence()` — account for only 4 of the 11 `CanonicalIRAttrs` record kinds defined in Section 3.1. The remaining 7 kinds (`observation`, `decision`, `revision`, `tool_call`, `tool_result`, `hub_event`, `checkpoint`) have no corresponding authoring-layer builders specified.
It is unclear whether these record kinds are:
- Emitted exclusively by the host (not by LLM-authored snippets), or
- Planned for future `tb.*` extension, or
- Intended to be emitted via a lower-level escape hatch.
This gap should be documented explicitly to avoid ambiguity when implementers build the `tb` library and when LLMs author REPL snippets that need to record, for example, a `revision` or `decision`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 3
Comment:
**Branch name in spec header does not match this PR's branch**
The spec metadata lists `feat/code-mode-mvp` as the target branch, but this PR is open against `feature/rlm-sampling`. If `feat/code-mode-mvp` is the intended implementation branch and `feature/rlm-sampling` is only the drafting context, this should be clarified in a note (e.g., "implementation branch" vs. "drafted in"). Otherwise future readers may be confused about where this spec's implementation lives.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "docs: draft SPEC-COR..." |
| > **Status**: Draft | ||
| > **Created**: 2026-03-13 | ||
| > **Branch**: `feat/code-mode-mvp` | ||
| > **Depends on**: `.specs/agentic-runbooks.md` |
There was a problem hiding this comment.
The spec declares a hard dependency on .specs/agentic-runbooks.md, but this file does not exist anywhere in the repository. Implementers following this spec will not be able to verify the execution model that Layer A and Phase 5 explicitly rely on (agentic-runbooks Effect Workflow, DurableDeferred, MCP Task async execution).
This should either be resolved by adding the dependency spec, linking to an external reference, or inlining the relevant execution model contract inline here before this spec is acted on.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 4
Comment:
**Broken dependency reference**
The spec declares a hard dependency on `.specs/agentic-runbooks.md`, but this file does not exist anywhere in the repository. Implementers following this spec will not be able to verify the execution model that Layer A and Phase 5 explicitly rely on (`agentic-runbooks` Effect Workflow, `DurableDeferred`, MCP Task async execution).
This should either be resolved by adding the dependency spec, linking to an external reference, or inlining the relevant execution model contract inline here before this spec is acted on.
How can I resolve this? If you propose a fix, please make it concise.| * Short keys (`k` for kind, `i` for id, `x` for text). | ||
| * Session-local dictionary blocks to replace repeated strings. | ||
| * Columnar lists (`cols` + `rows`) for query results. | ||
| * Explicit reference arrays (`r`). |
There was a problem hiding this comment.
k key is ambiguous across codec contexts
The TBX-C1 spec assigns k as the short key for record-level kind (e.g., "k": "cl" → "claim" via the d.k dictionary). However, in the concrete encoding example on lines 135–137, k is also used inside a.relations items to mean the relation type:
"relations": [{ "k": "sup", "to": "th:12" }]Here "k": "sup" is expected to expand using the d.rel dictionary, not d.k. This creates a context-dependent ambiguity: the same key k has two different semantic meanings and two different expansion dictionaries depending on where it appears. The spec does not document this context-sensitive expansion rule anywhere, which will confuse codec implementers and could lead to incorrect decoder behavior.
Consider using a distinct key for relation type (e.g., "t" for type) to remove the ambiguity, and document the expansion rules explicitly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 56-59
Comment:
**`k` key is ambiguous across codec contexts**
The TBX-C1 spec assigns `k` as the short key for record-level `kind` (e.g., `"k": "cl"` → `"claim"` via the `d.k` dictionary). However, in the concrete encoding example on lines 135–137, `k` is also used inside `a.relations` items to mean the *relation type*:
```json
"relations": [{ "k": "sup", "to": "th:12" }]
```
Here `"k": "sup"` is expected to expand using the `d.rel` dictionary, not `d.k`. This creates a context-dependent ambiguity: the same key `k` has two different semantic meanings and two different expansion dictionaries depending on where it appears. The spec does not document this context-sensitive expansion rule anywhere, which will confuse codec implementers and could lead to incorrect decoder behavior.
Consider using a distinct key for relation type (e.g., `"t"` for type) to remove the ambiguity, and document the expansion rules explicitly.
How can I resolve this? If you propose a fix, please make it concise.| interface CanonicalIR { | ||
| id: string; // e.g., "th:17" | ||
| kind: string; // e.g., "claim" | ||
| ts: string; // ISO-8601 timestamp | ||
| session: string; | ||
| branch: string; | ||
| agent: string; | ||
| refs: string[]; // Lightweight indexed summary of references for fast traversal | ||
| confidence?: number; | ||
|
|
||
| body: { | ||
| text: string; // Primary human-readable payload | ||
| cipher?: string; // Optional reasoning notation (derived or passed through) | ||
| labels?: string[]; | ||
| metrics?: Record<string, number>; | ||
| }; | ||
|
|
||
| source: { // Provenance metadata | ||
| surface?: string; | ||
| operation?: string; | ||
| transport?: string; | ||
| model?: string; | ||
| }; | ||
|
|
||
| attrs: CanonicalIRAttrs; // Type-specific schema-enforced fields | ||
| } |
There was a problem hiding this comment.
CanonicalIR.kind and CanonicalIRAttrs.kind are not type-linked
The CanonicalIR interface has a top-level kind: string field alongside attrs: CanonicalIRAttrs. Since CanonicalIRAttrs is a discriminated union also keyed on kind, the two fields can silently diverge (e.g., kind: "claim" at the top level but attrs: { kind: "plan", steps: [] }). TypeScript will not catch this at compile time as written.
Consider using a generic or mapped type to enforce alignment:
interface CanonicalIR<A extends CanonicalIRAttrs = CanonicalIRAttrs> {
id: string;
kind: A["kind"]; // constrained to the attrs discriminant
// ...
attrs: A;
}Or at minimum, add a note that implementations must validate kind === attrs.kind at serialization/deserialization boundaries.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 88-113
Comment:
**`CanonicalIR.kind` and `CanonicalIRAttrs.kind` are not type-linked**
The `CanonicalIR` interface has a top-level `kind: string` field alongside `attrs: CanonicalIRAttrs`. Since `CanonicalIRAttrs` is a discriminated union also keyed on `kind`, the two fields can silently diverge (e.g., `kind: "claim"` at the top level but `attrs: { kind: "plan", steps: [] }`). TypeScript will not catch this at compile time as written.
Consider using a generic or mapped type to enforce alignment:
```typescript
interface CanonicalIR<A extends CanonicalIRAttrs = CanonicalIRAttrs> {
id: string;
kind: A["kind"]; // constrained to the attrs discriminant
// ...
attrs: A;
}
```
Or at minimum, add a note that implementations must validate `kind === attrs.kind` at serialization/deserialization boundaries.
How can I resolve this? If you propose a fix, please make it concise.| | { kind: "observation"; context?: string } | ||
| | { kind: "decision"; alternativesConsidered?: string[] } | ||
| | { kind: "revision"; replacesId: string } | ||
| | { kind: "tool_call"; toolName: string; args: any } |
There was a problem hiding this comment.
args: any erases type safety for tool_call
Using any for tool_call.args defeats the purpose of a schema-enforced CanonicalIRAttrs union. Since tool call arguments are arbitrary but structured, Record<string, unknown> or JsonValue (a recursive safe type) would preserve type safety without over-constraining the schema:
| { kind: "tool_call"; toolName: string; args: Record<string, unknown> }Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 83
Comment:
**`args: any` erases type safety for `tool_call`**
Using `any` for `tool_call.args` defeats the purpose of a schema-enforced `CanonicalIRAttrs` union. Since tool call arguments are arbitrary but structured, `Record<string, unknown>` or `JsonValue` (a recursive safe type) would preserve type safety without over-constraining the schema:
```typescript
| { kind: "tool_call"; toolName: string; args: Record<string, unknown> }
```
How can I resolve this? If you propose a fix, please make it concise.| * **Helpers:** Inside `execute()`, a typed `tb` library provides builders injected dynamically into the isolate global scope: | ||
| * `tb.claim(text, { supports: [...] })` | ||
| * `tb.plan(...)` | ||
| * `tb.question(...)` | ||
| * `tb.evidence(...)` |
There was a problem hiding this comment.
tb.* library covers only 4 of 11 Canonical IR record kinds
The tb helpers listed here — tb.claim(), tb.plan(), tb.question(), tb.evidence() — account for only 4 of the 11 CanonicalIRAttrs record kinds defined in Section 3.1. The remaining 7 kinds (observation, decision, revision, tool_call, tool_result, hub_event, checkpoint) have no corresponding authoring-layer builders specified.
It is unclear whether these record kinds are:
- Emitted exclusively by the host (not by LLM-authored snippets), or
- Planned for future
tb.*extension, or - Intended to be emitted via a lower-level escape hatch.
This gap should be documented explicitly to avoid ambiguity when implementers build the tb library and when LLMs author REPL snippets that need to record, for example, a revision or decision.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 33-37
Comment:
**`tb.*` library covers only 4 of 11 Canonical IR record kinds**
The `tb` helpers listed here — `tb.claim()`, `tb.plan()`, `tb.question()`, `tb.evidence()` — account for only 4 of the 11 `CanonicalIRAttrs` record kinds defined in Section 3.1. The remaining 7 kinds (`observation`, `decision`, `revision`, `tool_call`, `tool_result`, `hub_event`, `checkpoint`) have no corresponding authoring-layer builders specified.
It is unclear whether these record kinds are:
- Emitted exclusively by the host (not by LLM-authored snippets), or
- Planned for future `tb.*` extension, or
- Intended to be emitted via a lower-level escape hatch.
This gap should be documented explicitly to avoid ambiguity when implementers build the `tb` library and when LLMs author REPL snippets that need to record, for example, a `revision` or `decision`.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,193 @@ | |||
| > **Status**: Draft | |||
| > **Created**: 2026-03-13 | |||
| > **Branch**: `feat/code-mode-mvp` | |||
There was a problem hiding this comment.
Branch name in spec header does not match this PR's branch
The spec metadata lists feat/code-mode-mvp as the target branch, but this PR is open against feature/rlm-sampling. If feat/code-mode-mvp is the intended implementation branch and feature/rlm-sampling is only the drafting context, this should be clarified in a note (e.g., "implementation branch" vs. "drafted in"). Otherwise future readers may be confused about where this spec's implementation lives.
Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 3
Comment:
**Branch name in spec header does not match this PR's branch**
The spec metadata lists `feat/code-mode-mvp` as the target branch, but this PR is open against `feature/rlm-sampling`. If `feat/code-mode-mvp` is the intended implementation branch and `feature/rlm-sampling` is only the drafting context, this should be clarified in a note (e.g., "implementation branch" vs. "drafted in"). Otherwise future readers may be confused about where this spec's implementation lives.
How can I resolve this? If you propose a fix, please make it concise.Co-authored-by: glassBead-tc <102933849+glassBead-tc@users.noreply.github.com>
This pull request introduces
SPEC-CORE-002: Code Mode Thoughtbox, proposing the migration of Thoughtbox from an MCP JSON Schema API towards a Code Mode architecture (Layer A) running on an Effect execution engine similar toagentic-runbooks. It introduces the RLM "Structured REPL" concept where snippets can be evaluated asynchronously, emitting Canonical IR (Layer B) that is serialized using the TBX-C1 Codec (Layer C).PR created automatically by Jules for task 6541404214629872030 started by @glassBead-tc