Skip to content

docs: draft SPEC-CORE-002 Code Mode Thoughtbox#177

Closed
glassBead-tc wants to merge 2 commits intofeature/rlm-samplingfrom
feat/code-mode-mvp-6541404214629872030
Closed

docs: draft SPEC-CORE-002 Code Mode Thoughtbox#177
glassBead-tc wants to merge 2 commits intofeature/rlm-samplingfrom
feat/code-mode-mvp-6541404214629872030

Conversation

@glassBead-tc
Copy link
Copy Markdown
Member

This pull request introduces SPEC-CORE-002: Code Mode Thoughtbox, proposing the migration of Thoughtbox from an MCP JSON Schema API towards a Code Mode architecture (Layer A) running on an Effect execution engine similar to agentic-runbooks. It introduces the RLM "Structured REPL" concept where snippets can be evaluated asynchronously, emitting Canonical IR (Layer B) that is serialized using the TBX-C1 Codec (Layer C).


PR created automatically by Jules for task 6541404214629872030 started by @glassBead-tc

Co-authored-by: glassBead-tc <102933849+glassBead-tc@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@supabase
Copy link
Copy Markdown

supabase Bot commented Mar 18, 2026

This pull request has been ignored for the connected project akjccuoncxlvrrtkvtno because there are no changes detected in supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 18, 2026

Greptile Summary

This PR introduces SPEC-CORE-002, a draft architecture specification proposing the migration of Thoughtbox from a heavy MCP JSON Schema API to a Code Mode architecture with four explicit layers: a JS/TS REPL authoring surface (Layer A), a Canonical IR storage format (Layer B), the TBX-C1 compact wire codec (Layer C), and async read projections (Layer D). The design is well-structured and the layered separation of concerns is sound, but several gaps in the spec need to be addressed before implementation begins.

Key issues found:

  • Missing dependency: .specs/agentic-runbooks.md is declared as a required dependency but does not exist in the repository; the execution model for Layer A and Phase 5 cannot be verified.
  • k key collision in TBX-C1: The short key k is used both for record-level kind (expanded via d.k) and for relation type within a.relations items (expanded via d.rel), creating an undocumented context-sensitive ambiguity that will confuse codec implementers.
  • CanonicalIR type safety gap: The top-level kind: string field and the kind discriminant in attrs: CanonicalIRAttrs are not type-linked, allowing them to silently diverge without compile-time errors.
  • Incomplete tb.* coverage: Only 4 of the 11 defined CanonicalIRAttrs record kinds have corresponding tb.* authoring helpers; the spec does not clarify how the remaining 7 kinds (observation, decision, revision, tool_call, tool_result, hub_event, checkpoint) are emitted from the authoring layer.
  • args: any: The tool_call attrs type uses any for args, undermining the schema-enforced union.
  • Branch name mismatch: The spec header lists feat/code-mode-mvp as the branch, but this PR targets feature/rlm-sampling.

Confidence Score: 3/5

  • Safe to merge as a draft spec, but several specification gaps should be resolved before implementation begins to avoid costly rework.
  • This is a documentation-only PR with no executable code changes, so there is no risk of runtime breakage. However, the spec has multiple ambiguities and gaps (missing dependency, TBX-C1 key collision, incomplete tb.* coverage, unlinked TS types) that, if left unresolved, will directly cause bugs or misaligned implementations downstream. Merging as a draft is acceptable if the team agrees to track these open issues, but the spec should not be treated as implementation-ready in its current state.
  • .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md — particularly Section 3 (data models) and the TBX-C1 encoding rules need clarification before implementation.

Important Files Changed

Filename Overview
.specs/core/SPEC-CORE-002-code-mode-thoughtbox.md New architecture spec for Code Mode Thoughtbox migration; contains several issues: missing dependency file, ambiguous k key in TBX-C1 codec, unlinked kind types in the TypeScript interface, incomplete tb.* library coverage for 7 of 11 record kinds, and a branch name mismatch in the header.

Sequence Diagram

sequenceDiagram
    participant LLM
    participant HostSandbox as Host Sandbox (V8 isolate / Effect Task)
    participant tb as tb.* Library
    participant Backend as Thoughtbox Backend
    participant Storage as Immutable Ledger
    participant Projections as Layer D Projections (async)

    LLM->>HostSandbox: search({ code: "..." })
    HostSandbox-->>LLM: available operations / schema shapes

    LLM->>HostSandbox: execute({ code: "tb.claim(...)" })
    HostSandbox->>tb: evaluate JS/TS snippet (V8 isolate, 50ms budget)
    tb-->>HostSandbox: memory objects (claim, plan, etc.)
    HostSandbox->>HostSandbox: lower objects → Canonical IR (Layer B)
    HostSandbox->>Backend: POST Canonical IR (TBX-C1 encoded, Layer C)
    Backend->>Backend: decode TBX-C1 → Canonical IR
    Backend->>Storage: write to immutable ledger
    Backend-)Projections: update projections (async, Layer D)
    Backend-->>HostSandbox: TBX-C1 columnar results
    HostSandbox-->>LLM: decoded results or raw TBX-C1 (Context Isolation)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 4

Comment:
**Broken dependency reference**

The spec declares a hard dependency on `.specs/agentic-runbooks.md`, but this file does not exist anywhere in the repository. Implementers following this spec will not be able to verify the execution model that Layer A and Phase 5 explicitly rely on (`agentic-runbooks` Effect Workflow, `DurableDeferred`, MCP Task async execution).

This should either be resolved by adding the dependency spec, linking to an external reference, or inlining the relevant execution model contract inline here before this spec is acted on.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 56-59

Comment:
**`k` key is ambiguous across codec contexts**

The TBX-C1 spec assigns `k` as the short key for record-level `kind` (e.g., `"k": "cl"``"claim"` via the `d.k` dictionary). However, in the concrete encoding example on lines 135–137, `k` is also used inside `a.relations` items to mean the *relation type*:

```json
"relations": [{ "k": "sup", "to": "th:12" }]
```

Here `"k": "sup"` is expected to expand using the `d.rel` dictionary, not `d.k`. This creates a context-dependent ambiguity: the same key `k` has two different semantic meanings and two different expansion dictionaries depending on where it appears. The spec does not document this context-sensitive expansion rule anywhere, which will confuse codec implementers and could lead to incorrect decoder behavior.

Consider using a distinct key for relation type (e.g., `"t"` for type) to remove the ambiguity, and document the expansion rules explicitly.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 88-113

Comment:
**`CanonicalIR.kind` and `CanonicalIRAttrs.kind` are not type-linked**

The `CanonicalIR` interface has a top-level `kind: string` field alongside `attrs: CanonicalIRAttrs`. Since `CanonicalIRAttrs` is a discriminated union also keyed on `kind`, the two fields can silently diverge (e.g., `kind: "claim"` at the top level but `attrs: { kind: "plan", steps: [] }`). TypeScript will not catch this at compile time as written.

Consider using a generic or mapped type to enforce alignment:

```typescript
interface CanonicalIR<A extends CanonicalIRAttrs = CanonicalIRAttrs> {
  id: string;
  kind: A["kind"];   // constrained to the attrs discriminant
  // ...
  attrs: A;
}
```

Or at minimum, add a note that implementations must validate `kind === attrs.kind` at serialization/deserialization boundaries.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 83

Comment:
**`args: any` erases type safety for `tool_call`**

Using `any` for `tool_call.args` defeats the purpose of a schema-enforced `CanonicalIRAttrs` union. Since tool call arguments are arbitrary but structured, `Record<string, unknown>` or `JsonValue` (a recursive safe type) would preserve type safety without over-constraining the schema:

```typescript
| { kind: "tool_call"; toolName: string; args: Record<string, unknown> }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 33-37

Comment:
**`tb.*` library covers only 4 of 11 Canonical IR record kinds**

The `tb` helpers listed here — `tb.claim()`, `tb.plan()`, `tb.question()`, `tb.evidence()` — account for only 4 of the 11 `CanonicalIRAttrs` record kinds defined in Section 3.1. The remaining 7 kinds (`observation`, `decision`, `revision`, `tool_call`, `tool_result`, `hub_event`, `checkpoint`) have no corresponding authoring-layer builders specified.

It is unclear whether these record kinds are:
- Emitted exclusively by the host (not by LLM-authored snippets), or
- Planned for future `tb.*` extension, or
- Intended to be emitted via a lower-level escape hatch.

This gap should be documented explicitly to avoid ambiguity when implementers build the `tb` library and when LLMs author REPL snippets that need to record, for example, a `revision` or `decision`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 3

Comment:
**Branch name in spec header does not match this PR's branch**

The spec metadata lists `feat/code-mode-mvp` as the target branch, but this PR is open against `feature/rlm-sampling`. If `feat/code-mode-mvp` is the intended implementation branch and `feature/rlm-sampling` is only the drafting context, this should be clarified in a note (e.g., "implementation branch" vs. "drafted in"). Otherwise future readers may be confused about where this spec's implementation lives.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "docs: draft SPEC-COR..."

> **Status**: Draft
> **Created**: 2026-03-13
> **Branch**: `feat/code-mode-mvp`
> **Depends on**: `.specs/agentic-runbooks.md`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Broken dependency reference

The spec declares a hard dependency on .specs/agentic-runbooks.md, but this file does not exist anywhere in the repository. Implementers following this spec will not be able to verify the execution model that Layer A and Phase 5 explicitly rely on (agentic-runbooks Effect Workflow, DurableDeferred, MCP Task async execution).

This should either be resolved by adding the dependency spec, linking to an external reference, or inlining the relevant execution model contract inline here before this spec is acted on.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 4

Comment:
**Broken dependency reference**

The spec declares a hard dependency on `.specs/agentic-runbooks.md`, but this file does not exist anywhere in the repository. Implementers following this spec will not be able to verify the execution model that Layer A and Phase 5 explicitly rely on (`agentic-runbooks` Effect Workflow, `DurableDeferred`, MCP Task async execution).

This should either be resolved by adding the dependency spec, linking to an external reference, or inlining the relevant execution model contract inline here before this spec is acted on.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +56 to +59
* Short keys (`k` for kind, `i` for id, `x` for text).
* Session-local dictionary blocks to replace repeated strings.
* Columnar lists (`cols` + `rows`) for query results.
* Explicit reference arrays (`r`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 k key is ambiguous across codec contexts

The TBX-C1 spec assigns k as the short key for record-level kind (e.g., "k": "cl""claim" via the d.k dictionary). However, in the concrete encoding example on lines 135–137, k is also used inside a.relations items to mean the relation type:

"relations": [{ "k": "sup", "to": "th:12" }]

Here "k": "sup" is expected to expand using the d.rel dictionary, not d.k. This creates a context-dependent ambiguity: the same key k has two different semantic meanings and two different expansion dictionaries depending on where it appears. The spec does not document this context-sensitive expansion rule anywhere, which will confuse codec implementers and could lead to incorrect decoder behavior.

Consider using a distinct key for relation type (e.g., "t" for type) to remove the ambiguity, and document the expansion rules explicitly.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 56-59

Comment:
**`k` key is ambiguous across codec contexts**

The TBX-C1 spec assigns `k` as the short key for record-level `kind` (e.g., `"k": "cl"``"claim"` via the `d.k` dictionary). However, in the concrete encoding example on lines 135–137, `k` is also used inside `a.relations` items to mean the *relation type*:

```json
"relations": [{ "k": "sup", "to": "th:12" }]
```

Here `"k": "sup"` is expected to expand using the `d.rel` dictionary, not `d.k`. This creates a context-dependent ambiguity: the same key `k` has two different semantic meanings and two different expansion dictionaries depending on where it appears. The spec does not document this context-sensitive expansion rule anywhere, which will confuse codec implementers and could lead to incorrect decoder behavior.

Consider using a distinct key for relation type (e.g., `"t"` for type) to remove the ambiguity, and document the expansion rules explicitly.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +88 to +113
interface CanonicalIR {
id: string; // e.g., "th:17"
kind: string; // e.g., "claim"
ts: string; // ISO-8601 timestamp
session: string;
branch: string;
agent: string;
refs: string[]; // Lightweight indexed summary of references for fast traversal
confidence?: number;

body: {
text: string; // Primary human-readable payload
cipher?: string; // Optional reasoning notation (derived or passed through)
labels?: string[];
metrics?: Record<string, number>;
};

source: { // Provenance metadata
surface?: string;
operation?: string;
transport?: string;
model?: string;
};

attrs: CanonicalIRAttrs; // Type-specific schema-enforced fields
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 CanonicalIR.kind and CanonicalIRAttrs.kind are not type-linked

The CanonicalIR interface has a top-level kind: string field alongside attrs: CanonicalIRAttrs. Since CanonicalIRAttrs is a discriminated union also keyed on kind, the two fields can silently diverge (e.g., kind: "claim" at the top level but attrs: { kind: "plan", steps: [] }). TypeScript will not catch this at compile time as written.

Consider using a generic or mapped type to enforce alignment:

interface CanonicalIR<A extends CanonicalIRAttrs = CanonicalIRAttrs> {
  id: string;
  kind: A["kind"];   // constrained to the attrs discriminant
  // ...
  attrs: A;
}

Or at minimum, add a note that implementations must validate kind === attrs.kind at serialization/deserialization boundaries.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 88-113

Comment:
**`CanonicalIR.kind` and `CanonicalIRAttrs.kind` are not type-linked**

The `CanonicalIR` interface has a top-level `kind: string` field alongside `attrs: CanonicalIRAttrs`. Since `CanonicalIRAttrs` is a discriminated union also keyed on `kind`, the two fields can silently diverge (e.g., `kind: "claim"` at the top level but `attrs: { kind: "plan", steps: [] }`). TypeScript will not catch this at compile time as written.

Consider using a generic or mapped type to enforce alignment:

```typescript
interface CanonicalIR<A extends CanonicalIRAttrs = CanonicalIRAttrs> {
  id: string;
  kind: A["kind"];   // constrained to the attrs discriminant
  // ...
  attrs: A;
}
```

Or at minimum, add a note that implementations must validate `kind === attrs.kind` at serialization/deserialization boundaries.

How can I resolve this? If you propose a fix, please make it concise.

| { kind: "observation"; context?: string }
| { kind: "decision"; alternativesConsidered?: string[] }
| { kind: "revision"; replacesId: string }
| { kind: "tool_call"; toolName: string; args: any }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 args: any erases type safety for tool_call

Using any for tool_call.args defeats the purpose of a schema-enforced CanonicalIRAttrs union. Since tool call arguments are arbitrary but structured, Record<string, unknown> or JsonValue (a recursive safe type) would preserve type safety without over-constraining the schema:

| { kind: "tool_call"; toolName: string; args: Record<string, unknown> }
Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 83

Comment:
**`args: any` erases type safety for `tool_call`**

Using `any` for `tool_call.args` defeats the purpose of a schema-enforced `CanonicalIRAttrs` union. Since tool call arguments are arbitrary but structured, `Record<string, unknown>` or `JsonValue` (a recursive safe type) would preserve type safety without over-constraining the schema:

```typescript
| { kind: "tool_call"; toolName: string; args: Record<string, unknown> }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +33 to +37
* **Helpers:** Inside `execute()`, a typed `tb` library provides builders injected dynamically into the isolate global scope:
* `tb.claim(text, { supports: [...] })`
* `tb.plan(...)`
* `tb.question(...)`
* `tb.evidence(...)`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 tb.* library covers only 4 of 11 Canonical IR record kinds

The tb helpers listed here — tb.claim(), tb.plan(), tb.question(), tb.evidence() — account for only 4 of the 11 CanonicalIRAttrs record kinds defined in Section 3.1. The remaining 7 kinds (observation, decision, revision, tool_call, tool_result, hub_event, checkpoint) have no corresponding authoring-layer builders specified.

It is unclear whether these record kinds are:

  • Emitted exclusively by the host (not by LLM-authored snippets), or
  • Planned for future tb.* extension, or
  • Intended to be emitted via a lower-level escape hatch.

This gap should be documented explicitly to avoid ambiguity when implementers build the tb library and when LLMs author REPL snippets that need to record, for example, a revision or decision.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 33-37

Comment:
**`tb.*` library covers only 4 of 11 Canonical IR record kinds**

The `tb` helpers listed here — `tb.claim()`, `tb.plan()`, `tb.question()`, `tb.evidence()` — account for only 4 of the 11 `CanonicalIRAttrs` record kinds defined in Section 3.1. The remaining 7 kinds (`observation`, `decision`, `revision`, `tool_call`, `tool_result`, `hub_event`, `checkpoint`) have no corresponding authoring-layer builders specified.

It is unclear whether these record kinds are:
- Emitted exclusively by the host (not by LLM-authored snippets), or
- Planned for future `tb.*` extension, or
- Intended to be emitted via a lower-level escape hatch.

This gap should be documented explicitly to avoid ambiguity when implementers build the `tb` library and when LLMs author REPL snippets that need to record, for example, a `revision` or `decision`.

How can I resolve this? If you propose a fix, please make it concise.

@@ -0,0 +1,193 @@
> **Status**: Draft
> **Created**: 2026-03-13
> **Branch**: `feat/code-mode-mvp`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Branch name in spec header does not match this PR's branch

The spec metadata lists feat/code-mode-mvp as the target branch, but this PR is open against feature/rlm-sampling. If feat/code-mode-mvp is the intended implementation branch and feature/rlm-sampling is only the drafting context, this should be clarified in a note (e.g., "implementation branch" vs. "drafted in"). Otherwise future readers may be confused about where this spec's implementation lives.

Prompt To Fix With AI
This is a comment left during a code review.
Path: .specs/core/SPEC-CORE-002-code-mode-thoughtbox.md
Line: 3

Comment:
**Branch name in spec header does not match this PR's branch**

The spec metadata lists `feat/code-mode-mvp` as the target branch, but this PR is open against `feature/rlm-sampling`. If `feat/code-mode-mvp` is the intended implementation branch and `feature/rlm-sampling` is only the drafting context, this should be clarified in a note (e.g., "implementation branch" vs. "drafted in"). Otherwise future readers may be confused about where this spec's implementation lives.

How can I resolve this? If you propose a fix, please make it concise.

Co-authored-by: glassBead-tc <102933849+glassBead-tc@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant