Skip to content

Run the operations API on the main thread#355

Open
kriszyp wants to merge 189 commits intomainfrom
operations-api-main-thread
Open

Run the operations API on the main thread#355
kriszyp wants to merge 189 commits intomainfrom
operations-api-main-thread

Conversation

@kriszyp
Copy link
Copy Markdown
Member

@kriszyp kriszyp commented Apr 15, 2026

No description provided.

@kriszyp kriszyp marked this pull request as ready for review April 23, 2026 15:32
@kriszyp kriszyp requested a review from a team as a code owner April 23, 2026 15:32
Copy link
Copy Markdown
Member

@cb1kenobi cb1kenobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a bit over my head, so I did my best.

Comment thread components/componentLoader.ts
Comment thread resources/Table.ts Outdated
Co-authored-by: Chris Barber <chris@harperdb.io>
@@ -48,7 +47,6 @@ describe('18. Computed indexed properties', () => {
assert.equal(r.body[0].taxRate, 0.19, r.text);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is jsTotalPrice dropped here?

jsTotalPrice is a JS-function-based computed property — it's set via setComputedAttribute('jsTotalPrice', fn) in resources.js, not via @computed(from: "...") in the GraphQL schema. That means it has no computedFromExpression to persist in metadata and reconstruct from.

The new createComputedFrom fallback in Table.ts only fires when attribute.computedFromExpression is present, so jsTotalPrice won't be recomputed on threads that didn't load the component's resources.js (e.g. the main thread).

However, this search goes through the REST endpoint, which hits workers that do load resources.js. So the assertion should still hold — unless the REST response no longer includes JS-only computed properties for some other reason introduced by this PR.

Could you clarify:

  1. Is this a known limitation (JS-function computed properties don't survive cross-thread table reconstruction), and if so, should it be documented?
  2. Or is the assertion removed because the REST path changed how it serializes computed properties?

Silently dropping this assertion could mask a correctness regression for users relying on setComputedAttribute with indexed computed properties.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 29, 2026

Review: PR #355 — Run the operations API on the main thread

Traced through all changed files. The architectural goal is sound: moving the operations API server onto the main thread (rather than a worker) so it doesn't compete for worker slots, and adding graceful server closure to the main-thread shutdown path.

What I verified

Surface Result
operationsServer.ts: startstartOnMainThread rename Consistent with the existing convention in graphql.ts, roles.ts, harper_logger.js. componentLoader.ts already dispatches startOnMainThread before start (line 449) — no consumers break.
componentLoader.ts: operationsApi moved to conditional require TRUSTED_RESOURCE_PLUGINS[name] = ... pattern pre-exists (line 112), so TypeScript handles the mutation. Workers no longer load operationsServer, which correctly matches the main-thread-only intent.
auth.ts: startOnMainThread = start Operations API needs auth; adding this alias so it fires on the main thread is correct.
threadServer.js: closeServers extracted and called from restart.js Correct — main-thread servers must be closed before cleanupChildrenProcesses. The refactor also de-duplicates the close loop that was inlined in the worker shutdown handler.
listenOnPorts idempotency guard (let listening module-level) The guard prevents double-binding when both the main thread (via socketRouter.ts) and workers call listenOnPorts. Safe because (a) each thread has its own module context, and (b) the guard only fires within a single context. Latent nit: after closeServers(), if listenOnPorts() were called again in-process, it would return old resolved Promises instead of re-binding. Not exercised today (process exits), but worth knowing.
Unix-socket path: getWorkerIndex() == 0 guard removed This guarded the socket from being created by multiple workers. The operations API socket now lives on the main thread (only caller without a worker index). Workers don't register that server in their SERVERS map, so no conflict.
components/operations.js: deployComponent guard change Old: if (isMainThread) return. New: load attempt only runs on !isMainThread && !SAFE_MODE. Main thread now falls through to replication/restart logic — which is the intended behavior.
Table.ts: new createComputedFrom path Mirrors the pattern in graphql.ts exactly. Fires only when attribute.computedFromExpression is set (i.e., @computed(from: "...") was present). Expressions originate from schema metadata, so runInThisContext() is not a new injection surface.
graphql.ts: property.computedFromExpression = computedFromExpression Persists the expression so Table.ts can reconstruct the compute function on threads that didn't load the GraphQL schema. Straightforward.

1. jsTotalPrice assertion silently dropped

File: integrationTests/apiTests/tests/18_computedIndexedProperties.mjs:47 (inline comment posted)

What: assert.equal(r.body[0].jsTotalPrice, 119, r.text) removed without explanation.

Why it matters: jsTotalPrice is set via setComputedAttribute('jsTotalPrice', fn) in resources.js — it has no @computed(from: "...") expression. That means no computedFromExpression is persisted, so the new createComputedFrom fallback in Table.ts never fires for it. Workers that load resources.js still have the function, but the assertion is gone even though the search hits a worker (REST endpoint). This could be:

  • A known limitation (JS-only computed properties don't survive cross-thread reconstruction) that should be documented, or
  • A correctness regression the test removal is hiding.

Suggested fix: If the property genuinely doesn't return via REST anymore, add a comment explaining why and consider documenting the limitation. If it should still return, keep the assertion.


No other blockers found.

heskew and others added 17 commits May 5, 2026 05:50
- Only replace entry point when new node level is strictly higher, not equal,
  to avoid replacing well-connected entry points with unconnected new nodes
- Use passed distance function for entry point in searchLayer instead of
  always using the instance default, which mixed distance metrics
- Prevent orphaning nodes at level 0 when pruning excess connections
- Update stale distances in reverse connections when a node's vector changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nathan Heskew <heskew@pm.me>
When a higher version number is passed in spawn options, the existing
process is killed and a new one is started. The version is stored
alongside the PID in the pid file. This allows callers to force a
process restart when the underlying configuration or code changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dawson Toth <dawson@harperdb.io>
Co-authored-by: Chris Barber <chris@cb1inc.com>
kriszyp and others added 21 commits May 5, 2026 05:54
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
The additionalAuditRefsNextEncoding addition was applied twice, leaving
a duplicate block of let declarations that caused a lint parse error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…heck

Symptom: harper PR #411 from a real org member was silently skipped —
`Claude PR Review` evaluated its job-level `if:` to false and ran zero
steps.

Cause: GitHub's webhook `author_association` is unreliable. It reports
`CONTRIBUTOR` (or `NONE`) for org members with private membership AND
for users whose repo access comes via team membership rather than
direct collaborator status. Real HarperFast team members fall into
both buckets. Forcing visibility changes is hostile UX, and the
collaborators API would admit a broader population (read-only
collaborators, default-org-permission users) than we want.

Fix: two-job pattern with team-membership check.

Each workflow has an `authorize` job that runs first, mints an
installation token from a HarperFast-org-owned GitHub App
(Members:Read scope), and checks team membership. The work/review
job has ONE `if: needs.authorize.outputs.authorized == 'true'`. No
step-level guards, no individual user list to maintain.

The App token lives in the authorize job ONLY — the work job uses
the default GITHUB_TOKEN, so the org-read capability never reaches
the agent step.

CODEOWNERS-driven trust set:
  - The auth check reads `.github/CODEOWNERS` via the default token
    and extracts every `@HarperFast/<team>` handle as the trust set.
  - Same set as people we trust to review code; alignment by
    construction. New owner team in CODEOWNERS automatically extends
    trigger trust. New consumer repo inherits its own CODEOWNERS.
  - External-org handles are deliberately ignored — only HarperFast
    teams.
  - If CODEOWNERS is missing, empty, or has no @HarperFast handles,
    falls back to @HarperFast/developers.

Per-workflow specifics:

- claude-review.yml: checks BOTH the PR author
  (`pull_request.user.login`) AND the event actor (`github.actor`).
  A non-trusted user pushing to a trusted user's PR branch changes
  the actor without changing the PR author; refusing those events
  closes that loophole. claude[bot] is admitted explicitly so
  AI-authored PRs from the issue-to-PR pipeline get reviewed
  (ADMIT_CLAUDE_BOT=true).
- claude-mention.yml: checks the commenter. claude[bot] not admitted
  here (only humans trigger mentions).
- claude-issue-to-pr.yml: checks the LABELER (github.actor), not the
  issue author. The labeler must already have at least triage
  permission; a maintainer labeling an external-author issue is a
  legitimate way to invoke the agent on community reports.

Per the post-#447 convention, the auth-check bash lives in
`.github/scripts/authorize-claude-workflow.sh` (shared across all
three workflows; parameterized by env vars). Workflows invoke via
`bash .github/scripts/...`.

Defense-in-depth lint:
- New `.github/workflows/auth-gate-invariants.yml` runs on any PR
  touching a `claude-*.yml` workflow file. Validates structurally
  via `bash .github/scripts/validate-auth-gate-invariants.sh`:
  * `authorize` job exists.
  * `authorize.outputs.authorized` wired to a step output.
  * `actions/create-github-app-token` present and pinned to a SHA.
  * `authorize.permissions` has no `write` scopes.
  * `HARPERFAST_AI_CLIENT_ID` and `HARPERFAST_AI_APP_PRIVATE_KEY`
    secrets referenced.
  * Every non-authorize job has `needs: authorize` and an exact
    `if: needs.authorize.outputs.authorized == 'true'` (no
    compound expressions, no tautologies).
  Make this a REQUIRED status check on `main` via branch
  protection. Subtle attacks on the bash logic are caught by
  CODEOWNERS review on `.github/`.

Required (organization-level) secrets — must be set on the
HarperFast org for any consumer repo to authorize a Claude run:
  - HARPERFAST_AI_CLIENT_ID       (the App's Client ID, like Iv23li…)
  - HARPERFAST_AI_APP_PRIVATE_KEY (.pem file contents)

Replaces #417's two earlier commits (which were on a stale base
that pre-dated #432, #437, #438, #439, #442, #444, #447).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r enforces

Address review on #417:

1. **Auth check fail-open (high).** `authorize-claude-workflow.sh`
   would fall through to `authorized=true` when `USERS_TO_CHECK`
   was empty or whitespace-only. The `read` loop's heredoc on `""`
   reads zero non-empty lines, every iteration is skipped by
   `[ -z "$user" ] && continue`, and the trailing
   `echo "authorized=true"` wins. A workflow that forgot to set
   USERS_TO_CHECK — or a malicious PR that removed it — would
   admit every event. Adds a guard right after the trust-set
   resolution: `${USERS_TO_CHECK//[[:space:]]/}` empty → `::error::`,
   `authorized=false`, exit 0.

2. **Validator gap (medium).** `validate-auth-gate-invariants.sh`
   structurally checked the authorize job, secrets, and
   `needs:`/`if:` shape, but never asserted that USERS_TO_CHECK was
   wired to a step in the authorize job. With (1) fixed, an
   omission fails closed at runtime — but the validator should
   still trip structurally so the omission is caught before merge.
   Adds new check #6: `yq` over `.jobs.authorize.steps[].env.USERS_TO_CHECK`,
   fails the workflow if no step sets it.

Verified locally:
- Empty / whitespace-only USERS_TO_CHECK → script denies.
- Non-empty USERS_TO_CHECK → loop runs as before.
- All three claude-*.yml workflows already set USERS_TO_CHECK on
  the right step (validator passes against current PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diagnosis from harper PR #450 review failure (run 25234303343):
agent burned ~20 turns on `Bash(grep -rn …)` searches across the
whole repo before hitting `--max-turns 24`, posted no review
comment, and the log step skipped (no marker'd comment to log).
Diff was small (5 files, +83/−8) — exploration pattern, not diff
size, was the cost driver.

Two changes:

1. **--max-turns 24 → 48.** Cheap band-aid that ends the immediate
   dropouts on PRs with non-trivial cross-symbol exploration. Per
   the existing comment, this is the real cost ceiling — bumping
   doubles the worst-case budget and moves us out of the 'one
   exploratory grep too many = no review at all' regime.

2. **Tools section: explicit turn-budget hint + sharpened
   anti-pattern.** Names "repo-wide search" as THE biggest
   turn-burner, points at the `Grep` tool with `path` scoping, and
   cites the failure mode ("recent run that timed out before
   posting anything") so the agent has concrete signal that
   matches its observed behavior.

Out of scope: investigating whether `Bash(grep …)` is supposed to
be denied by the existing `--allowedTools` list (current entries
scope to `Bash(git diff:*)`, `Bash(git log:*)`, `Bash(git blame:*)`,
`Bash(git show:*)`, `Bash(gh ...:*)` — `Bash(grep …)` shouldn't
match any pattern but ran cleanly in #450's failed run, with
`is_error: false` on every grep tool call). Either claude-code-
action permits any `Bash(...)` once the tool is enabled, or there's
a matching gotcha. Tracked as a follow-up; this PR addresses the
immediate symptom.

oauth needs the same change — will mirror after this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: every PR was failing `Auth gate invariants / validate`
with `authorize job has no step setting USERS_TO_CHECK env var`,
even on workflows that clearly set it. Reported on PR #452 and
others.

Cause: the check `USERS_TO_CHECK presence` (added in #417 review-
fixup) was using a jq-flavored expression — `// empty` in
particular — but yq on ubuntu-latest is `mikefarah/yq` (Go), not
jq. yq's lexer rejects `empty`. The script had `2>/dev/null` on
the yq invocation, so the lexer error was eaten and the variable
came back as the empty string, tripping the existence check on
every workflow.

Fix: rewrite the expression in idiomatic yq:
- `.jobs.authorize.steps[].env.USERS_TO_CHECK | select(. != null)`
  — emits a stream of one value per step that sets the env var,
  skipping steps that don't.
- `head -1` collapses the stream to a single value (or empty)
  for the `[ -n ]` check.

Verified locally with mikefarah/yq v4.53.2 against all three
harper claude-*.yml workflows — all pass.

Out of scope but worth following up:
- The `2>/dev/null` swallowed a real yq error. Worth either
  removing the suppression or capturing stderr for diagnostics
  when the expression returns empty.
- oauth's mirror (#68) carries the same bug; a copy of this
  fix needs to land there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runner deprecation warning on the validate job:

> Node.js 20 actions are deprecated. The following actions are
> running on Node.js 20 and may not work as expected:
> actions/checkout@34e1148.
> Actions will be forced to run with Node.js 24 by default starting
> June 2nd, 2026. Node.js 20 will be removed from the runner on
> September 16th, 2026.

`actions/checkout` v5.0.0 was the Node 20 → Node 24 jump. v6.0.2
(2026-01-09) is the latest stable, ~4 months baked, and is already
in use elsewhere in this repo (`format-check.yml`,
`integration-tests.yml`, `unit-test.yml`) — same SHA, no new
trust surface.

11 instances bumped across:
- auth-gate-invariants.yml          (1)
- claude-issue-to-pr.yml            (3)
- claude-mention.yml                (3)
- claude-review.yml                 (4)

v6.0.0's notable change was "persist creds to a separate file"
(security improvement). v6.0.2 adds tag-handling fixes. We don't
configure custom credential persistence in our checkouts, so the
upgrade is transparent.

Other actions in claude-*.yml were NOT flagged by the runner —
`actions/setup-node@v6.2.0`, `actions/create-github-app-token@v3.1.1`,
`anthropics/claude-code-action@v1.0.99` are presumed Node-24-clean.
If/when their warnings appear, that's a separate sweep.

oauth needs the same change — will mirror after this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep of currently-pinned third-party action SHAs to latest stable
releases past the 3-day buffer (per pin-bump trust model — internal
repos can ship same-day, third-party waits for bake time).

Bumps applied:

- **actions/setup-node**: v6.2.0/v6.3.0 → v6.4.0 (2026-04-20, 14
  days old). Minor version, no breaking changes. Aligns
  claude-*.yml with format-check.yml / integration-tests.yml /
  unit-test.yml at a single version.
  - claude-issue-to-pr.yml, claude-mention.yml: v6.2.0 → v6.4.0
  - format-check.yml, integration-tests.yml, unit-test.yml: v6.3.0 → v6.4.0

- **actions/upload-artifact**: v7.0.0 → v7.0.1 (2026-04-10, 24
  days old). Patch version. integration-tests.yml only.

- **anthropics/claude-code-action**: v1.0.99 → v1.0.110 (2026-04-29,
  5 days old). Patch range, no breaking changes from v1
  migration. v1.0.111 (2026-05-01) is at the buffer edge so
  conservatively held back. claude-issue-to-pr.yml,
  claude-mention.yml, claude-review.yml.

Skipped (already current past buffer):
- actions/checkout v6.0.2 (2026-01-09)
- actions/create-github-app-token v3.1.1 (2026-04-11)
- actions/download-artifact v8.0.1 (2026-03-11)

Verified locally:
- All affected workflows parse as valid YAML.
- `validate-auth-gate-invariants.sh` passes against all three
  claude-*.yml workflows.

oauth needs a parallel sweep — will follow once this lands. oauth's
sweep also picks up:
- The actions/checkout v4.3.1 → v6.0.2 mirror (the change shipped
  in harper #461 hasn't been mirrored to oauth yet).
- pr-checks.yml's three tag-pinned refs (`actions/checkout@v4`,
  `actions/setup-node@v4`, `oven-sh/setup-bun@v2`) need SHA pins.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
script.sourceURL can be undefined in importModuleDynamically; fall back
to the enclosing CJS module's url so resolveModule always has a valid referrer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove fetch-depth: 0 from checkout; use targeted --depth=200 fetch
  in the script instead (needed for cherry-pick parent lookup and
  release branch subject-match check)
- Use Harperfast / noreply@harperdb.io as git author identity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
existsSync follows symlinks and returns false for dangling symlinks,
so the else branch would call symlinkSync on an already-present
(but broken) symlink file, causing EEXIST. Switch to lstatSync which
detects the symlink file itself regardless of target validity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… PRs

GitHub fires 'issues' not 'pull_request' events when labels are added to
closed/merged PRs. Add issues:labeled trigger with merged_at check, and
fetch merge SHA via API when not available in event payload.
@kriszyp kriszyp requested a review from a team as a code owner May 5, 2026 11:54
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@socket-security
Copy link
Copy Markdown

socket-security Bot commented May 5, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Updatedtypescript-eslint@​8.59.0 ⏵ 8.58.01001007498100
Updated@​aws-sdk/​client-s3@​3.1037.0 ⏵ 3.1042.098 +110010098100

View full report

assert.equal(r.body[0].totalPrice, 119, r.text);
assert.equal(r.body[0].notIndexedTotalPrice, 119, r.text);
assert.equal(r.body[0].jsTotalPrice, 119, r.text);
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jsTotalPrice assertion was removed here (it was assert.equal(r.body[0].jsTotalPrice, 119, r.text)).

jsTotalPrice is registered via setComputedAttribute('jsTotalPrice', fn) in the component's resources.js. With the operations API now running on the main thread, component JS does not run there, so this attribute is unavailable to search_by_value queries. The @computed(from: "...") attributes (totalPrice, notIndexedTotalPrice) still work because the new createComputedFrom path in Table.ts reconstructs them from persisted metadata.

This is an intentional limitation of the design, but it is undocumented. Please either:

  • Add a comment here (and ideally in AGENTS.md or a CHANGELOG) stating that JS-only computed attributes (setComputedAttribute without a GraphQL @computed(from: "...")) are not available via the operations API when it runs on the main thread, or
  • Keep the assertion if the attribute is expected to be accessible (which would require a fix on the Table layer to also handle non-expression computed functions across threads).

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 5, 2026

1. JS-computed attributes silently absent from operations API responses (undocumented regression)

File: integrationTests/apiTests/tests/18_computedIndexedProperties.mjs:50

What: assert.equal(r.body[0].jsTotalPrice, 119) was removed from the search_by_value + get_attributes test. jsTotalPrice is registered via setComputedAttribute('jsTotalPrice', fn) from component JS — the new createComputedFrom path in Table.ts handles only @computed(from: "expression") GraphQL attributes and cannot reconstruct arbitrary JS functions. Callers that include jsTotalPrice in get_attributes now silently receive undefined.

Why it matters: Integrators using JS-function-registered computed attributes via the operations API will see silent behavioral breakage after upgrading — no warning, no error, no documentation of the limitation.

Suggested fix: Add a comment in the test explaining why the assertion was removed, and document the limitation in AGENTS.md or near setComputedAttribute: JS-function-registered computed attributes are not available via the operations API (main thread) because component code does not run there.

(Not addressed in the new push — commits 87be177f8 / c993d42fd addressed auth initialization on the main thread, not this regression.)

assert.equal(r.body[0].totalPrice, 119, r.text);
assert.equal(r.body[0].notIndexedTotalPrice, 119, r.text);
assert.equal(r.body[0].jsTotalPrice, 119, r.text);
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS-function-registered computed attributes (jsTotalPrice, set via setComputedAttribute(name, fn) from component code) are now silently absent when queried via the operations API — callers requesting that attribute in get_attributes will receive undefined instead of the computed value, with no error or warning.

The createComputedFrom fix in Table.ts reconstructs only @computed(from: "expression") GraphQL attributes; it cannot reconstruct arbitrary JS functions, so this gap is real and permanent under the current architecture.

This is a silent behavioral regression for any integrator who combines JS-computed attributes with the operations API. Minimum fix: add a comment here explaining why the assertion was removed, and note the limitation in AGENTS.md or a comment near setComputedAttribute — something like: "JS-function-registered computed attributes are not available via the operations API (main thread) because component code does not run there."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants