Store requests can partially succeed locally when Fabric insert fails #5

@urcades

Summary

POST /mindchunks/create can return 500 Internal Server Error to clients when the upstream Fabric /insert call fails, even though the mindchunk has already been written to the local SQLite database.

This creates an ambiguous partial-failure mode:

  • clients see the store attempt as failed
  • the server may already have persisted the row locally
  • client retries can create duplicates or leave local-only records that never made it into Fabric

Evidence

Client-side failure example:

Store attempt failed again with server error requestId=050f25c1-1950-4a95-89bf-ecdf2416a0f6.

Fly logs repeatedly show the same server-side failure path:

Error: Error: Failed to request fabric: Internal Server Error
    at fabricRequest (/app/dist/fabric/index.js:27:15)
    at async sendMindchunkToFabric (/app/dist/fabric/index.js:33:22)
    at async Object.<anonymous> (/app/dist/routes/mindchunks/create/create.handler.js:16:9)

Relevant code path:

  • src/routes/mindchunks/create/create.handler.ts writes the mindchunk locally via createMindchunk(...)
  • the handler then awaits sendMindchunkToFabric(...)
  • src/fabric/index.ts throws when Fabric returns non-2xx
  • src/server/create-server.ts converts that into a 500 with requestId

Current Behavior

  1. The request hits POST /mindchunks/create.
  2. createMindchunk(...) inserts into SQLite and returns the new ID.
  3. sendMindchunkToFabric(...) calls Fabric /insert.
  4. If Fabric returns a non-2xx response (500 in the failures observed), sendMindchunkToFabric(...) throws.
  5. The global error handler returns 500 { error: "Internal Server Error", requestId }.
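The sequence above can be modeled in a few lines to show why a retry duplicates data. This is a simplified sketch, not the actual handler: the function names come from this issue, but their signatures, the in-memory array standing in for SQLite, the inlined error handling, and the 201 success status are all assumptions.

```typescript
// Simplified model of the current handler flow. Function names come from the
// issue; signatures and the in-memory "database" are assumptions.
type Mindchunk = { id: number; content: string };

const localRows: Mindchunk[] = []; // stands in for the SQLite table

function createMindchunk(content: string): Mindchunk {
  const row = { id: localRows.length + 1, content };
  localRows.push(row);
  return row;
}

// Simulates Fabric /insert returning a non-2xx response.
async function sendMindchunkToFabric(_chunk: Mindchunk): Promise<void> {
  throw new Error("Failed to request fabric: Internal Server Error");
}

// Mirrors steps 2-5: the local write happens first, then the awaited Fabric
// call fails and the error is converted into a 500 (inlined here instead of
// going through a global error handler).
async function handleCreate(content: string): Promise<{ status: number }> {
  try {
    const chunk = createMindchunk(content);
    await sendMindchunkToFabric(chunk);
    return { status: 201 }; // assumed success status
  } catch {
    return { status: 500 };
  }
}
```

Calling handleCreate("hello") twice in this model returns 500 both times while localRows ends up holding two rows with the same content, which is the duplicate-on-retry case described in the Summary.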

Expected Behavior

The API should avoid ambiguous partial failure.

Possible acceptable behaviors:

  • make the local DB write and Fabric insert effectively atomic from the client's point of view, or
  • return success once local persistence succeeds and enqueue/retry Fabric indexing asynchronously, or
  • explicitly surface a distinct "persisted locally but indexing failed" status so clients know whether retrying is safe
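As one possible shape for the second and third options, the handler could catch the Fabric failure separately from the local write, hand the row off for retry, and tell the client the row exists but is not yet indexed. The sketch below assumes injected dependencies so it runs on its own; enqueueFabricRetry, the 201/202 status codes, and the { id, indexed } body are hypothetical and not part of the current API.

```typescript
type Mindchunk = { id: number; content: string };

// Hypothetical response shape: 201 when indexed, 202 when persisted only.
type CreateResult =
  | { status: 201; body: { id: number; indexed: true } }
  | { status: 202; body: { id: number; indexed: false } };

interface Deps {
  createMindchunk(content: string): Mindchunk;
  sendMindchunkToFabric(chunk: Mindchunk): Promise<void>;
  enqueueFabricRetry(chunkId: number): void; // hypothetical retry hook
}

async function createWithExplicitStatus(
  deps: Deps,
  content: string,
): Promise<CreateResult> {
  // A failure here still propagates: nothing was persisted, so retrying is safe.
  const chunk = deps.createMindchunk(content);
  try {
    await deps.sendMindchunkToFabric(chunk);
    return { status: 201, body: { id: chunk.id, indexed: true } };
  } catch {
    // The row exists locally; queue indexing instead of failing the request.
    deps.enqueueFabricRetry(chunk.id);
    return { status: 202, body: { id: chunk.id, indexed: false } };
  }
}
```

With this shape a client that receives indexed: false knows the mindchunk ID and knows not to resend the create, at the cost of the row being temporarily unsearchable.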

Impact

  • clients cannot tell whether a failed store actually persisted data locally
  • retries may create duplicate local rows
  • some mindchunks may exist locally but never be searchable if they were not inserted into Fabric
  • request IDs point to real server errors, but the underlying cause is an upstream Fabric failure rather than a Fly machine failure

Suggested Follow-ups

  • Decide on the desired consistency model for create/index operations
  • Add logging around the created local mindchunk.id when Fabric insert fails
  • Consider idempotency keys or de-duplication on create
  • Consider background retry / dead-letter handling for failed Fabric inserts
  • Add a regression test covering: local insert succeeds, Fabric insert fails, client retries
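To illustrate the idempotency-key follow-up, here is a minimal in-memory sketch of de-duplication on create. The IdempotentStore class and its idempotencyKey parameter are hypothetical; in the real service the same effect would more likely come from a uniqueness constraint on a key column in SQLite, which is an assumption about a design not yet chosen.

```typescript
type Mindchunk = { id: number; content: string };

// Hypothetical store: a retry that reuses the same key returns the original
// row instead of inserting a second one.
class IdempotentStore {
  private rows: Mindchunk[] = [];
  private byKey = new Map<string, Mindchunk>();

  create(
    idempotencyKey: string,
    content: string,
  ): { chunk: Mindchunk; created: boolean } {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) {
      return { chunk: existing, created: false };
    }
    const chunk: Mindchunk = { id: this.rows.length + 1, content };
    this.rows.push(chunk);
    this.byKey.set(idempotencyKey, chunk);
    return { chunk, created: true };
  }

  count(): number {
    return this.rows.length;
  }
}
```

A client would generate the key once per logical store attempt and resend it unchanged on retry, so a 500 caused by the Fabric step no longer produces a second local row.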

Notes

The "[PU01] client problem: invalid authority" entries in the Fly logs appear to be Fly proxy noise rather than output from this application code path, and they seem unrelated to the store failure above.
