Skip to content

fix: write Harper config files atomically#493

Open
Devin-Holland wants to merge 2 commits intomainfrom
fix/clone-config-atomic-write
Open

fix: write Harper config files atomically#493
Devin-Holland wants to merge 2 commits intomainfrom
fix/clone-config-atomic-write

Conversation

@Devin-Holland
Copy link
Copy Markdown
Member

Summary

Replaces direct fs.writeFileSync in config/configUtils.js with a temp-file + fs.renameSync helper so concurrent readers never observe a truncated or empty file during a rewrite.

Why

fs.writeFileSync opens the target with O_TRUNC and then writes — between those two calls the file is zero-length. If a reader reads in that window it gets an empty buffer; YAML.parseDocument('') returns a document with no top-level keys, and downstream configValidator then throws rootPath config parameter is undefined and the worker calls process.exit(1).

This was hit in the field on a six-node cluster: one node's clone wedged with the worker logs:

[http/4] [warn]: Cannot apply runtime env config: rootPath not found in config
[http/4] [error]: Error initializing environment manager
[http/4] [error]: rootPath config parameter is undefined

The race window is opened by cloneNode/cloneNode.ts calling cloneConfig() (which rewrites the config file via createConfigFile) after main() has already started spawning HTTP workers — workers running env.initSync() can land their readFileSync exactly during the truncate-to-write interval. Since process.exit from a worker only kills that worker (not the parent), the clone process continued running but in a wedged state with no exit code, so the container never restarted.

What changed

  • New internal helper atomicWriteFile(filePath, content) in config/configUtils.js that writes to <filePath>.<pid>.<timestamp>.tmp and fs.renameSyncs over the target.
  • Replaced six writers: createConfigFile, checkForUpdatedConfig, updateConfigValue, applyRuntimeEnvVarConfig, addConfig, deleteConfigFromFile. The race exists for any concurrent reader, not only clone.
  • Test coverage: 4 new unit tests for the helper itself; updated the existing applyRuntimeEnvVarConfig write test to verify the temp+rename pair.

Areas to look at

  • Helper placement and signature. Single internal function in configUtils.js, not exported. If you'd prefer it lives in a shared utility instead, easy to move.
  • Temp file naming (<filePath>.<pid>.<Date.now()>.tmp). Same directory as target so the rename is atomic on the same filesystem. Pid+timestamp avoids collisions across processes/threads. Worth confirming this is acceptable on the deployment filesystems we care about.
  • The addConfig async → sync conversion. Line 966 was await fs.writeFile(...); now uses the sync helper. addConfig is in an async function but called rarely (config-add ops API), so blocking briefly is fine. Flag if you'd rather an async variant.
  • Out of scope, intentionally. Did NOT touch cloneNode.ts ordering or environmentManager.initSync's process.exit. The atomic write closes the actual race; reordering clone vs main() is a structural change with broader risk surface, and process.exit from a worker is a separate footgun worth its own PR.

Test plan

  • npm run test:unit:config passes locally (51 passing)
  • Manually verify a fresh clone on a multi-node cluster no longer wedges

Replaces direct fs.writeFileSync calls in configUtils with a temp-file +
rename helper so concurrent readers never observe a truncated or empty
file during a rewrite. fs.writeFileSync truncates before writing, which
let worker threads read a zero-byte config and exit during clone — one
node out of a six-node cluster failed with "rootPath config parameter
is undefined" and the clone never completed.

Applies to all six config writers (createConfigFile, checkForUpdatedConfig,
updateConfigValue, applyRuntimeEnvVarConfig, addConfig, deleteConfigFromFile)
since the same race exists for any concurrent reader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Devin-Holland Devin-Holland requested a review from a team as a code owner May 7, 2026 19:06
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

Reviewed; no blockers found.

Adds coverage that chokidar (used by RootConfigWatcher) emits a change
event when the watched file is replaced via temp-file + rename rather
than direct truncate-write. Confirms the atomic-write change does not
silently break runtime config reload paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Devin-Holland Devin-Holland requested a review from kriszyp May 7, 2026 20:30
Comment thread config/configUtils.js
function atomicWriteFile(filePath, content) {
const tempPath = `${filePath}.${process.pid}.${Date.now()}.tmp`;
fs.writeFileSync(tempPath, content);
fs.renameSync(tempPath, filePath);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great trick. I used it like 12 years ago as it was the only way to atomically write a file. Nice.

Comment thread config/configUtils.js
// Write atomically via temp file + rename so readers don't observe a truncated/empty file
function atomicWriteFile(filePath, content) {
const tempPath = `${filePath}.${process.pid}.${Date.now()}.tmp`;
fs.writeFileSync(tempPath, content);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put the encoding here, then you don't need the String() calls all over the place.

Suggested change
fs.writeFileSync(tempPath, content);
fs.writeFileSync(tempPath, content, 'utf-8');

Copy link
Copy Markdown
Member

@kriszyp kriszyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should protect against partial files, but if there is still a race condition of writing to a config file at the same time you are hoping workers read the write, that still... a race condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants