fix: write Harper config files atomically#493
Open
Devin-Holland wants to merge 2 commits intomainfrom
Open
Conversation
Replaces direct fs.writeFileSync calls in configUtils with a temp-file + rename helper so concurrent readers never observe a truncated or empty file during a rewrite. fs.writeFileSync truncates before writing, which let worker threads read a zero-byte config and exit during clone — one node out of a six-node cluster failed with "rootPath config parameter is undefined" and the clone never completed. Applies to all six config writers (createConfigFile, checkForUpdatedConfig, updateConfigValue, applyRuntimeEnvVarConfig, addConfig, deleteConfigFromFile) since the same race exists for any concurrent reader. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
Reviewed; no blockers found. |
Adds coverage that chokidar (used by RootConfigWatcher) emits a change event when the watched file is replaced via temp-file + rename rather than direct truncate-write. Confirms the atomic-write change does not silently break runtime config reload paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
heskew
approved these changes
May 7, 2026
DavidCockerill
approved these changes
May 7, 2026
cb1kenobi
approved these changes
May 7, 2026
| function atomicWriteFile(filePath, content) { | ||
| const tempPath = `${filePath}.${process.pid}.${Date.now()}.tmp`; | ||
| fs.writeFileSync(tempPath, content); | ||
| fs.renameSync(tempPath, filePath); |
Member
There was a problem hiding this comment.
This is a great trick. I used it like 12 years ago as it was the only way to atomically write a file. Nice.
| // Write atomically via temp file + rename so readers don't observe a truncated/empty file | ||
| function atomicWriteFile(filePath, content) { | ||
| const tempPath = `${filePath}.${process.pid}.${Date.now()}.tmp`; | ||
| fs.writeFileSync(tempPath, content); |
Member
There was a problem hiding this comment.
If you put the encoding here, then you don't need the String() calls all over the place.
Suggested change
| fs.writeFileSync(tempPath, content); | |
| fs.writeFileSync(tempPath, content, 'utf-8'); |
kriszyp
approved these changes
May 7, 2026
Member
kriszyp
left a comment
There was a problem hiding this comment.
This should protect against partial files, but if there is still a race condition of writing to a config file at the same time you are hoping workers read the write, that still... a race condition.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces direct
fs.writeFileSyncinconfig/configUtils.jswith a temp-file +fs.renameSynchelper so concurrent readers never observe a truncated or empty file during a rewrite.Why
fs.writeFileSyncopens the target withO_TRUNCand then writes — between those two calls the file is zero-length. If a reader reads in that window it gets an empty buffer;YAML.parseDocument('')returns a document with no top-level keys, and downstreamconfigValidatorthen throwsrootPath config parameter is undefinedand the worker callsprocess.exit(1).This was hit in the field on a six-node cluster: one node's clone wedged with the worker logs:
The race window is opened by
cloneNode/cloneNode.tscallingcloneConfig()(which rewrites the config file viacreateConfigFile) aftermain()has already started spawning HTTP workers — workers runningenv.initSync()can land theirreadFileSyncexactly during the truncate-to-write interval. Sinceprocess.exitfrom a worker only kills that worker (not the parent), the clone process continued running but in a wedged state with no exit code, so the container never restarted.What changed
atomicWriteFile(filePath, content)inconfig/configUtils.jsthat writes to<filePath>.<pid>.<timestamp>.tmpandfs.renameSyncs over the target.createConfigFile,checkForUpdatedConfig,updateConfigValue,applyRuntimeEnvVarConfig,addConfig,deleteConfigFromFile. The race exists for any concurrent reader, not only clone.applyRuntimeEnvVarConfigwrite test to verify the temp+rename pair.Areas to look at
configUtils.js, not exported. If you'd prefer it lives in a shared utility instead, easy to move.<filePath>.<pid>.<Date.now()>.tmp). Same directory as target so the rename is atomic on the same filesystem. Pid+timestamp avoids collisions across processes/threads. Worth confirming this is acceptable on the deployment filesystems we care about.addConfigasync → sync conversion. Line 966 wasawait fs.writeFile(...); now uses the sync helper.addConfigis in an async function but called rarely (config-add ops API), so blocking briefly is fine. Flag if you'd rather an async variant.cloneNode.tsordering orenvironmentManager.initSync'sprocess.exit. The atomic write closes the actual race; reordering clone vsmain()is a structural change with broader risk surface, andprocess.exitfrom a worker is a separate footgun worth its own PR.Test plan
npm run test:unit:configpasses locally (51 passing)