Observation
While running the full file_watchdog test set sequentially via make test-unit name=TestLogFileWatchdog on macOS, TestLogFileWatchdog_ReopensOnDelete occasionally fails with:
file_watchdog_test.go:67: expected reopen Warn in slogger output; got ""
The test passes 5/5 in isolation (name=TestLogFileWatchdog_ReopensOnDelete count1=true).
Hypothesis
macOS kqueue (the fsnotify backend on Darwin) appears to have resource-pressure or event-coalescing edge cases when multiple watchers are created and torn down in rapid succession across consecutive tests in the same package. The 500ms polling fallback in internal/slogging/file_watchdog.go catches most missed events, but the test's waitForFile deadline of 2s combined with the 200ms post-delete sleep may occasionally let the test read the buffer before the reopen Warn has been logged. Note that the 200ms sleep is shorter than the 500ms polling interval, so a reopen triggered only by the fallback can land after the check.
Suggested fix
Increase the post-delete wait, or switch the test to poll for the Warn message rather than sleep-then-check. Alternatively, refactor the watchdog tests to share a single watcher across cases (or use t.TempDir more aggressively to ensure each test gets a fresh kqueue fd in a fresh directory).
Reproduction
make test-unit name=TestLogFileWatchdog count1=true
# Repeat ~10 times; failure rate ~10%.
Origin
Discovered during implementation of #372. Production code is unaffected; only test reliability is at issue.