When running cargo test --workspace locally on uutils/coreutils, I observed that the set of failing tests changes between runs. This suggests that some tests are flaky or environment-dependent.
Environment
- OS: Ubuntu 22.04
- Rust version: rustc 1.93.0
- uutils/coreutils commit: 6a942ba
Observed failures
Different runs produce different failures, for example:
Run 1:
- test_df::test_df_arguments_override_themselves
- test_df::test_df_compatible_sync
- test_df::test_df_conflicts_overriding
- test_df::test_df_masked_proc_fallback
Run 2:
- test_df::test_df_masked_proc_fallback
- test_tail::test_follow_truncate_fast
- test_touch::test_touch_changes_time_of_file_in_stdout
Run 3:
- test_df::test_df_masked_proc_fallback
Run 4:
- test_df::test_df_masked_proc_fallback
- test_touch::test_touch_changes_time_of_file_in_stdout
Run 5:
- test_df::test_df_masked_proc_fallback
- test_tr::test_truncate_applies_before_complement_with_class
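To separate tests that fail in every run from those that fail only sometimes, the runs can be tallied with a small script. The following is a rough sketch, not part of the repository: the helper names (`failing_tests`, `classify`, `run_once`) are hypothetical, and it assumes the standard libtest output line `test <name> ... FAILED` and that `cargo` is on the PATH.

```python
import re
import subprocess
from collections import Counter

# libtest prints one line per test, e.g.
# "test test_df::test_df_compatible_sync ... FAILED".
FAILED_LINE = re.compile(r"^test (\S+) \.\.\. FAILED$", re.MULTILINE)


def failing_tests(output: str) -> set[str]:
    """Return the names of all tests reported as FAILED in one run's output."""
    return set(FAILED_LINE.findall(output))


def classify(runs: list[set[str]]) -> tuple[set[str], set[str]]:
    """Split failures into (failed in every run, failed in only some runs)."""
    counts = Counter(name for run in runs for name in run)
    always = {name for name, n in counts.items() if n == len(runs)}
    return always, set(counts) - always


def run_once() -> str:
    """Run the whole workspace test suite once and return its combined output."""
    proc = subprocess.run(
        # --no-fail-fast keeps going after the first failing test binary.
        ["cargo", "test", "--workspace", "--no-fail-fast"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is expected when tests fail
    )
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    runs = [failing_tests(run_once()) for _ in range(5)]
    always, sometimes = classify(runs)
    print("Failed in every run:", sorted(always))
    print("Failed intermittently:", sorted(sometimes))
```

Tests listed under "Failed in every run" are more likely environment-dependent than flaky, while the intermittent ones are the candidates for timing or ordering issues.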
Expected behavior
Tests should be deterministic and either consistently pass or consistently fail.
Actual behavior
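The set of failing tests varies between runs, even without code changes. In the five runs above, only test_df::test_df_masked_proc_fallback failed every time; the other failures appeared intermittently. This makes it difficult to validate local changes before submitting a PR.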
Question
If I submit a PR and the CI/CD pipeline fails due to these flaky tests, what is the recommended way to proceed?
Should contributors re-run CI, mark these tests as known flaky, or simply note that failures are unrelated to the PR changes?