Skip to content

Replace regex dependency parser with gcov line-level coverage pruning#1273

Draft
sbryngelson wants to merge 3 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning
Draft

Replace regex dependency parser with gcov line-level coverage pruning#1273
sbryngelson wants to merge 3 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning

Conversation

@sbryngelson
Copy link
Member

Summary

  • Replace the manually-maintained test_mapping.json and regex-based dependency_parser.py with a fully automated gcov instrumentation approach for --only-changes test pruning
  • New toolchain/mfc/test/coverage.py builds a coverage cache by running all tests with a --gcov build, recording which .fpp line numbers each test executes, then filters tests by intersecting git diff -U0 line ranges against per-test coverage data
  • Add --build-coverage-cache flag to ./mfc.sh test and --line-marker-format=gfortran5 to Fypp invocation so gcov correctly maps coverage to .fpp (not .f90) line numbers
  • 43 unit tests covering all corner cases (exact line hits, no-overlap, large-module line granularity, GPU #ifdef blocks, stale cache, new files)

Why

The previous regex parser had critical gaps:

  • #include macro files (parallel_macros.fpp etc.) were invisible — changing them ran 0 tests
  • 28 source files had no entry in test_mapping.json
  • File-level granularity: changing 5 lines in a 5,257-line file triggered ~300 tests

The gcov approach is completely automated (no JSON to maintain), correct by construction (the compiler is the oracle), and line-granular.

Workflow

./mfc.sh build --gcov -j 8              # one-time: build with coverage
./mfc.sh test --build-coverage-cache    # one-time: populate cache (~2-3x time)
./mfc.sh test --only-changes -j 8       # fast: only tests covering changed lines

Test plan

  • Build with --gcov and run --build-coverage-cache on CI-like Linux (gfortran)
  • Verify --only-changes correctly prunes tests when only a few .fpp lines change
  • Verify infrastructure file changes (macros, cases.py) trigger full suite
  • Verify stale cache (after cases.py modification) falls back to full suite
  • Run unit tests: python -m pytest toolchain/mfc/test/test_coverage_unit.py

🤖 Generated with Claude Code

sbryngelson and others added 2 commits February 26, 2026 20:38
Replaces the manually-maintained test_mapping.json and regex-based
dependency_parser.py with a fully automated approach using gfortran gcov
instrumentation.

Key changes:
- Add toolchain/mfc/test/coverage.py: build coverage cache by running all
  tests with a --gcov build, recording which .fpp line numbers each test
  executes; filter tests by intersecting git diff -U0 line ranges against
  per-test coverage data
- Add --build-coverage-cache flag to ./mfc.sh test
- Replace --only-changes implementation in test.py with coverage-based filter
- Add --line-marker-format=gfortran5 to CMakeLists.txt Fypp invocation so
  gcov correctly maps coverage to .fpp line numbers (not .f90 line numbers)
- Delete dependency_parser.py (530 lines) and test_mapping.json (148 lines)
- Add 43 unit tests covering all corner cases: exact line hits, no-overlap,
  large-module line granularity, GPU #ifdef blocks, stale cache, new files

Workflow:
  ./mfc.sh build --gcov -j 8              # one-time: build with coverage
  ./mfc.sh test --build-coverage-cache    # one-time: populate cache (~2-3x time)
  ./mfc.sh test --only-changes -j 8       # fast: only tests covering changed lines

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

Claude Code Review

Head SHA: 015ac92
Files changed: 7

File +/-
.gitignore +3
CMakeLists.txt +1
COVERAGE_PLAN.md +784
toolchain/mfc/cli/commands.py +25
toolchain/mfc/test/coverage.py +416
toolchain/mfc/test/test.py +52 / -1
toolchain/mfc/test/test_coverage_unit.py +570

Summary

  • Replaces regex-based dependency_parser.py and manual test_mapping.json with line-level gcov coverage for --only-changes test pruning — a sound design improvement
  • coverage.py is well-structured with clear function boundaries and good conservative defaults (new tests not in cache → always run)
  • 43 unit tests cover the core logic well (diff parsing, line-overlap detection, boundary conditions)
  • One committed file should be removed; one CMake change has broader blast radius than stated; two silent-failure paths need attention

Findings

[BLOCKER] COVERAGE_PLAN.md should not be committed

COVERAGE_PLAN.md is a 784-line LLM prompting handoff document (its own header says: "Start a new Claude Code session and say: 'Implement the plan in COVERAGE_PLAN.md'"). This is internal implementation scaffolding, not user-facing documentation. It belongs in a branch description or a local file, not the repo. Please remove it before merging.


[HIGH] --line-marker-format=gfortran5 is a global CMake change

CMakeLists.txt line 384 — this option is added unconditionally to Fypp for all builds and all compilers (nvfortran, Cray ftn, Intel ifx, gfortran). The gfortran5 format emits # N "file" instead of # line N "file". While valid C preprocessor syntax in principle, the PR test plan only covers gfortran on Linux. Given MFC's 4 CI-gated compilers, explicit confirmation is needed that nvfortran, Cray ftn, and ifx all accept this format before merging.


[HIGH] Silent failure in build_coverage_cache can produce an invisible bad cache

coverage.py lines ~254-258:

try:
    case.run([PRE_PROCESS, SIMULATION], gpus=set())
except Exception:
    # Continue even if the test fails — we only need .gcda data
    pass

If a bug causes many tests to fail, the cache will contain entries where every test covers 0 lines. A subsequent --only-changes run would then silently skip all tests — the worst possible failure mode for a CI safety mechanism. At minimum, failures should be counted and reported at the end:

Coverage cache built. 12/547 tests failed — their coverage data may be incomplete.

[HIGH] COVERAGE_CACHE_PATH is a module-level constant that ignores root_dir

coverage.py line ~43:

COVERAGE_CACHE_PATH = Path(common.MFC_ROOT_DIR) / "toolchain/mfc/test/test_coverage_cache.json"

Every function that uses the cache accepts a root_dir argument, but COVERAGE_CACHE_PATH ignores it and is evaluated once at import time. load_coverage_cache(root_dir) ignores its own argument for the cache path. Either make the path a function parameter derived from root_dir, or remove root_dir from the cache functions' signatures.


[MEDIUM] --only-changes silently skips all tests when only non-.fpp toolchain files change

test.py + coverage.py — if a developer changes toolchain/mfc/run/input.py (which generates the Fortran .inp namelist) or other toolchain files not in ALWAYS_RUN_ALL, --only-changes detects no .fpp changes and skips all tests with a green "No changes" message. This is a false negative. Consider expanding ALWAYS_RUN_ALL to cover toolchain/mfc/run/input.py or using a broader heuristic for toolchain Python changes.


[MEDIUM] set(range(start, end + 1)) in filter_tests_by_coverage is O(N) memory

coverage.py line ~365:

if covered_lines & set(range(start, end + 1)):

For a large diff hunk (e.g., entire 5000-line file replaced), this allocates a 5000-element Python set per test per changed range. Since covered_lines is already a sorted list, a cheaper check is:

if any(start <= line <= end for line in covered_lines):

Or use bisect for O(log N) lookup. Not a correctness issue, but worth fixing before this runs on 547 tests x many ranges.


[MINOR] Unit test import mechanism is fragile

test_coverage_unit.py lines ~70-130 — the fallback path does string substitution on the source of coverage.py and execs it. If import lines in coverage.py change, the fallback silently produces a broken module. The # noqa: S102 on exec may also conflict with security linting. Extracting the pure functions (_parse_diff_output, _parse_gcov_json_output, filter_tests_by_coverage, should_run_all_tests) into a standalone module without toolchain dependencies would make them trivially importable in tests.


Improvement Opportunities

  • find_gcov_binary and _parse_diff_output both have dummy _root_dir parameters with pylint: disable=unused-argument for interface uniformity. These could simply be removed.
  • The progress output in build_coverage_cache shows line counts but not which tests returned 0-coverage (which would indicate silently swallowed failures). Printing a warning for zero-coverage tests would make the cache build more observable.

@github-actions
Copy link

Claude Code Review

Head SHA: 8b88b8b
Files changed: 7 — .gitignore, CMakeLists.txt, COVERAGE_PLAN.md (new, 784 lines), toolchain/mfc/cli/commands.py, toolchain/mfc/test/coverage.py (new, 416 lines), toolchain/mfc/test/test.py, toolchain/mfc/test/test_coverage_unit.py (new, 570 lines)


Summary

  • Replaces regex-based dependency_parser.py + test_mapping.json with automated gcov line-level coverage for --only-changes test pruning
  • Adds --build-coverage-cache flag, coverage.py module, and 43 unit tests
  • Adds --line-marker-format=gfortran5 to Fypp invocation in CMakeLists.txt
  • Checks in COVERAGE_PLAN.md, a 784-line AI session handoff document

Findings

🔴 High — COVERAGE_PLAN.md should not be committed

COVERAGE_PLAN.md is explicitly described at line 33 as:

Handoff document. Start a new Claude Code session on a Linux CI-like machine (gfortran, not Apple Silicon) and say: "Implement the plan in COVERAGE_PLAN.md".

This is an AI session planning artifact, not project documentation. It duplicates the PR description and implementation code in a 784-line file that will persist in the repo history forever. Remove it before merging.


🔴 High — post_process changes silently skip all tests

build_coverage_cache (coverage.py:1103) only runs:

case.run([PRE_PROCESS, SIMULATION], gpus=set())

POST_PROCESS is never executed. As a result, no .fpp file under src/post_process/ accumulates any coverage data. When --only-changes is invoked on a PR that modifies post_process code, filter_tests_by_coverage finds zero covered lines for those files → every test is skipped with only a yellow warning printed (coverage.py:1284-1286). The warning does not prevent the skip. This is a correctness regression vs. the original file-level approach for post_process changes. Either add POST_PROCESS to the run list, or fall back to running the full suite when a changed file has no coverage in any test's cache.


🟡 Medium — --line-marker-format=gfortran5 is unconditional and untested on CI compilers

CMakeLists.txt:384 adds --line-marker-format=gfortran5 to the Fypp invocation for all builds, not just --gcov builds. Fypp's std format emits # line N "file"; gfortran5 emits # N "file". This changes preprocessed .f90 output for nvfortran, Cray ftn, and Intel ifx. While all four CI-gated compilers accept # N "file", this is a behavioral change to preprocessing that should be validated on CI before merging. If the sole purpose is enabling gcov mapping, consider gating it on MFC_GCov or making it a CMake option.


🟡 Medium — Performance: set(range(start, end + 1)) materializes large integer sets

coverage.py:1273:

if covered_lines & set(range(start, end + 1)):

For a new file added in a single hunk (e.g., @@ -0,0 +1,5000 @@), this creates set(range(1, 5001)) — 5000 integers — for every test's coverage check against that file. With 547 tests, this is 547 × 5000 = 2.7 M integer allocations per changed file. Prefer:

if any(start <= ln <= end for ln in covered_lines):

This is O(covered_lines) instead of O(range_size) and avoids the temporary set.


🟡 Medium — Potential duplicate argument registration for --only-changes and --changes-branch

The COVERAGE_PLAN.md states (lines 73–74):

--only-changes arg | toolchain/mfc/cli/commands.py | Keep this arg, replace its implementation
--changes-branch arg | toolchain/mfc/cli/commands.py | Keep this arg

If these arguments pre-existed on the test-pruning branch base, the diff at commands.py:835-845 re-adds them, which would cause duplicate argument registration errors at startup. Confirm these are genuinely new additions on this branch.


🟢 Low — datetime.utcnow() deprecated in Python 3.12+

coverage.py:1123:

"created": datetime.datetime.utcnow().isoformat(),

utcnow() is deprecated since Python 3.12. Use:

datetime.datetime.now(datetime.timezone.utc).isoformat()

🟢 Low — Test import machinery is fragile

test_coverage_unit.py:1488-1494 falls back to loading coverage.py via exec() with string-replace patching of import lines:

_src = (
    _src
    .replace("from ..printer import cons", "cons = _globals['cons']")
    ...
)
exec(compile(_src, _COVERAGE_PATH, "exec"), _globals)

This breaks silently if the import lines in coverage.py change whitespace, gain as aliases, or are reordered. Also, sys.modules["toolchain.mfc.test.coverage"] = None (line 1445) poisons the module registry — any subsequent import toolchain.mfc.test.coverage elsewhere in the test process will get ImportError. Consider using unittest.mock.patch at the module boundary instead, or restructuring coverage.py to accept injected dependencies.

@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.04%. Comparing base (1412eb2) to head (8b88b8b).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1273      +/-   ##
==========================================
- Coverage   44.05%   44.04%   -0.02%     
==========================================
  Files          70       70              
  Lines       20496    20492       -4     
  Branches     1991     1992       +1     
==========================================
- Hits         9029     9025       -4     
+ Misses      10328    10327       -1     
- Partials     1139     1140       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Three bugs found during integration testing on Phoenix:
- gcov 12 outputs raw JSON, not gzip; added fallback parsing
- Symlink paths (/storage/scratch1 vs /storage/home) caused relpath
  mismatch; use os.path.realpath() before computing relative paths
- Missing case.delete_output()/create_directory() before case.run()
  caused silent test failures during cache build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

Claude Code Review

Head SHA: 6e7f5be
Files changed: 7

  • .gitignore
  • CMakeLists.txt
  • COVERAGE_PLAN.md (new, 784 lines)
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py (new, 425 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_unit.py (new, 588 lines)

Summary

  • Replaces the manually-maintained test_mapping.json + regex dependency parser with gcov-based line-level coverage pruning for --only-changes
  • New coverage.py builds a per-test coverage cache by running all tests individually with gfortran --coverage, then maps git diff -U0 line ranges to covered .fpp lines
  • 43 unit tests covering diff parsing, always-run logic, line-intersection, and gcov JSON parsing
  • Conservative design: unknown tests are included, stale cache falls back to full suite, GPU macro files always trigger full suite
  • The approach is architecturally sound, but several issues need attention before merge

Findings

❌ Critical — COVERAGE_PLAN.md must not be committed

File: COVERAGE_PLAN.md (entire file, 784 lines)

The document literally opens with:

"Handoff document. Start a new Claude Code session on a Linux CI-like machine and say: 'Implement the plan in COVERAGE_PLAN.md'."

This is internal AI scaffolding, not project documentation. It details implementation steps, design rationale, and integration verification steps that are only relevant during development. Committing it:

  • Exposes the AI-assisted development workflow in a way that may not be appropriate for the repo
  • Adds ~800 lines of noise to git log and git blame forever
  • Duplicates information already present in the PR description and code comments

Action: git rm COVERAGE_PLAN.md before merge.


❌ Critical — --line-marker-format=gfortran5 applied unconditionally to ALL compiler builds

File: CMakeLists.txt, the Fypp invocation (around line 384)

# Added unconditionally:
--line-marker-format=gfortran5

This flag controls how Fypp emits # N "file" line directives in generated .f90 files. The gfortran5 format is specifically tuned for gfortran's debugger/coverage integration. However, this change applies to all four CI-gated compilers (nvfortran, Cray ftn, Intel ifx, gfortran) and all build configurations including production GPU builds.

Risks:

  • nvfortran and Cray ftn may interpret line markers differently, affecting error location reporting and potentially PGI/NVIDIA's source-level debugger
  • This format should only be needed for gcov cache building (gfortran + --gcov), not for production nvfortran GPU builds

Recommended fix: Gate this flag behind the MFC_GCov CMake option:

if(MFC_GCov)
    --line-marker-format=gfortran5
endif()

⚠️ Important — post_process is excluded from coverage cache

File: toolchain/mfc/test/coverage.py, line 1112

case.run([PRE_PROCESS, SIMULATION], gpus=set())

Coverage is only collected for pre_process and simulation executables. Changes to src/post_process/ files will not appear in any test's coverage data. When a developer changes only post_process code and runs --only-changes, zero tests will match and the system will silently skip everything — producing false confidence.

post_process is not excluded from ALWAYS_RUN_ALL, so there's no safety net for this case. Either:

  1. Add POST_PROCESS to the case.run(...) call during cache building, or
  2. Add src/post_process/ to ALWAYS_RUN_ALL so changes there trigger the full suite

⚠️ Important — filter_tests_by_coverage performance: set(range(start, end + 1))

File: toolchain/mfc/test/coverage.py, line 1282

if covered_lines & set(range(start, end + 1)):

For large hunks (e.g., a 2000-line refactor), this constructs a set of 2000 integers per test per changed range — with 547 tests this is potentially millions of set allocations. A more efficient approach since covered_lines is already sorted:

if any(start <= ln <= end for ln in covered_lines):

or using bisect on the sorted list. The current approach is O(range_size) per check; the linear scan is O(min(covered_lines, range_size)) per check.


ℹ️ Minor — COVERAGE_CACHE_PATH is module-level, ignores root_dir in load_coverage_cache

File: toolchain/mfc/test/coverage.py, lines 907 and 1153

# Line 907 — evaluated at import time:
COVERAGE_CACHE_PATH = Path(common.MFC_ROOT_DIR) / "toolchain/mfc/test/test_coverage_cache.json"

# Line 1153 — root_dir argument is ignored for the cache path:
def load_coverage_cache(root_dir: str) -> Optional[dict]:
    if not COVERAGE_CACHE_PATH.exists():  # uses module constant, not root_dir

build_coverage_cache uses COVERAGE_CACHE_PATH directly (consistent), but load_coverage_cache and get_changed_line_ranges take root_dir parameters that differ in usage. The inconsistency is a minor maintenance hazard. Not a bug in normal usage since MFC_ROOT_DIR is set at toolchain startup.


:information_source\ Minor — gzip.BadGzipFile requires Python 3.8+

File: toolchain/mfc/test/coverage.py, line 1034

except (gzip.BadGzipFile, OSError):

gzip.BadGzipFile was introduced in Python 3.8. If the toolchain needs to support 3.7, this will fail at import time on Python 3.7. OSError (the fallback in except) would catch it anyway, but the reference to gzip.BadGzipFile itself will raise AttributeError during the except clause construction. Wrapping as except (getattr(gzip, 'BadGzipFile', OSError), OSError) or just except Exception would be safer.


What's done well

  • The ALWAYS_RUN_ALL frozenset correctly identifies GPU macro files and toolchain files that CPU coverage cannot reliably track
  • Using git merge-base instead of master tip is the right approach for PRs
  • Conservative behavior (unknown UUID → include, stale cache → full suite) is correct
  • Unit tests are comprehensive and cover the key corner cases (line boundaries, large modules, GPU-only #ifdef blocks)
  • The fallback chain (missing cache → warning → full suite) is user-friendly
  • Sequential cache building with .gcda file isolation is correct for per-test coverage attribution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant