feat(cpp-phase2): PR-04 resolution-quality fixes (glibc attrs, transitive includes) #682
Tree-sitter's C grammar can't parse the trailing GCC attribute macro
chain on glibc declarations:

    extern size_t strlen (const char *__s)
        __THROW __attribute_pure__ __nonnull ((1));

The parser collapses the entire surrounding declaration into an ERROR
node, the walker skips ERROR nodes by design, and the symbol vanishes
from the manifest. Effect on Linux: ~50% of <string.h> and parts of
<stdio.h> were missing, including strlen, strcmp, strncmp, strcasecmp,
memcmp, memmem, snprintf, vsnprintf and friends.

clike_preprocess.go strips known no-op glibc attribute macros to
whitespace before tree-sitter parses. Two passes:

- Bare-token list (__THROW, __THROWNL, __attribute_pure__, __wur,
  __returns_nonnull, …) sourced from
  /usr/include/x86_64-linux-gnu/sys/cdefs.h. Word-boundary regex so we
  don't match inside identifiers.
- Function-like list (__attribute__, __nonnull, __attr_access,
  __attr_dealloc, __REDIRECT, …) with balanced-paren consumption so
  the trailing argument list is fully eaten.

Length-preserving (replace with spaces, leave newlines intact) so
tree-sitter's reported line numbers stay accurate. Idempotent. Doesn't
strip __inline / __restrict / __flexarr; those are real keywords or
have semantic effect on parsing.

Wired into both extractCHeader and extractCppHeader since libstdc++
borrows the same decorations.

Validation against /usr/include on Ubuntu 24.04 (libstdc++-13):
- string.h: 38 → 48 functions
- stdio.h: recovered scanf/sprintf/snprintf/vsnprintf
- Total Linux C: 8467 → 9205 functions (+738)
Smoke fixture (testdata/c/stdlib): 5/7 → 7/7 resolved.
Redis full tree: 60.6% → 63.1% (+2.5 pp).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The PR-01 generator emits some namespaced free-function entries (notably std::move, std::forward, std::swap as overlay-only entries) into the manifest's `functions` map rather than `free_functions`. The PR-02 resolver only consulted `free_functions`, so those symbols silently failed to resolve at the call site.

GetFreeFunction now consults both maps. The lookup order is free_functions first (the canonical map), then functions (the fallback). This is a strict superset of the previous behaviour: any entry that resolved before still resolves at the same priority, so it's safe to ship without coordinating a generator-side cleanup.

Future work: tighten the generator emitter so namespaced symbols always land in `free_functions`. Tracked outside this stack.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
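
A rough sketch of the lookup order described above, under simplified assumptions: the `Header` and `FunctionEntry` types and the standalone `getFreeFunction` helper are hypothetical stand-ins for the real registry types, which are not shown in this PR text. Only the map names `free_functions` and `functions` come from the commit message.

```go
package main

import "fmt"

// FunctionEntry is a hypothetical, minimal manifest entry.
type FunctionEntry struct {
	FQN    string
	Source string // which map the entry came from, for illustration only
}

// Header is a hypothetical view of one manifest header.
type Header struct {
	FreeFunctions map[string]FunctionEntry // manifest key "free_functions" (canonical)
	Functions     map[string]FunctionEntry // manifest key "functions" (fallback)
}

// getFreeFunction checks the canonical map first, then the fallback map.
// Anything that resolved through FreeFunctions before still resolves there.
func getFreeFunction(h *Header, fqn string) (FunctionEntry, bool) {
	if h == nil {
		return FunctionEntry{}, false
	}
	if fn, ok := h.FreeFunctions[fqn]; ok {
		return fn, true
	}
	fn, ok := h.Functions[fqn]
	return fn, ok
}

func main() {
	utility := &Header{
		Functions: map[string]FunctionEntry{
			"std::move": {FQN: "std::move", Source: "functions"},
		},
	}
	fn, ok := getFreeFunction(utility, "std::move")
	fmt.Println(ok, fn.Source)
}
```

Checking the canonical map first is what keeps the change a strict superset: the fallback is only reached for names that previously failed.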
Real-world C++ code routinely relies on transitive includes: a file calls std::move without `#include <utility>` because some other header it includes pulls utility in. PR-02's resolver walked only the caller file's direct system includes, so namespaced std::* calls failed to resolve in the majority of files that should have benefited.

lookupCppStdlibFreeFunction now runs in two stages:

1. Direct system includes (the existing fast path).
2. Manifest-wide scan when the direct walk doesn't yield a hit. Bounded to qualified names (containing "::") so unqualified project symbols can never accidentally bind to a stdlib entry. First hit wins; stdlib FQNs are unique across headers so order doesn't affect correctness.

To support the fallback, both CStdlibLoader and CppStdlibLoader gain a ListHeaders() method returning every manifest header name. Implemented on the file://+HTTP loaders by reading manifest.Headers; test fakes implement it by walking their fixture map. Fakes in the cmd package reuse the same pattern.

Validation against proxygen full tree:

- 12.0% → 17.1% resolved (+5.1 pp)
- std::move drops out of the top-20 unresolved (was #1 with 1603 occurrences, all resolved now)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
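
The two-stage lookup described above might be sketched like this. Only `ListHeaders()` is named in the commit message; the `loader` interface shape, the `HasFunction` method, the `fakeLoader` fixture, and the `lookupFreeFunction` signature are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// loader is a hypothetical slice of the stdlib loader interface.
type loader interface {
	ListHeaders() []string
	HasFunction(header, fqn string) bool // assumed helper, not from the PR
}

// fakeLoader maps header name -> set of FQNs, like a test fixture map.
type fakeLoader map[string]map[string]bool

func (f fakeLoader) ListHeaders() []string {
	names := make([]string, 0, len(f))
	for h := range f {
		names = append(names, h)
	}
	sort.Strings(names) // deterministic order for the example
	return names
}

func (f fakeLoader) HasFunction(header, fqn string) bool { return f[header][fqn] }

// lookupFreeFunction returns the header that declares fqn, if any.
func lookupFreeFunction(l loader, directIncludes []string, fqn string) (string, bool) {
	// Stage 1: direct system includes (fast path).
	for _, h := range directIncludes {
		if l.HasFunction(h, fqn) {
			return h, true
		}
	}
	// Stage 2: manifest-wide scan, restricted to qualified names so an
	// unqualified project symbol cannot bind to a stdlib entry.
	if !strings.Contains(fqn, "::") {
		return "", false
	}
	for _, h := range l.ListHeaders() {
		if l.HasFunction(h, fqn) {
			return h, true // first hit wins
		}
	}
	return "", false
}

func main() {
	l := fakeLoader{
		"utility":  {"std::move": true},
		"vector":   {},
		"string.h": {"strlen": true},
	}
	// The file includes only <vector>, yet std::move resolves via the fallback.
	fmt.Println(lookupFreeFunction(l, []string{"vector"}, "std::move"))
	// An unqualified name never reaches the manifest-wide scan.
	fmt.Println(lookupFreeFunction(l, []string{"vector"}, "strlen"))
}
```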
SafeDep Report Summary
No dependency changes detected. Nothing to scan.

Code Pathfinder Security Scan
No security issues detected.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## shiva/cpp-phase2-pr03-remote #682 +/- ##
================================================================
+ Coverage 85.67% 85.68% +0.01%
================================================================
Files 202 203 +1
Lines 29141 29209 +68
================================================================
+ Hits 24966 25029 +63
- Misses 3203 3207 +4
- Partials 972 973 +1
Direct tests for the new ListHeaders() method on both loaders:

- canonical case: after LoadManifest, returns the manifest's header set
- before LoadManifest, returns nil (matches HeaderCount semantics)
- mutating the returned slice doesn't bleed into a subsequent call

Pushes both methods from 0% to 100% coverage; satisfies codecov/patch threshold on PR #682.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
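
The three behaviours listed above suggest a method along these lines. This is a guess at the shape, not the actual loader code: the `manifest` and `stdlibLoader` types are hypothetical, and `Headers` is assumed to be a slice of names here even though the real `manifest.Headers` may be structured differently.

```go
package main

import "fmt"

// manifest is a hypothetical stand-in; Headers is assumed to be a name slice.
type manifest struct {
	Headers []string
}

// stdlibLoader is a hypothetical loader holding the manifest once loaded.
type stdlibLoader struct {
	manifest *manifest // nil until LoadManifest has run
}

// ListHeaders returns a fresh copy of the header names, or nil before load.
func (l *stdlibLoader) ListHeaders() []string {
	if l.manifest == nil {
		return nil
	}
	out := make([]string, len(l.manifest.Headers))
	copy(out, l.manifest.Headers)
	return out
}

func main() {
	l := &stdlibLoader{}
	fmt.Println(l.ListHeaders() == nil) // nothing loaded yet

	l.manifest = &manifest{Headers: []string{"string.h", "stdio.h"}}
	got := l.ListHeaders()
	got[0] = "mutated"              // caller scribbles on its copy
	fmt.Println(l.ListHeaders()[0]) // loader state is unaffected
}
```

Returning a copy rather than the backing slice is what lets the third test case pass: callers can sort or edit the result without affecting later calls.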
Pure-stdlib gap analysis (third-party / macros excluded)

Cross-referencing each unresolved call's target name against the actual symbols in our linux/c and linux/cpp manifests reveals two distinct unresolved buckets. Both are fully fixable on the stdlib side; neither is addressed by this PR, but both are worth queueing as follow-ups.

C: transitive-include fallback gap

3,107 redis unresolved calls have targets that already exist in our linux/c manifest. Top:

Root cause: redis files

Fix: mechanical port of this PR's C++ transitive fallback (

Estimated uplift: redis 63.1% → ~67.5% (+4.4 pp). Similar shape on any C codebase that uses an internal "common.h" pattern, which is essentially every non-trivial C project.

C++: receiver-type inference gap

~3,621 proxygen unresolved calls have targets that exist as STL class methods. Top:

Root cause: these are unqualified

Fix: larger work, extending the type engine's propagation rules.

Estimated uplift: proxygen 17.1% → ~30% (+13 pp).

Out of scope (third-party / macros)

Recommended sequence

Both target the stdlib-resolved-rate metric specifically, not the noisier "% of all calls" overall number.



Summary
PR-04 of the C/C++ Phase 2 stdlib stack — three resolution-quality fixes uncovered during local validation against redis and proxygen. Stacked on PR-03; should merge after.
What's in here
Three independent fixes, each in its own commit:
1. Recover glibc-decorated declarations (`fix(generator)`)

Tree-sitter's C grammar can't parse the trailing GCC attribute macro chain on glibc declarations such as `extern size_t strlen (const char *__s) __THROW __attribute_pure__ __nonnull ((1));`. It collapses the whole declaration into an `ERROR` node and the symbol vanishes. Effect: `strlen`, `strcmp`, `strncmp`, `strcasecmp`, `memcmp`, `memmem`, `snprintf`, `vsnprintf` and dozens of similar glibc functions never made it into the manifest.

`tools/internal/clikeextract/clike_preprocess.go` strips the macros to whitespace before tree-sitter parses (length-preserving, line-number-accurate). Macro list sourced from `<sys/cdefs.h>`. Wired into both C and C++ extractors.

2. Resolver tolerant of generator's map placement (`fix(registry)`)

`std::move` is emitted into the manifest's `functions` map, but PR-02's `GetFreeFunction` only consulted `free_functions`. Lookup now checks `free_functions` first and falls back to `functions`. Strict superset of previous behaviour.

3. Transitive-include fallback (`feat(builder)`)

Real-world C++ relies heavily on transitive includes. A file that calls `std::move` without directly including `<utility>` (it arrives through `<vector>` etc.) used to fail resolution. The resolver now walks every manifest header as a fallback, bounded to qualified names so unqualified project symbols can't accidentally bind. Required adding `ListHeaders()` to the `CStdlibLoader` / `CppStdlibLoader` interfaces.

Validation results
Resolution report on real codebases, with regenerated registries and the file:// loader.

- Smoke fixture (`testdata/c/stdlib`): 5/7 → 7/7 resolved
- Redis full tree: 60.6% → 63.1% (+2.5 pp)
- proxygen full tree: 12.0% → 17.1% (+5.1 pp)

Top-1 unresolved in proxygen (`std::move`, 1603 occurrences) is gone: all `std::move` calls now resolve via the transitive fallback. Total Linux C functions extracted: 8,467 → 9,205 (+738).

The proxygen number is still well below the 98% spec target, but the remaining unresolved calls are largely non-stdlib: gtest / glog / folly logging macros (`VLOG`, `EXPECT_EQ`, `CHECK`), folly types (`folly::makeUnexpected`, `hasError`), and unqualified class-method calls that need receiver-type inference (separate concern, future PR).

Verification
- `gradle buildGo`: clean
- `go test ./...`: all packages pass
- `golangci-lint run ./...`: 0 issues

Out of scope (future PRs)
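- Receiver-type inference for unqualified class-method calls (`vec.size()`, `str.empty()`)
- Generator-side cleanup so namespaced symbols always land in `free_functions`

Test plan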