feat(graph): C/C++ file detection and tree-sitter parsing foundation #668
Merged
shivasurya merged 1 commit into `main` on May 3, 2026
SafeDep Report Summary: No dependency changes detected. Nothing to scan. This report is generated by the SafeDep GitHub App.
Code Pathfinder Security Scan: No security issues detected. Powered by Code Pathfinder.
Codecov Report: ❌ Patch coverage is

| Coverage diff | `main` | #668 | +/- |
|---|---|---|---|
| Coverage | 85.06% | 85.07% | |
| Files | 172 | 173 | +1 |
| Lines | 25027 | 25070 | +43 |
| Hits | 21290 | 21329 | +39 |
| Misses | 2941 | 2944 | +3 |
| Partials | 796 | 797 | +1 |
Branch updated from 9690595 to caf0fec.
Route `.c`/`.cpp`/`.cc`/`.cxx`/`.h`/`.hpp`/`.hh`/`.hxx` files to the C and C++ tree-sitter grammars, and exclude common C/C++ build-artifact directories (`build/`, `cmake-build-*`, `third_party/`, `external/`, `obj/`, `bin/`, `dist/`, `.cache/`) during file discovery.

C and C++ share the `.h` extension for headers. A best-effort heuristic scans the first 100 lines of each header for C++-only indicators (`class`, `namespace`, `template`, access specifiers, `::`). The result is cached in a `sync.Map` that the `initialize.go` worker populates once per file before AST traversal, so the per-AST-node `IsCSourceFile` / `IsCppSourceFile` lookups perform no I/O.

The C/C++ detection helpers live in a new `graph/clike/` subpackage alongside the existing `graph/golang`, `graph/python`, `graph/java`, and `graph/docker` packages. `graph/clike` will accumulate the shared C/C++ extraction primitives (parser, type strings, parameter extraction) in subsequent PRs; this PR introduces detection only.

This is parsing-pipeline groundwork only: there is no AST → Node dispatch yet, so existing Java/Python/Go behavior is unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
Branch updated from caf0fec to 0f553a6.
This was referenced May 3, 2026
shivasurya added a commit that referenced this pull request on May 3, 2026:
## Summary

Stacked on **#668** (C/C++ file detection foundation). Adds the cross-cutting primitives that the C parser (PR-03) and C++ parser (PR-04) will share. Centralising the extraction logic here prevents two parallel implementations from drifting apart, and matches the convention used by `graph/golang/`, `graph/python/`, `graph/java/`, and `graph/docker/`.

## Files

```
sast-engine/graph/clike/
├── doc.go                ← package documentation
├── detection.go          ← (from PR-01) language detection
├── detection_test.go     ← (from PR-01)
├── declarations.go       ← FunctionInfo, FieldInfo, extractors
├── declarations_test.go
├── types.go              ← ExtractTypeString
├── types_test.go
├── helpers.go            ← params, calls, keyword maps
├── helpers_test.go
└── testhelpers_test.go   ← shared parseC/parseCpp/findNode test utilities
```

## Helpers

- **`ExtractFunctionInfo`** → name, return type, params, and an `IsDeclaration` flag (forward declaration vs. definition)
- **`ExtractStructFields`** → name/type pairs for structs and classes
- **`ExtractTypeString`** → canonical type string with qualifiers, pointer/reference suffixes, templates, and qualified names. Examples: `char*`, `const std::string&`, `unsigned long long`, `std::vector<int>`, `int**`
- **`ExtractParameters`** → parallel `(names, types)` slices; variadics are rendered as `("...", "...")`
- **`ExtractCallInfo`** → classifies calls as free, method-dot (`obj.foo()`), method-arrow (`ptr->foo()`), or qualified (`std::move(x)`); captures the receiver and arguments
- **`IsCKeyword` / `IsCppKeyword`** → C89..C23 keywords plus the C++ additions; used by PR-05's statement extraction to filter reserved words

## Why an `innerDeclarator` helper

Tree-sitter's C grammar exposes the inner declarator of a `pointer_declarator` through its `declarator` field, but the C++ `reference_declarator` node has no such field: the `&` token and the inner identifier are plain, unlabeled children.
`innerDeclarator` tries the field-named child first and falls back to scanning named children, so the same declarator walker handles `int**`, `char*`, and `const std::string&` without per-grammar branching.

## Test plan

- [x] `go build ./...`: clean
- [x] `go vet ./...`: clean
- [x] `golangci-lint run ./graph/...`: 0 issues
- [x] `go test ./graph/... -count=1`: all pass (no regressions)
- [x] `TestExtractTypeString`: 11 cases (primitives, pointers, references, qualifiers, templates, qualified names) plus a nil guard
- [x] `TestExtractFunctionInfo`: C/C++ definitions, void/typed/pointer returns, variadics, namespaced methods, plus a nil guard
- [x] `TestExtractStructFields`: populated and empty structs, plus a nil guard
- [x] `TestExtractParameters`: typed, variadic, unnamed, void, C++ const-ref, plus a nil guard
- [x] `TestExtractCallInfo`: free / dot / arrow / qualified shapes, plus nil and wrong-node guards
- [x] `TestIsCKeyword` / `TestIsCppKeyword`: C89..C23 keywords, C++-only additions, non-keywords, and the C/C++ exclusivity boundary
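A minimal sketch of the `innerDeclarator` fallback and the declarator walk it enables is below. It deliberately avoids real tree-sitter bindings: `Node` is a hypothetical stand-in that models field-named children and plain named children separately, and `newIdent`, `newPointer`, `newReference`, `declaratorSuffix`, and `typeString` are illustrative names, not the `graph/clike` API.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for a tree-sitter node, reduced to what the
// walker needs: a type, field-named children, plain named children, and text.
type Node struct {
	Type     string
	Fields   map[string]*Node
	Children []*Node
	Text     string
}

// newIdent, newPointer, and newReference are hypothetical constructors that
// mimic the two grammar shapes described above.
func newIdent(name string) *Node { return &Node{Type: "identifier", Text: name} }

// C pointer_declarator: the inner declarator sits in the "declarator" field.
func newPointer(inner *Node) *Node {
	return &Node{Type: "pointer_declarator", Fields: map[string]*Node{"declarator": inner}}
}

// C++ reference_declarator: no field; the inner declarator is a plain child.
func newReference(inner *Node) *Node {
	return &Node{Type: "reference_declarator", Children: []*Node{inner}}
}

// innerDeclarator prefers the "declarator" field and falls back to the first
// named child that looks like a declarator or identifier.
func innerDeclarator(n *Node) *Node {
	if n == nil {
		return nil
	}
	if inner, ok := n.Fields["declarator"]; ok && inner != nil {
		return inner
	}
	for _, c := range n.Children {
		if strings.HasSuffix(c.Type, "declarator") || strings.HasSuffix(c.Type, "identifier") {
			return c
		}
	}
	return nil
}

// declaratorSuffix walks pointer/reference wrappers outermost-first and
// returns the accumulated suffix plus the declared name.
func declaratorSuffix(d *Node) (suffix, name string) {
	for d != nil {
		switch d.Type {
		case "pointer_declarator":
			suffix += "*"
		case "reference_declarator":
			suffix += "&"
		default:
			return suffix, d.Text
		}
		d = innerDeclarator(d) // same step for both grammars
	}
	return suffix, ""
}

// typeString renders qualifiers, base type, and declarator suffix.
func typeString(qualifiers []string, base string, d *Node) string {
	suffix, _ := declaratorSuffix(d)
	parts := append(append([]string{}, qualifiers...), base)
	return strings.Join(parts, " ") + suffix
}

func main() {
	fmt.Println(typeString(nil, "int", newPointer(newPointer(newIdent("pp")))))
	fmt.Println(typeString(nil, "char", newPointer(newIdent("s"))))
	fmt.Println(typeString([]string{"const"}, "std::string", newReference(newIdent("name"))))
}
```

Run as-is, it prints `int**`, `char*`, and `const std::string&`, one per line. The C shape is resolved through the field and the C++ shape through the named-child fallback, inside one loop with no grammar-specific branch.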
## Summary

Foundation for C/C++ language support: file discovery and tree-sitter grammar selection only. There is no AST → Node dispatch yet, so Java/Python/Go behavior is unchanged.
- Route `.c`, `.cpp`, `.cc`, `.cxx`, `.h`, `.hpp`, `.hh`, and `.hxx` files to the C and C++ tree-sitter grammars
- Exclude common build/artifact directories in `getFiles()`: `build/`, `cmake-build-*`, `third_party/`, `external/`, `obj/`, `bin/`, `dist/`, `.cache/`
- Classify `.h` headers as C or C++ via a best-effort heuristic (the first ~100 lines are scanned for `class`, `namespace`, `template<`, access specifiers, and `::`)
- Cache the result in a `sync.Map` populated once per file in the `initialize.go` worker before AST traversal, which keeps the per-node `IsCSourceFile` / `IsCppSourceFile` lookups zero-I/O

## Package layout
C/C++ helpers live in a new `graph/clike/` subpackage that sits alongside the existing language siblings `graph/golang/`, `graph/python/`, `graph/java/`, and `graph/docker/`.
Subsequent PRs will add `graph/clike/parser.go`, type-string extraction, and parameter extraction in the same package, mirroring the layout of `graph/golang/`.

## Test plan
- `go build ./...`: clean
- `go vet ./...`: clean
- `golangci-lint run ./graph/...`: 0 issues
- `go test ./graph/... -count=1`: passes (no Java/Python/Go regressions)
- `graph/clike/detection_test.go`:
  - `TestIsCSourceFile` / `TestIsCppSourceFile`: every extension, plus cached and uncached `.h`
  - `TestDetectCppInHeader`: pure C, `class`, `namespace`, `template`, `::`, `extern "C"`, empty file, missing file
- `graph/utils_test.go::TestGetFilesIncludesCAndCpp`: verifies inclusion of the C/C++ extensions and exclusion of all build/artifact directories