
feat(graph): C/C++ file detection and tree-sitter parsing foundation #668

Merged
shivasurya merged 1 commit into main from claude/review-techspec-pr-docs-7jawU
May 3, 2026

shivasurya (Owner) commented May 2, 2026

Summary

Foundation for C/C++ language support — file discovery and tree-sitter
grammar selection only. No AST → Node dispatch yet, so Java/Python/Go
behavior is unchanged.

  • Route .c, .cpp, .cc, .cxx, .h, .hpp, .hh, .hxx files to
    the C and C++ tree-sitter grammars
  • Exclude common C/C++ build artifact directories from getFiles():
    build/, cmake-build-*, third_party/, external/, obj/, bin/,
    dist/, .cache/
  • Disambiguate .h headers as C vs C++ via a best-effort heuristic
    (the first ~100 lines are scanned for the tokens "class ", "namespace ",
    "template<", the access specifiers, and "::")
  • Cache header classification in a sync.Map populated once per file in
    the initialize.go worker before AST traversal — keeps the per-node
    IsCSourceFile / IsCppSourceFile lookups zero-I/O
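
The header heuristic and its cache can be sketched as follows. This is a minimal illustration, not the code in graph/clike/detection.go: the names hasCppIndicator, detectCppInHeader, classifyHeader, and headerIsCpp are hypothetical, the indicator list is inferred from the bullet above, and treating an unreadable file as C is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"sync"
)

// headerIsCpp caches one verdict per path (string -> bool) so that
// later lookups need no file I/O. Hypothetical name.
var headerIsCpp sync.Map

// cppIndicators are substrings taken as evidence of C++ (best effort:
// a comment containing "::" would also match).
var cppIndicators = []string{
	"class ", "namespace ", "template<",
	"public:", "private:", "protected:", "::",
}

const maxHeaderLines = 100

// hasCppIndicator reports whether a single line contains any indicator.
func hasCppIndicator(line string) bool {
	for _, ind := range cppIndicators {
		if strings.Contains(line, ind) {
			return true
		}
	}
	return false
}

// detectCppInHeader scans at most maxHeaderLines lines of the file.
// Assumption: a missing or unreadable file is classified as C.
func detectCppInHeader(path string) bool {
	f, err := os.Open(path)
	if err != nil {
		return false
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for n := 0; n < maxHeaderLines && sc.Scan(); n++ {
		if hasCppIndicator(sc.Text()) {
			return true
		}
	}
	return false
}

// classifyHeader returns the cached verdict, scanning the file only on
// the first call for a given path.
func classifyHeader(path string) bool {
	if v, ok := headerIsCpp.Load(path); ok {
		return v.(bool)
	}
	isCpp := detectCppInHeader(path)
	headerIsCpp.Store(path, isCpp)
	return isCpp
}

func main() {
	// Classify each header path given on the command line.
	for _, p := range os.Args[1:] {
		fmt.Printf("%s: C++=%v\n", p, classifyHeader(p))
	}
}
```

Calling classifyHeader once per file in a worker, before traversal starts, means per-node checks reduce to a sync.Map load.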

Package layout

C/C++ helpers live in a new graph/clike/ subpackage that sits alongside
the existing language siblings:

sast-engine/graph/
├── clike/         ← NEW: shared C/C++ helpers (detection now, parsers next)
│   ├── detection.go
│   └── detection_test.go
├── docker/
├── golang/
├── java/
└── python/

Subsequent PRs will add graph/clike/parser.go, type-string extraction,
and parameter extraction in the same package, mirroring the layout of
graph/golang/.

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • golangci-lint run ./graph/... — 0 issues
  • go test ./graph/... -count=1 — passes (no Java/Python/Go regressions)
  • New unit tests in graph/clike/detection_test.go:
    • TestIsCSourceFile / TestIsCppSourceFile — every extension + cached/uncached .h
    • TestDetectCppInHeader — pure C, class, namespace, template, ::, extern "C", empty, missing-file
  • Updated graph/utils_test.go::TestGetFilesIncludesCAndCpp — verifies inclusion of C/C++ extensions and exclusion of all build/artifact dirs

safedep Bot commented May 2, 2026

SafeDep Report Summary

Malicious packages: green. Vulnerable packages: green. Risky licenses: green.

No dependency changes detected. Nothing to scan.


This report is generated by SafeDep Github App

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

github-actions Bot commented May 2, 2026

Code Pathfinder Security Scan

Pass Critical High Medium Low Info

No security issues detected.

| Metric | Value |
| --- | --- |
| Files Scanned | 5 |
| Rules | 205 |

Powered by Code Pathfinder

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.07%. Comparing base (99ecf1e) to head (0f553a6).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sast-engine/graph/initialize.go | 60.00% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #668   +/-   ##
=======================================
  Coverage   85.06%   85.07%           
=======================================
  Files         172      173    +1     
  Lines       25027    25070   +43     
=======================================
+ Hits        21290    21329   +39     
- Misses       2941     2944    +3     
- Partials      796      797    +1     


shivasurya force-pushed the claude/review-techspec-pr-docs-7jawU branch from 9690595 to caf0fec, May 2, 2026 12:59
shivasurya changed the title from "feat(graph): C/C++ file detection and tree-sitter parsing foundation (PR-01)" to "feat(graph): C/C++ file detection and tree-sitter parsing foundation", May 2, 2026
shivasurya added the labels enhancement (New feature or request) and go (Pull requests that update go code), May 2, 2026
shivasurya self-assigned this, May 2, 2026
Route .c/.cpp/.cc/.cxx/.h/.hpp/.hh/.hxx files to the C and C++ tree-sitter
grammars and exclude common C/C++ build artifact directories
(build/, cmake-build-*, third_party/, external/, obj/, bin/, dist/, .cache/)
during file discovery.
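
A rough sketch of the routing and exclusion rules described above, under stated assumptions: grammarFor, isExcludedDir, collect, and the grammar constants are hypothetical names, matching excluded directories by base name is a guess at how getFiles() applies the list, and .hpp/.hh/.hxx are assumed to go straight to the C++ grammar.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// grammar identifies which tree-sitter grammar a file should use.
type grammar int

const (
	grammarNone   grammar = iota // not a C/C++ file
	grammarC                     // C grammar
	grammarCpp                   // C++ grammar
	grammarHeader                // .h: decided later by the header heuristic
)

// grammarFor routes a path by its extension (case-sensitive).
func grammarFor(path string) grammar {
	switch filepath.Ext(path) {
	case ".c":
		return grammarC
	case ".cpp", ".cc", ".cxx", ".hpp", ".hh", ".hxx":
		return grammarCpp
	case ".h":
		return grammarHeader
	}
	return grammarNone
}

// skipDirs lists the build-artifact directory names from the commit message.
var skipDirs = map[string]bool{
	"build": true, "third_party": true, "external": true,
	"obj": true, "bin": true, "dist": true, ".cache": true,
}

// isExcludedDir matches exact names plus the cmake-build-* pattern.
func isExcludedDir(name string) bool {
	return skipDirs[name] || strings.HasPrefix(name, "cmake-build-")
}

// collect walks root and returns C/C++ files outside excluded directories.
func collect(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			// Never skip the root itself, even if it is named "build".
			if p != root && isExcludedDir(d.Name()) {
				return filepath.SkipDir
			}
			return nil
		}
		if grammarFor(p) != grammarNone {
			files = append(files, p)
		}
		return nil
	})
	return files, err
}

func main() {
	files, err := collect(".")
	if err != nil {
		fmt.Println("walk error:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Returning filepath.SkipDir prunes the whole subtree, so files under an excluded directory are never visited.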

Header files share the .h extension between C and C++. A best-effort
heuristic scans the first 100 lines for C++-only indicators (class,
namespace, template, access specifiers, ::) and the result is cached in a
sync.Map populated once per file in the initialize.go worker before AST
traversal, so the per-AST-node IsCSourceFile / IsCppSourceFile lookups
stay zero-I/O.

The C/C++ detection helpers live in a new graph/clike/ subpackage that
sits alongside the existing graph/golang, graph/python, graph/java, and
graph/docker packages. graph/clike will accumulate the shared C/C++
extraction primitives in subsequent PRs (parser, type strings, parameter
extraction); this PR introduces detection only.

This is parsing-pipeline groundwork only — no AST → Node dispatch yet, so
existing Java/Python/Go behavior is unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
shivasurya force-pushed the claude/review-techspec-pr-docs-7jawU branch from caf0fec to 0f553a6, May 2, 2026 13:06
shivasurya (Owner, Author) commented May 3, 2026

Merge activity

  • May 3, 1:15 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 3, 1:16 PM UTC: @shivasurya merged this pull request with Graphite.

shivasurya merged commit 8af6b85 into main, May 3, 2026
7 of 8 checks passed
shivasurya deleted the claude/review-techspec-pr-docs-7jawU branch, May 3, 2026 13:16
shivasurya added a commit that referenced this pull request May 3, 2026
## Summary

Stacked on **#668** (C/C++ file detection foundation).

Adds the cross-cutting primitives that the C parser (PR-03) and C++ parser
(PR-04) will share. Centralising the extraction logic here prevents two
parallel implementations from drifting apart, and matches the convention
used by `graph/golang/`, `graph/python/`, `graph/java/`, and `graph/docker/`.

## Files

```
sast-engine/graph/clike/
├── doc.go                  ← package documentation
├── detection.go            ← (from PR-01) language detection
├── detection_test.go       ← (from PR-01)
├── declarations.go         ← FunctionInfo, FieldInfo, extractors
├── declarations_test.go
├── types.go                ← ExtractTypeString
├── types_test.go
├── helpers.go              ← params, calls, keyword maps
├── helpers_test.go
└── testhelpers_test.go     ← shared parseC/parseCpp/findNode test utilities
```

## Helpers

- **`ExtractFunctionInfo`** → name, return type, params, `IsDeclaration`
  flag (forward decl vs definition)
- **`ExtractStructFields`** → name+type pairs for structs/classes
- **`ExtractTypeString`** → canonical type string with qualifiers,
  pointer/reference suffixes, templates, qualified names. Examples:
  `char*`, `const std::string&`, `unsigned long long`,
  `std::vector<int>`, `int**`
- **`ExtractParameters`** → parallel `(names, types)` slices, variadics
  rendered as `("...", "...")`
- **`ExtractCallInfo`** → classifies calls as free / method-dot
  (`obj.foo()`) / method-arrow (`ptr->foo()`) / qualified
  (`std::move(x)`); captures receiver and args
- **`IsCKeyword` / `IsCppKeyword`** → C89..C23 + C++ additions; used by
  PR-05's statement extraction to filter reserved words
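
One plausible shape for the keyword lookups is a pair of set-backed predicates. The sketch below is illustrative only: the keyword lists are deliberately tiny subsets, the lowercase function names are hypothetical, and it assumes that the C++ predicate accepts keywords shared with C plus the C++-only additions while rejecting C-only ones such as `restrict`.

```go
package main

import "fmt"

// set is a string set with zero-size values.
type set map[string]struct{}

func newSet(words ...string) set {
	s := make(set, len(words))
	for _, w := range words {
		s[w] = struct{}{}
	}
	return s
}

func (s set) has(w string) bool {
	_, ok := s[w]
	return ok
}

// Partial lists for illustration; the real tables cover C89..C23 and C++.
var (
	sharedKeywords  = newSet("int", "char", "struct", "return", "if", "while", "const", "static")
	cOnlyKeywords   = newSet("restrict", "_Bool", "_Generic")
	cppOnlyKeywords = newSet("class", "namespace", "template", "typename", "virtual", "new", "delete")
)

// isCKeyword reports whether w is reserved in C (within this subset).
func isCKeyword(w string) bool {
	return sharedKeywords.has(w) || cOnlyKeywords.has(w)
}

// isCppKeyword reports whether w is reserved in C++ (within this subset).
func isCppKeyword(w string) bool {
	return sharedKeywords.has(w) || cppOnlyKeywords.has(w)
}

func main() {
	for _, w := range []string{"int", "class", "restrict", "foo"} {
		fmt.Printf("%-8s C=%v C++=%v\n", w, isCKeyword(w), isCppKeyword(w))
	}
}
```

Keeping shared, C-only, and C++-only words in separate sets makes the exclusivity boundary explicit and easy to test.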

## Why an `innerDeclarator` helper

Tree-sitter's C grammar exposes `pointer_declarator → declarator (field)`,
but the C++ `reference_declarator` node only has anonymous named children
for `&` and the inner identifier. `innerDeclarator` tries the field-named
child first and falls back to scanning named children, so the same
declarator walker handles `int**`, `char*`, and `const std::string&`
without per-grammar branching.
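
The field-first, children-fallback lookup can be illustrated without the tree-sitter bindings. In this sketch the `node` struct is a hypothetical stand-in for a syntax node, `unwrap` is an invented walker, and returning the first named child in the fallback is an assumption about what "scanning named children" means.

```go
package main

import "fmt"

// node is a simplified stand-in for a tree-sitter syntax node.
type node struct {
	kind     string
	text     string
	fields   map[string]*node // field-named children
	children []*node          // named children, in source order
}

// innerDeclarator prefers the "declarator" field and otherwise falls
// back to the first named child (assumed fallback rule).
func innerDeclarator(n *node) *node {
	if n == nil {
		return nil
	}
	if d := n.fields["declarator"]; d != nil {
		return d
	}
	if len(n.children) > 0 {
		return n.children[0]
	}
	return nil
}

// unwrap walks nested declarators, collecting "*" and "&" suffixes
// until it reaches the identifier.
func unwrap(n *node) (name, suffix string) {
	for n != nil {
		switch n.kind {
		case "pointer_declarator":
			suffix += "*"
		case "reference_declarator":
			suffix += "&"
		case "identifier":
			return n.text, suffix
		}
		n = innerDeclarator(n)
	}
	return "", suffix
}

func main() {
	// C style: the inner declarator is reachable through a field.
	p := &node{kind: "identifier", text: "p"}
	inner := &node{kind: "pointer_declarator", fields: map[string]*node{"declarator": p}}
	outer := &node{kind: "pointer_declarator", fields: map[string]*node{"declarator": inner}}

	// C++ reference: no field, only a named child.
	s := &node{kind: "identifier", text: "s"}
	ref := &node{kind: "reference_declarator", children: []*node{s}}

	for _, d := range []*node{outer, ref} {
		name, suffix := unwrap(d)
		fmt.Printf("name=%s suffix=%s\n", name, suffix)
	}
}
```

Because both shapes go through the same helper, the walker needs no branch on which grammar produced the node.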

## Test plan

- [x] `go build ./...` — clean
- [x] `go vet ./...` — clean
- [x] `golangci-lint run ./graph/...` — 0 issues
- [x] `go test ./graph/... -count=1` — all pass (no regressions)
- [x] `TestExtractTypeString` — 11 cases (primitives, pointers, refs,
  qualifiers, templates, qualified names) + nil guard
- [x] `TestExtractFunctionInfo` — C/C++ definitions, void/typed/pointer
  returns, variadics, namespaced methods + nil guard
- [x] `TestExtractStructFields` — populated and empty structs + nil guard
- [x] `TestExtractParameters` — typed, variadic, unnamed, void, C++
  const-ref + nil guard
- [x] `TestExtractCallInfo` — free / dot / arrow / qualified shapes +
  nil and wrong-node guards
- [x] `TestIsCKeyword` / `TestIsCppKeyword` — C89..C23 keywords, C++-only
  additions, non-keywords, and the C/C++ exclusivity boundary

Labels

enhancement (New feature or request), go (Pull requests that update go code)


3 participants