Hi team
Big thanks for building such a constructive project.
It’s really helped me develop ideas on how to preprocess projects. To produce documentation that’s effective for my own work, I’ve been adding several enhancements in my fork to support integration with my personal projects.
Right now, I’ve set up three feature branches:
feat: Major performance improvements and architectural enhancements (v0.1.1 - v0.1.6) e2720pjk/CodeWiki#27
PR: Implemented hybrid AST + Joern CPG analysis e2720pjk/CodeWiki#12
CodeWiki A/B testing framework ready e2720pjk/CodeWiki#10
Roadmap
My next planned improvement is full-project analysis with adaptive batch processing: introducing dynamic limit calculation and resumable workflows to better handle large-scale repositories.
e2720pjk#26
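One way such a workflow could look, as a rough sketch rather than the implementation in the fork: derive a per-batch token budget, pack files into batches greedily, and record finished batches in a checkpoint file so an interrupted run can pick up where it stopped. All names here (`batch_budget`, `plan_batches`, `run_resumable`), the 4-characters-per-token estimate, and the JSON checkpoint format are assumptions made for illustration.

```python
import json
from pathlib import Path


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def batch_budget(context_window: int, reserved: int) -> int:
    # Tokens left for source code after reserving room for prompt and reply.
    return max(1, context_window - reserved)


def plan_batches(sizes: dict[str, int], budget: int) -> list[list[str]]:
    """Greedily pack files (name -> estimated tokens) into batches within budget."""
    batches, current, used = [], [], 0
    for name, tokens in sorted(sizes.items()):  # sorted, so the plan is reproducible
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)  # an oversized file still gets a batch of its own
        used += tokens
    if current:
        batches.append(current)
    return batches


def run_resumable(batches, process, checkpoint: Path) -> None:
    """Process batches in order; finished batch indexes are saved after each one."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for index, batch in enumerate(batches):
        if index in done:
            continue  # already handled by an earlier run
        process(batch)
        done.add(index)
        checkpoint.write_text(json.dumps(sorted(done)))
```

Because the checkpoint stores batch indexes, resuming only works if the batch plan is identical between runs, which is why the files are sorted before packing.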
Context
These changes were generated with the help of an LLM, so I understand if there are concerns about quality. I’ll continue refining and maintaining them in my fork, and I’d be glad to contribute upstream if they’re considered useful.
Questions for the maintainers
Would you be open to reviewing PRs for any of these features?
I noticed that the `tests/` directory is explicitly listed in `.gitignore`, which conflicts with the configuration in `pyproject.toml`. Does this mean tests are not intended to be included in the repository? Should pytest files be excluded from PRs?
Branch Summary
Feature Enhancements (stable)
Quick Reference: New CLI Options & Configuration Parameters (v0.1.1 - v0.1.6)
CLI Command Options (generate)
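| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `--respect-gitignore` | | False | Respect .gitignore patterns during analysis (v0.1.1) |
| `--max-files` | int | 100 | Maximum number of files to analyze (range: 1-5000) (v0.1.1) |
| `--max-entry-points` | int | 5 | Maximum number of entry points to identify (v0.1.1) |
| `--max-connectivity-files` | int | 10 | Maximum number of high-connectivity files (v0.1.1) |

CLI Command Options (config set)

| Option | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| `--enable-parallel-processing` | flag | True | - | Enable parallel processing for leaf modules (v0.1.1) |
| `--disable-parallel-processing` | flag | - | - | Disable parallel processing (v0.1.1) |
| `--concurrency-limit` | int | 5 | 1-10 | Maximum concurrent API calls (v0.1.1) |
| `--max-tokens-per-module` | int | 36369 | 1000-200000 | Maximum tokens per module (v0.1.1) |
| `--max-tokens-per-leaf` | int | 16000 | 500-100000 | Maximum tokens per leaf module (v0.1.1) |
| `--cache-size` | int | 1000 | 100-10000 | LLM cache size - number of cached prompts (v0.1.1) |

Joern CPG Support (POC)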
Joern: https://github.com/joernio/joern
This branch replaces LLM-dependent clustering and flow analysis with Joern Code Property Graphs (CPGs) for more deterministic control-flow and dependency extraction. The initial POC is functional, and I’m looking for feedback on the approach.
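For readers unfamiliar with the idea, the toy sketch below shows what deterministic dependency extraction means in practice. It does not use Joern: it is a stand-in that walks a Python syntax tree with the standard-library `ast` module and records which top-level functions call which names. A Code Property Graph carries much more (control flow, data flow, types), and the function name `call_edges` is a placeholder for this example only.

```python
import ast


def call_edges(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs for plain-name calls inside top-level functions."""
    tree = ast.parse(source)
    edges = set()
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            # Only calls like helper(); attribute calls such as obj.method() are skipped.
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                edges.add((node.name, inner.func.id))
    return edges


example = """
def load(path):
    return open(path).read()

def build(path):
    text = load(path)
    return render(text)
"""

print(sorted(call_edges(example)))
# [('build', 'load'), ('build', 'render'), ('load', 'open')]
```

The same input always yields the same edges, which is the property that makes graph-based extraction attractive compared with asking an LLM to infer dependencies.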
A/B Testing (POC)
Version comparison scripts have been implemented and are functional. However, the evaluation metrics still need refinement, as current reports show anomalies. At this stage, the implementation serves mainly as a showcase of evaluation methods, not a core feature.
Reference reports are available at e2720pjk#9 (comment).
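As an illustration of the kind of comparison a version-comparison script might perform, the sketch below contrasts two directories of generated Markdown using two deliberately simple metrics: file count and total word count. The metrics, the function names (`doc_stats`, `compare`), and the directory layout are assumptions for the example, not the metrics used in the referenced reports.

```python
from pathlib import Path


def doc_stats(root: Path) -> dict[str, int]:
    """Count Markdown files and whitespace-separated words under root."""
    files = sorted(root.rglob("*.md"))
    words = sum(len(f.read_text(encoding="utf-8").split()) for f in files)
    return {"files": len(files), "words": words}


def compare(baseline: Path, candidate: Path) -> dict[str, int]:
    """Return candidate minus baseline for each metric."""
    base, cand = doc_stats(baseline), doc_stats(candidate)
    return {key: cand[key] - base[key] for key in base}
```

Size-based numbers like these say nothing about documentation quality on their own, which is in line with the caveat above that the evaluation metrics still need refinement.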