diff --git a/README.md b/README.md
index a099ae9..31df65f 100644
--- a/README.md
+++ b/README.md
@@ -57,6 +57,7 @@ You can use the "/plan" agent to turn the reports into actionable issues which c
 - [🗜️ Documentation Unbloat](docs/unbloat-docs.md) - Automatically simplify documentation by reducing verbosity while maintaining clarity
 - [✨ Code Simplifier](docs/code-simplifier.md) - Automatically simplify recently modified code for improved clarity and maintainability
 - [🔍 Duplicate Code Detector](docs/duplicate-code-detector.md) - Identify duplicate code patterns and suggest refactoring opportunities
+- [🏋️ Daily File Diet](docs/daily-file-diet.md) - Monitor for oversized source files and create targeted refactoring issues
 - [🧪 Daily Test Improver](docs/daily-test-improver.md) - Improve test coverage by adding meaningful tests to under-tested areas
 - [⚡ Daily Perf Improver](docs/daily-perf-improver.md) - Analyze and improve code performance through benchmarking and optimization
 
diff --git a/docs/daily-file-diet.md b/docs/daily-file-diet.md
new file mode 100644
index 0000000..265c5a5
--- /dev/null
+++ b/docs/daily-file-diet.md
@@ -0,0 +1,102 @@
+# 🏋️ Daily File Diet
+
+> For an overview of all available workflows, see the [main README](../README.md).
+
+The [Daily File Diet workflow](../workflows/daily-file-diet.md?plain=1) monitors your codebase for oversized source files and creates actionable refactoring issues when a file grows beyond a healthy size threshold.
+
+## Installation
+
+Add the workflow to your repository:
+
+```bash
+gh aw add https://github.com/githubnext/agentics/blob/main/workflows/daily-file-diet.md
+```
+
+Then compile:
+
+```bash
+gh aw compile
+```
+
+## What It Does
+
+The Daily File Diet workflow runs on weekdays and:
+
+1. **Scans Source Files** - Finds all non-test source files in your repository, excluding dependency and build-output directories such as `node_modules`, `vendor`, `dist`, and `target`
+2. **Identifies Oversized Files** - Checks whether the largest non-test source file has 500 or more lines (the healthy size threshold)
+3. **Analyzes Structure** - Examines what the file contains: functions, classes, modules, and their relationships
+4. **Creates Refactoring Issues** - Proposes concrete split strategies with specific file names, responsibilities, and implementation guidance
+5. **Skips When Healthy** - If no file reaches the threshold, reports an all-clear and creates no issue
+
+## How It Works
+
+````mermaid
+graph LR
+    A[Scan Source Files] --> B[Sort by Line Count]
+    B --> C{Largest File<br/>≥ 500 lines?}
+    C -->|No| D[Report: All Files Healthy]
+    C -->|Yes| E[Analyze File Structure]
+    E --> F[Propose File Splits]
+    F --> G[Create Refactoring Issue]
+````
+
+The workflow focuses on **production source code only**: test files are excluded so the signal stays relevant. It also skips generated directories and any file whose header carries a standard generation marker such as "Code generated" or "DO NOT EDIT".
+
+### Why File Size Matters
+
+Oversized files are a code smell in any programming language:
+
+- **Harder to navigate**: Scrolling through files of 1,000+ lines wastes developer time
+- **More merge conflicts**: Several developers are more likely to change the same large file at once
+- **Harder to test**: Large files tend to mix concerns, which makes isolated unit testing difficult
+- **Unclear ownership**: It is hard to tell who is responsible for what in a large catch-all file
+
+The 500-line threshold is a practical guideline, not a hard rule. Files near the threshold may be fine; files well over it are worth examining.
+
+## Example Issues
+
+From the original gh-aw repository (79% merge rate):
+- An issue targeting `add_interactive.go` (large file) → [PR refactoring it into 6 domain-focused modules](https://github.com/github/gh-aw/pull/12545)
+- An issue targeting `permissions.go` → [PR splitting it into focused modules](https://github.com/github/gh-aw/pull/12363) (928 → 133 lines)
+
+## Configuration
+
+The workflow uses these default settings:
+
+- **Schedule**: Weekdays at 1 PM UTC (can also be triggered manually via `workflow_dispatch`)
+- **Threshold**: 500 lines
+- **Issue labels**: `refactoring`, `code-health`, `automated-analysis`
+- **Max issues per run**: 1 (one file at a time to avoid overwhelming the backlog)
+- **Issue expiry**: 2 days if not acted on
+- **Skip condition**: Does not run if a `[file-diet]` issue is already open
+
+## Customization
+
+You can customize the workflow by editing its source file:
+
+```bash
+gh aw edit daily-file-diet
+```
+
+Common customizations:
+- **Adjust the threshold** - Change the 500-line limit (it appears in several places in the workflow prompt) to suit your team's preferences
+- **Focus on specific languages** - Restrict the `find` commands to your repository's primary language
+- **Add labels** - Apply team-specific labels to generated issues
+- **Change the schedule** - Run less frequently if your codebase changes slowly
+
+## Tips for Success
+
+1. **Work the backlog gradually** - The workflow creates one issue at a time to keep the backlog manageable
+2. **Split incrementally** - Refactor one module at a time to make review easier
+3. **Update imports throughout** - After splitting a file, search the codebase for every import path that needs updating
+4. **Treat the threshold as a guideline** - Files just above 500 lines may not need splitting; focus on files that are significantly larger
+
+## Source
+
+This workflow is adapted from [Peli's Agent Factory](https://github.github.io/gh-aw/blog/2026-01-13-meet-the-workflows-continuous-refactoring/), where it achieved a 79% merge rate (26 of 33 proposed PRs merged) in the gh-aw repository.
+
+## Related Workflows
+
+- [Code Simplifier](code-simplifier.md) - Simplifies recently modified code
+- [Duplicate Code Detector](duplicate-code-detector.md) - Finds and removes code duplication
+- [Daily Performance Improver](daily-perf-improver.md) - Optimizes code performance
diff --git a/workflows/daily-file-diet.md b/workflows/daily-file-diet.md
new file mode 100644
index 0000000..d08ea4f
--- /dev/null
+++ b/workflows/daily-file-diet.md
@@ -0,0 +1,191 @@
+---
+name: Daily File Diet
+description: Analyzes source files daily to identify oversized files that exceed healthy size thresholds, creating actionable refactoring issues
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "0 13 * * 1-5"
+  skip-if-match: 'is:issue is:open in:title "[file-diet]"'
+
+permissions:
+  contents: read
+  issues: read
+  pull-requests: read
+
+tracker-id: daily-file-diet
+engine: copilot
+
+safe-outputs:
+  create-issue:
+    expires: 2d
+    title-prefix: "[file-diet] "
+    labels: [refactoring, code-health, automated-analysis]
+    assignees: copilot
+    max: 1
+
+tools:
+  github:
+    toolsets: [default]
+  bash:
+    - "find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/.next/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -not -path '*/coverage/*' -not -path '*/venv/*' -not -path '*/.tox/*' -not -path '*/.mypy_cache/*' -name '*' -exec wc -l {} \\; 2>/dev/null"
+    - "wc -l *"
+    - "head -n * *"
+    - "grep -n * *"
+    - "find . -type f -name '*.go' -not -path '*_test.go' -not -path '*/vendor/*'"
+    - "find . -type f -name '*.py' -not -path '*/__pycache__/*' -not -path '*/venv/*'"
+    - "find . -type f -name '*.ts' -not -path '*/node_modules/*' -not -path '*/dist/*'"
+    - "find . -type f -name '*.js' -not -path '*/node_modules/*' -not -path '*/dist/*'"
+    - "find . -type f -name '*.rb' -not -path '*/vendor/*'"
+    - "find . -type f -name '*.java' -not -path '*/target/*'"
+    - "find . -type f -name '*.rs' -not -path '*/target/*'"
+    - "find . -type f -name '*.cs'"
+    - "find . -type f \\( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.js' -o -name '*.rb' -o -name '*.java' -o -name '*.rs' -o -name '*.cs' -o -name '*.cpp' -o -name '*.c' \\) -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -exec wc -l {} \\; 2>/dev/null"
+    - "sort *"
+    - "cat *"
+
+timeout-minutes: 20
+strict: true
+---
+
+# Daily File Diet Agent 🏋️
+
+You are the Daily File Diet Agent, a code health specialist that monitors file sizes and promotes modular, maintainable codebases by identifying oversized source files that need refactoring.
+
+## Mission
+
+Analyze the repository's source files to identify the largest file and determine whether it requires refactoring. Create an issue only when that file reaches the 500-line threshold, providing specific guidance for splitting it into smaller, more focused files.
+
+## Current Context
+
+- **Repository**: ${{ github.repository }}
+- **Analysis Date**: $(date +%Y-%m-%d)
+- **Workspace**: ${{ github.workspace }}
+
+## Analysis Process
+
+### 1. Identify Source Files and Their Sizes
+
+First, determine the primary programming language(s) used in this repository. Then find the largest source files using a command appropriate for those language(s). For example:
+
+**For polyglot or unknown repos:**
+```bash
+find . -type f \( -name "*.go" -o -name "*.py" -o -name "*.ts" -o -name "*.js" -o -name "*.rb" -o -name "*.java" -o -name "*.rs" -o -name "*.cs" -o -name "*.cpp" -o -name "*.c" \) \
+  -not -path "*/node_modules/*" \
+  -not -path "*/vendor/*" \
+  -not -path "*/dist/*" \
+  -not -path "*/build/*" \
+  -not -path "*/target/*" \
+  -not -path "*/__pycache__/*" \
+  -exec wc -l {} \; 2>/dev/null | sort -rn | head -20
+```
+
+Also skip test files (for example `*_test.go`, `*.test.ts`, `*.spec.ts`, `*.test.js`, `*.spec.js`, `*_test.py`, and `test_*.py`) and focus on non-test production code.
+
+Extract:
+- **File path**: Full path to the largest non-test source file
+- **Line count**: Number of lines in the file
+
+### 2. Apply Size Threshold
+
+**Healthy file size threshold: 500 lines**
+
+If the largest non-test source file is **under 500 lines**, do NOT create an issue. Instead, output a simple status message:
+
+```
+✅ All files are healthy! Largest file: [FILE_PATH] ([LINE_COUNT] lines)
+No refactoring needed today.
+```
+
+If the largest non-test source file has **500 or more lines**, proceed to step 3.
+
+### 3. Analyze the Large File's Structure
+
+Read the file and understand its structure:
+
+```bash
+head -n 100 [FILE_PATH]
+```
+
+```bash
+grep -n "^func\|^class\|^def\|^module\|^impl\|^struct\|^type\|^interface\|^export " [FILE_PATH] | head -50
+```
+
+Identify:
+- What logical concerns or responsibilities the file contains
+- Groups of related functions, classes, or modules
+- Areas with distinct purposes that could become separate files
+- Shared utilities that are scattered among unrelated code
+
+### 4. Generate Issue Description
+
+If the file has 500 or more lines, create an issue using the following structure:
+
+```markdown
+### Overview
+
+The file `[FILE_PATH]` has grown to [LINE_COUNT] lines, making it harder to navigate and maintain. This task involves refactoring it into smaller, more focused files.
+
+### Current State
+
+- **File**: `[FILE_PATH]`
+- **Size**: [LINE_COUNT] lines
+- **Language**: [language]
+
+<details>
+<summary><b>Structural Analysis</b></summary>
+
+[Brief description of what the file contains: key functions, classes, modules, and their groupings]
+
+</details>
+
+### Refactoring Strategy
+
+#### Proposed File Splits
+
+Based on the file's structure, split it into the following modules:
+
+1. **`[new_file_1]`**
+   - Contents: [list key functions/classes]
+   - Responsibility: [single-purpose description]
+
+2. **`[new_file_2]`**
+   - Contents: [list key functions/classes]
+   - Responsibility: [single-purpose description]
+
+3. **`[new_file_3]`** *(if needed)*
+   - Contents: [list key functions/classes]
+   - Responsibility: [single-purpose description]
+
+### Implementation Guidelines
+
+1. **Preserve Behavior**: All existing functionality must work identically after the split
+2. **Maintain Public API**: Keep exported/public symbols accessible with the same names
+3. **Update Imports**: Fix all import paths throughout the codebase
+4. **Test After Each Split**: Run the test suite after each incremental change
+5. **One File at a Time**: Split one module at a time to make review easier
+
+### Acceptance Criteria
+
+- [ ] Original file is split into focused modules
+- [ ] Each new file is under 300 lines
+- [ ] All tests pass after refactoring
+- [ ] No breaking changes to public API
+- [ ] All import paths updated correctly
+
+---
+
+**Priority**: Medium
+**Effort**: [Small/Medium/Large based on complexity]
+**Expected Impact**: Improved code navigability, easier testing, reduced merge conflicts
+```
+
+## Important Guidelines
+
+- **Only create issues when the threshold is reached**: Do not create issues for files under 500 lines
+- **Skip generated files**: Ignore files in `dist/`, `build/`, `target/`, or files with a header indicating they are generated (e.g., "Code generated", "DO NOT EDIT")
+- **Skip test files**: Focus on production source code only
+- **Be specific and actionable**: Provide concrete file split suggestions, not vague advice
+- **Consider language idioms**: Suggest splits that follow the conventions of the repository's primary language (e.g., one class per file in Java, grouped by feature in Go, modules by responsibility in Python)
+- **Estimate effort realistically**: Large files with many dependencies may require significant refactoring effort
+
+Begin your analysis now. Find the largest source file(s), assess whether any need refactoring, and create an issue only if necessary.