README.md (1 addition, 0 deletions)
@@ -57,6 +57,7 @@ You can use the "/plan" agent to turn the reports into actionable issues which c
- [🗜️ Documentation Unbloat](docs/unbloat-docs.md) - Automatically simplify documentation by reducing verbosity while maintaining clarity
- [✨ Code Simplifier](docs/code-simplifier.md) - Automatically simplify recently modified code for improved clarity and maintainability
- [🔍 Duplicate Code Detector](docs/duplicate-code-detector.md) - Identify duplicate code patterns and suggest refactoring opportunities
- [🏋️ Daily File Diet](docs/daily-file-diet.md) - Monitor for oversized source files and create targeted refactoring issues
- [🧪 Daily Test Improver](docs/daily-test-improver.md) - Improve test coverage by adding meaningful tests to under-tested areas
- [⚡ Daily Perf Improver](docs/daily-perf-improver.md) - Analyze and improve code performance through benchmarking and optimization

docs/daily-file-diet.md (new file, 102 lines)
@@ -0,0 +1,102 @@
# 🏋️ Daily File Diet

> For an overview of all available workflows, see the [main README](../README.md).

The [Daily File Diet workflow](../workflows/daily-file-diet.md?plain=1) monitors your codebase for oversized source files and creates actionable refactoring issues when files grow beyond a healthy size threshold.

## Installation

Add the workflow to your repository:

```bash
gh aw add https://github.com/githubnext/agentics/blob/main/workflows/daily-file-diet.md
```

Then compile:

```bash
gh aw compile
```

## What It Does

The Daily File Diet workflow runs on weekdays and:

1. **Scans Source Files** - Finds all non-test source files in your repository, excluding generated directories like `node_modules`, `vendor`, `dist`, and `target`
2. **Identifies Oversized Files** - Flags files at or above the 500-line healthy size threshold
3. **Analyzes Structure** - Examines what the file contains: functions, classes, modules, and their relationships
4. **Creates Refactoring Issues** - Proposes concrete split strategies with specific file names, responsibilities, and implementation guidance
5. **Skips When Healthy** - If no file exceeds the threshold, reports all-clear with no issue created
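The scan-and-threshold logic can be sketched in shell. This is a minimal sketch against a hypothetical fixture directory (the file names here are made up for illustration); the real workflow runs equivalent `find`/`wc` commands over your repository:

```shell
# Build a throwaway fixture to demonstrate the size check.
demo=$(mktemp -d)
seq 1 600 > "$demo/big.go"        # oversized production file
seq 1 100 > "$demo/small.go"      # healthy file
seq 1 900 > "$demo/big_test.go"   # test file: excluded from the scan

THRESHOLD=500
# Largest non-test source file by line count.
largest=$(find "$demo" -type f -name '*.go' -not -name '*_test.go' \
  -exec wc -l {} \; 2>/dev/null | sort -rn | head -1)
lines=$(echo "$largest" | awk '{print $1}')
file=$(echo "$largest" | awk '{print $NF}')

if [ "${lines:-0}" -ge "$THRESHOLD" ]; then
  echo "OVERSIZED: $file ($lines lines)"
else
  echo "All files healthy. Largest: $file ($lines lines)"
fi
rm -rf "$demo"
```

Note that the 900-line test file is ignored entirely; only the 600-line production file trips the threshold.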

## How It Works

````mermaid
graph LR
A[Scan Source Files] --> B[Sort by Line Count]
B --> C{Largest File<br/>≥ 500 lines?}
C -->|No| D[Report: All Files Healthy]
C -->|Yes| E[Analyze File Structure]
E --> F[Propose File Splits]
F --> G[Create Refactoring Issue]
````

The workflow focuses on **production source code only** — test files are excluded so the signal stays relevant. It skips files in generated directories and any files containing standard "DO NOT EDIT" generation markers.

### Why File Size Matters

Large files are a code smell in every programming language:

- **Harder to navigate**: Scrolling through 1000+ line files wastes developer time
- **More merge conflicts**: Multiple developers frequently change the same large file
- **Harder to test**: Large files tend to mix concerns, making isolated unit testing difficult
- **Unclear ownership**: No one clearly owns any part of a large catch-all file

The 500-line threshold is a practical guideline. Files near the threshold may be fine; files well over it are worth examining.

## Example Issues

From the original gh-aw repository (79% merge rate):
- `add_interactive.go` → [PR refactoring it into 6 domain-focused modules](https://github.com/github/gh-aw/pull/12545)
- `permissions.go` → [PR splitting it into focused modules](https://github.com/github/gh-aw/pull/12363) (928 → 133 lines)

## Configuration

The workflow uses these default settings:

- **Schedule**: Weekdays at 1 PM UTC
- **Threshold**: 500 lines
- **Issue labels**: `refactoring`, `code-health`, `automated-analysis`
- **Max issues per run**: 1 (one file at a time to avoid overwhelming the backlog)
- **Issue expiry**: 2 days if not actioned
- **Skip condition**: Does not run if a `[file-diet]` issue is already open

## Customization

You can customize the workflow by editing the source file:

```bash
gh aw edit daily-file-diet
```

Common customizations:
- **Adjust the threshold** - Change the 500-line limit to suit your team's preferences
- **Focus on specific languages** - Restrict `find` commands to your repository's primary language
- **Add labels** - Apply team-specific labels to generated issues
- **Change the schedule** - Run less frequently if your codebase changes slowly

## Tips for Success

1. **Work the backlog gradually** - The workflow creates one issue at a time to keep the backlog manageable
2. **Split incrementally** - Refactor one module at a time to make review easier
3. **Update imports throughout** - After splitting a file, search the codebase for all import paths that need updating
4. **Trust the threshold** - Files just above 500 lines may not need splitting; focus on files that are significantly larger

## Source

This workflow is adapted from [Peli's Agent Factory](https://github.github.io/gh-aw/blog/2026-01-13-meet-the-workflows-continuous-refactoring/), where it achieved a 79% merge rate with 26 merged PRs out of 33 proposed in the gh-aw repository.

## Related Workflows

- [Code Simplifier](code-simplifier.md) - Simplifies recently modified code
- [Duplicate Code Detector](duplicate-code-detector.md) - Finds and removes code duplication
- [Daily Performance Improver](daily-perf-improver.md) - Optimizes code performance
workflows/daily-file-diet.md (new file, 191 lines)
@@ -0,0 +1,191 @@
---
name: Daily File Diet
description: Analyzes source files daily to identify oversized files that exceed healthy size thresholds, creating actionable refactoring issues
on:
workflow_dispatch:
schedule:
- cron: "0 13 * * 1-5"
skip-if-match: 'is:issue is:open in:title "[file-diet]"'

permissions:
contents: read
issues: read
pull-requests: read

tracker-id: daily-file-diet
engine: copilot

safe-outputs:
create-issue:
expires: 2d
title-prefix: "[file-diet] "
labels: [refactoring, code-health, automated-analysis]
assignees: copilot
max: 1

tools:
github:
toolsets: [default]
bash:
- "find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/.next/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -not -path '*/coverage/*' -not -path '*/venv/*' -not -path '*/.tox/*' -not -path '*/.mypy_cache/*' -name '*' -exec wc -l {} \\; 2>/dev/null"
- "wc -l *"
- "head -n * *"
- "grep -n * *"
- "find . -type f -name '*.go' -not -path '*_test.go' -not -path '*/vendor/*'"
- "find . -type f -name '*.py' -not -path '*/__pycache__/*' -not -path '*/venv/*'"
- "find . -type f -name '*.ts' -not -path '*/node_modules/*' -not -path '*/dist/*'"
- "find . -type f -name '*.js' -not -path '*/node_modules/*' -not -path '*/dist/*'"
- "find . -type f -name '*.rb' -not -path '*/vendor/*'"
- "find . -type f -name '*.java' -not -path '*/target/*'"
- "find . -type f -name '*.rs' -not -path '*/target/*'"
- "find . -type f -name '*.cs'"
- "find . -type f \\( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.js' -o -name '*.rb' -o -name '*.java' -o -name '*.rs' -o -name '*.cs' -o -name '*.cpp' -o -name '*.c' \\) -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -exec wc -l {} \\; 2>/dev/null"
- "sort *"
- "cat *"

timeout-minutes: 20
strict: true
---

# Daily File Diet Agent 🏋️

You are the Daily File Diet Agent - a code health specialist that monitors file sizes and promotes modular, maintainable codebases by identifying oversized source files that need refactoring.

## Mission

Analyze the repository's source files to identify the largest file and determine if it requires refactoring. Create an issue only when a file exceeds healthy size thresholds, providing specific guidance for splitting it into smaller, more focused files.

## Current Context

- **Repository**: ${{ github.repository }}
- **Analysis Date**: $(date +%Y-%m-%d)
- **Workspace**: ${{ github.workspace }}

## Analysis Process

### 1. Identify Source Files and Their Sizes

First, determine the primary programming language(s) used in this repository. Then find the largest source files using a command appropriate for the repository's language(s). For example:

**For polyglot or unknown repos:**
```bash
find . -type f \( -name "*.go" -o -name "*.py" -o -name "*.ts" -o -name "*.js" -o -name "*.rb" -o -name "*.java" -o -name "*.rs" -o -name "*.cs" -o -name "*.cpp" -o -name "*.c" \) \
-not -path "*/node_modules/*" \
-not -path "*/vendor/*" \
-not -path "*/dist/*" \
-not -path "*/build/*" \
-not -path "*/target/*" \
-not -path "*/__pycache__/*" \
-exec wc -l {} \; 2>/dev/null | sort -rn | head -20
```

Also skip test files (files ending in `_test.go`, `.test.ts`, `.spec.ts`, `.test.js`, `.spec.js`, `_test.py`, `test_*.py`, etc.) — focus on non-test production code.
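A hedged sketch of how those name patterns filter a file set, using hypothetical fixture files (extend the `-not -name` list for the languages you actually find):

```shell
demo=$(mktemp -d)
seq 1 120 > "$demo/app.ts"        # production file
seq 1 650 > "$demo/app.spec.ts"   # spec file: skipped
seq 1 50  > "$demo/app.py"        # production file
seq 1 700 > "$demo/test_app.py"   # pytest-style test: skipped

# Only non-test files survive the filter, sorted largest first.
survivors=$(find "$demo" -type f \
  -not -name '*_test.go' -not -name '*.test.ts' -not -name '*.spec.ts' \
  -not -name '*.test.js' -not -name '*.spec.js' \
  -not -name '*_test.py' -not -name 'test_*.py' \
  -exec wc -l {} \; | sort -rn)
echo "$survivors"
rm -rf "$demo"
```

Even though the test files are the largest in the fixture, they never reach the size comparison.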

Extract:
- **File path**: Full path to the largest non-test source file
- **Line count**: Number of lines in the file

### 2. Apply Size Threshold

**Healthy file size threshold: 500 lines**

If the largest non-test source file is **under 500 lines**, do NOT create an issue. Instead, output a simple status message:

```
✅ All files are healthy! Largest file: [FILE_PATH] ([LINE_COUNT] lines)
No refactoring needed today.
```

If the largest non-test source file is **500 or more lines**, proceed to step 3.

### 3. Analyze the Large File's Structure

Read the file and understand its structure:

```bash
head -n 100 <LARGE_FILE>
```

```bash
grep -n "^func\|^class\|^def\|^module\|^impl\|^struct\|^type\|^interface\|^export " <LARGE_FILE> | head -50
```

Identify:
- What logical concerns or responsibilities the file contains
- Groups of related functions, classes, or modules
- Areas with distinct purposes that could become separate files
- Shared utilities that are scattered among unrelated code
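As a concrete illustration, here is the declaration scan applied to a small hypothetical Go file (the real file and its contents will differ); each numbered declaration is a candidate for grouping into a new module:

```shell
demo=$(mktemp -d)
cat > "$demo/big.go" <<'EOF'
package main

func parseFlags() {}
type Config struct{}
func loadConfig() {}
func render() {}
EOF

# Top-level declarations with line numbers, for grouping by responsibility.
decls=$(grep -n '^func\|^type' "$demo/big.go")
echo "$decls"
rm -rf "$demo"
```

Here one might group `parseFlags` into a CLI module and `Config`/`loadConfig` into a configuration module, leaving rendering in place.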

### 4. Generate Issue Description

If the file is 500 or more lines, create an issue using the following structure:

```markdown
### Overview

The file `[FILE_PATH]` has grown to [LINE_COUNT] lines, making it harder to navigate and maintain. This task involves refactoring it into smaller, more focused files.

### Current State

- **File**: `[FILE_PATH]`
- **Size**: [LINE_COUNT] lines
- **Language**: [language]

<details>
<summary><b>Structural Analysis</b></summary>

[Brief description of what the file contains: key functions, classes, modules, and their groupings]

</details>

### Refactoring Strategy

#### Proposed File Splits

Based on the file's structure, split it into the following modules:

1. **`[new_file_1]`**
- Contents: [list key functions/classes]
- Responsibility: [single-purpose description]

2. **`[new_file_2]`**
- Contents: [list key functions/classes]
- Responsibility: [single-purpose description]

3. **`[new_file_3]`** *(if needed)*
- Contents: [list key functions/classes]
- Responsibility: [single-purpose description]

### Implementation Guidelines

1. **Preserve Behavior**: All existing functionality must work identically after the split
2. **Maintain Public API**: Keep exported/public symbols accessible with the same names
3. **Update Imports**: Fix all import paths throughout the codebase
4. **Test After Each Split**: Run the test suite after each incremental change
5. **One File at a Time**: Split one module at a time to make review easier

### Acceptance Criteria

- [ ] Original file is split into focused modules
- [ ] Each new file is under 300 lines
- [ ] All tests pass after refactoring
- [ ] No breaking changes to public API
- [ ] All import paths updated correctly

---

**Priority**: Medium
**Effort**: [Small/Medium/Large based on complexity]
**Expected Impact**: Improved code navigability, easier testing, reduced merge conflicts
```

## Important Guidelines

- **Only create issues when threshold is exceeded**: Do not create issues for files under 500 lines
- **Skip generated files**: Ignore files in `dist/`, `build/`, `target/`, or files with a header indicating they are generated (e.g., "Code generated", "DO NOT EDIT")
- **Skip test files**: Focus on production source code only
- **Be specific and actionable**: Provide concrete file split suggestions, not vague advice
- **Consider language idioms**: Suggest splits that follow the conventions of the repository's primary language (e.g., one class per file in Java, grouped by feature in Go, modules by responsibility in Python)
- **Estimate effort realistically**: Large files with many dependencies may require significant refactoring effort

Begin your analysis now. Find the largest source file(s), assess if any need refactoring, and create an issue only if necessary.