
feat: add meta mode Phase 3 — classify and contribute upstream #376

Open
Maxusmusti wants to merge 3 commits into main from feat/meta-mode-upstream-contributions


@Maxusmusti
Collaborator

Summary

  • Adds a contribution pipeline to meta mode that classifies evolved playbook items as general (upstream-worthy) vs project-specific (local only), then lets users contribute the general ones back as PRs
  • New `factory contribute` CLI command with `--classify`, `--submit`, and `--status` modes
  • CEO prompt updated with Phase 3 (M4/M5/M6) that runs after ACE evolution

Motivation

Currently, meta mode evolves playbooks locally via ACE — all learnings stay in ~/.factory/playbooks/ and never flow back upstream. This means every user independently re-discovers the same improvements. With this change, the factory can identify patterns that are universally useful across diverse projects and contribute them back to the default playbooks, closing the self-improvement loop: the more the factory is used, the better it becomes for everyone.

How it works

Classification engine (factory/ace/contributor.py)

Each evolved playbook item is scored on a general-vs-specific spectrum using four weighted signals:

| Signal | Weight | What it measures |
|---|---|---|
| Cross-project prevalence | 40% | Does this pattern appear across 3+ unrelated projects? |
| Domain independence | 25% | Does it reference factory internals or project-specific frameworks? |
| Evidence strength | 20% | How many observations (helpful/harmful) support it? |
| Category signal | 15% | Is the hypothesis category inherently general (e.g., prompt_engineering) or specific (e.g., feature)? |

Items scoring ≥ 0.65 are classified as general, ≤ 0.35 as specific, and between as uncertain.
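
As a rough illustration of the scoring described above, the sketch below combines the four signals into a weighted sum and applies the two thresholds. The weights and thresholds come from this description; the function names, the signal keys, and the assumption that each signal is already normalized to [0, 1] are hypothetical and may not match `factory/ace/contributor.py`.

```python
# Weights taken from the signal table; the key names are illustrative.
WEIGHTS = {
    "cross_project_prevalence": 0.40,
    "domain_independence": 0.25,
    "evidence_strength": 0.20,
    "category_signal": 0.15,
}
GENERAL_THRESHOLD = 0.65
SPECIFIC_THRESHOLD = 0.35


def generality_score(signals: dict[str, float]) -> float:
    """Weighted sum of the four signals (assumed normalized to [0, 1])."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= signals[name] <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1]")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def classify(score: float) -> str:
    """Map a generality score to general / specific / uncertain."""
    if score >= GENERAL_THRESHOLD:
        return "general"
    if score <= SPECIFIC_THRESHOLD:
        return "specific"
    return "uncertain"


example = {
    "cross_project_prevalence": 1.0,
    "domain_independence": 0.8,
    "evidence_strength": 0.7,
    "category_signal": 0.5,
}
label = classify(generality_score(example))  # "general"
```

Because the weights sum to 1, the score stays in [0, 1] whenever the inputs do, which is what makes fixed thresholds meaningful.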

User experience

At the end of a meta mode run, users see a terminal summary:

════════════════════════════════════════════════════════════
                    META MODE SUMMARY
════════════════════════════════════════════════════════════

PLAYBOOK EVOLUTION COMPLETE
  9 items evolved across 5 roles
  3 general (upstream candidates)  |  3 specific (local only)  |  3 uncertain

────────────────────────────────────────────────────────────
GENERAL IMPROVEMENTS (upstream candidates)
────────────────────────────────────────────────────────────

  1. [strategist] "Always run type checkers after making changes"
     Generality: ████████░░ 0.81  |  5 projects  |  16 experiments
     Category: type_safety

────────────────────────────────────────────────────────────
PROJECT-SPECIFIC IMPROVEMENTS (staying local)
────────────────────────────────────────────────────────────

  1. [builder] "Use iframe wait patterns for Playwright tests"
     Generality: ██░░░░░░░░ 0.22  |  1 project  |  5 experiments
     Why local: single-project signal, domain-specific (Playwright)

════════════════════════════════════════════════════════════
Run `factory contribute` to select items for upstream PR.
════════════════════════════════════════════════════════════

Users can then run factory contribute --submit to create a PR, or skip — contribution is always opt-in.
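
The per-item stats line in the summary can be sketched as follows. This is a minimal reconstruction from the sample output only: the ten-cell bar width, the rounding rule, and both function names are assumptions, not the PR's actual rendering code.

```python
def generality_bar(score: float, width: int = 10) -> str:
    """Render a score in [0, 1] as a fixed-width bar of filled and empty cells."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    filled = round(score * width)
    return "█" * filled + "░" * (width - filled)


def stats_line(score: float, projects: int, experiments: int) -> str:
    """Format the 'Generality: ...' line shown under each summary item."""
    project_word = "project" if projects == 1 else "projects"
    experiment_word = "experiment" if experiments == 1 else "experiments"
    return (
        f"Generality: {generality_bar(score)} {score:.2f}"
        f"  |  {projects} {project_word}"
        f"  |  {experiments} {experiment_word}"
    )


print(stats_line(0.81, 5, 16))
print(stats_line(0.22, 1, 5))
```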

CLI commands

# Classify evolved items and show summary
factory contribute --classify /path/to/project

# Create PR with all general items
factory contribute --submit /path/to/project --all

# Check pending candidates
factory contribute --status
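
For readers who want to picture the command-line surface, here is a hypothetical `argparse` wiring that accepts the invocations above. It is a sketch only: the real parser in `factory/cli.py` may be structured differently, and the `dest` names, the mutually exclusive grouping, and the help strings are assumptions. The `--dry-run` flag is taken from the test plan.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting the `factory contribute` invocations."""
    parser = argparse.ArgumentParser(prog="factory")
    commands = parser.add_subparsers(dest="command", required=True)

    contribute = commands.add_parser(
        "contribute", help="classify evolved playbook items and contribute upstream"
    )
    # The three modes are assumed to be mutually exclusive.
    mode = contribute.add_mutually_exclusive_group()
    mode.add_argument("--classify", metavar="PROJECT_PATH",
                      help="score evolved items and show the summary")
    mode.add_argument("--submit", metavar="PROJECT_PATH",
                      help="create a PR with approved items")
    mode.add_argument("--status", action="store_true",
                      help="show pending contribution candidates")
    contribute.add_argument("--all", action="store_true", dest="submit_all",
                            help="include every general item when submitting")
    contribute.add_argument("--dry-run", action="store_true",
                            help="build the PR spec without running git/gh")
    return parser


args = build_parser().parse_args(
    ["contribute", "--submit", "/path/to/project", "--all"]
)
```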

CEO prompt changes

Phase 3 (steps M4/M5/M6) is added after Phase 2 (ACE). The CEO:

  1. Runs factory contribute --classify to score evolved items
  2. Presents the summary to the user
  3. Waits for explicit approval before submitting — never auto-contributes

Files changed

| File | Change |
|---|---|
| factory/ace/contributor.py | New: classification engine, contribution pipeline, terminal summary, git/gh submit, JSON persistence (780 lines) |
| factory/cli.py | Modified: `factory contribute` command with `--classify`/`--submit`/`--status` modes |
| factory/agents/prompts/ceo.md | Modified: Phase 3 (M4/M5/M6) + task table entry |
| tests/test_contributor.py | New: 26 tests covering classification, diffing, summary, PR body, persistence |

Design decisions

  • Composition over inheritance for ClassifiedItem wrapping PlaybookItem (since PlaybookItem has extra="forbid")
  • Reuses existing factory infrastructure: Playbook.from_markdown(), classify_hypothesis(), discover_projects(), load_all_histories(), DEFAULTS_DIR, user_playbooks_dir()
  • prepare_contribution() returns specs without executing git — keeps the module testable; execute_contribution() handles the actual git/gh commands separately
  • Fuzzy matching (SequenceMatcher ≥ 0.75) for both cross-project evidence and playbook diffing, consistent with the existing reflector
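
The fuzzy-matching decision above can be illustrated with the standard library's `difflib.SequenceMatcher`. The 0.75 threshold is from this PR; the lowercase/strip normalization, the helper names, and the shape of the per-project observation data are assumptions made for the sketch.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.75  # threshold stated in the design decisions


def is_fuzzy_match(a: str, b: str, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when two item texts are similar enough to count as the same pattern."""
    # Normalization is an assumption; the real module may compare raw text.
    ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
    return ratio >= threshold


def count_matching_projects(
    item_text: str, observations_by_project: dict[str, list[str]]
) -> int:
    """Count projects with at least one observation that fuzzily matches the item."""
    return sum(
        1
        for observations in observations_by_project.values()
        if any(is_fuzzy_match(item_text, text) for text in observations)
    )


observations = {
    "project-a": ["Always run type checkers after making changes"],
    "project-b": ["always run type checkers after making changes."],
    "project-c": ["zzzz"],
}
matches = count_matching_projects(
    "Always run type checkers after making changes", observations
)  # 2
```

A count like this could feed the cross-project prevalence signal, where 3+ matching projects is the bar for a general pattern.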

Test plan

  • 26 new tests pass (pytest tests/test_contributor.py)
  • 261 existing tests pass — zero regressions
  • factory contribute --help shows correct usage
  • All imports resolve correctly
  • Manual test: run factory contribute --classify on a project with evolved playbooks
  • Manual test: run factory contribute --submit --dry-run to verify PR spec generation
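
The dry-run check relies on the split between building a contribution spec and executing it. The sketch below shows one way that split could look; `ContributionSpec`, both `_sketch` functions, the branch name, and the exact command sequence are hypothetical and are not taken from `contributor.py`.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class ContributionSpec:
    """Everything needed to open the PR, with no side effects attached."""
    branch: str
    title: str
    body: str
    commands: tuple[tuple[str, ...], ...]


def prepare_contribution_sketch(
    items: list[str], branch: str = "contrib/playbook-updates"
) -> ContributionSpec:
    """Build the spec only; nothing is executed here, so it is easy to test."""
    title = f"Contribute {len(items)} general playbook item(s)"
    body = "\n".join(f"- {item}" for item in items)
    commands = (
        ("git", "checkout", "-b", branch),
        ("git", "commit", "--all", "-m", title),
        ("git", "push", "--set-upstream", "origin", branch),
        ("gh", "pr", "create", "--title", title, "--body", body),
    )
    return ContributionSpec(branch, title, body, commands)


def execute_contribution_sketch(
    spec: ContributionSpec, dry_run: bool = True, runner=subprocess.run
) -> list[str]:
    """Run the spec's commands, or only report them when dry_run is set."""
    rendered = [shlex.join(command) for command in spec.commands]
    if not dry_run:
        for command in spec.commands:
            runner(list(command), check=True)
    return rendered


spec = prepare_contribution_sketch(["Always run type checkers after making changes"])
planned = execute_contribution_sketch(spec, dry_run=True)  # no git/gh calls made
```

Passing the runner as a parameter lets tests substitute a recorder for `subprocess.run`, in the same spirit as the mocked-subprocess tests mentioned in the later commit.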

🤖 Generated with Claude Code

Add the ability for meta mode to distinguish general improvements from
project-specific ones, and contribute the general items back upstream as
PRs. This closes the self-improvement loop: the more the factory is used
across diverse projects, the better its default playbooks become.

New CLI command `factory contribute` with three modes:
- `--classify`: scores evolved playbook items on a general-vs-specific
  spectrum using four weighted signals (cross-project prevalence 40%,
  domain independence 25%, evidence strength 20%, category signal 15%)
- `--submit`: creates a PR against the factory repo with approved items
- `--status`: shows pending contribution candidates

The classification engine uses cross-project experiment data to identify
items that appear across 3+ unrelated projects as "general" (upstream
candidates), single-project items as "specific" (local only), and
everything in between as "uncertain" (needs more data).

The CEO prompt is updated with Phase 3 (M4/M5/M6) which runs after ACE
evolution, presents the user with a terminal summary showing the
distinction, and lets them opt in to contributing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 93.70277% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.56%. Comparing base (190741e) to head (290672d).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| factory/cli.py | 65.51% | 20 Missing ⚠️ |
| factory/ace/contributor.py | 98.52% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
+ Coverage   87.54%   87.56%   +0.02%     
==========================================
  Files          60       62       +2     
  Lines        9170     9734     +564     
==========================================
+ Hits         8028     8524     +496     
- Misses       1142     1210      +68     


Maxusmusti and others added 2 commits May 25, 2026 18:38
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 26 tests covering uncovered paths: classify_evolved_playbooks
pipeline, package_evidence, prepare_contribution, execute_contribution
(mocked subprocess), explain_specificity/uncertainty branches,
load_candidates edge cases, and cmd_contribute CLI handler.

Fix lint: remove unused imports, rename ambiguous variable, drop unused
locals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>