self-evolving-agent

🧠 self-improving-agent only logs mistakes.

self-evolving-agent is an OpenClaw-first skill that turns passive self-improvement into a full capability evolution loop: diagnose gaps, set learning priorities, generate training units, evaluate progress, verify transfer, and only then promote durable strategies.

It preserves the best parts of self-improving-agent, but upgrades the paradigm from:

incident logging -> capability evolution
passive memory -> active learning agenda
correction archive -> curriculum + evaluation + promotion gate

✨ Why It Exists

Traditional self-improving agents often stop at:

"something failed"
"log the fix"
"write a rule"

That helps reduce repeated mistakes, but it does not answer the harder questions:

What can the agent reliably do today?
Which capability is actually weak?
What should it practice next?
Has it truly learned, or only recorded?
Can the strategy transfer to a different task?

self-evolving-agent is built to answer those questions explicitly.

📊 self-evolving-agent vs self-improving-agent

| Dimension | `self-improving-agent` | `self-evolving-agent` |
| --- | --- | --- |
| Primary mode | Reactive correction | Goal-driven capability evolution |
| Core unit | Incident, error, note | Capability, training unit, evaluation state |
| Memory model | Learnings and recurring issues | Learnings + capability map + learning agenda |
| Before-task behavior | Review past notes if relevant | Review notes, capability risks, and active training priorities |
| After-task behavior | Log errors and lessons | Diagnose weakest capability, update map, revise agenda, create training if needed |
| Recurrence handling | Detect recurring patterns | Convert recurrence into curriculum with pass criteria |
| Learning states | Mostly implicit | `recorded -> understood -> practiced -> passed -> generalized -> promoted` |
| Promotion rule | Promote useful rules | Promote only validated, transferable strategies |
| Transfer awareness | Limited | Explicit transfer check before promotion |
| What it optimizes for | Fewer repeated mistakes | More independence, stability, transfer, and unfamiliar-task competence |
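
The learning-state ladder in the table can be read as a small ordered state machine. The sketch below is illustrative only and is not taken from the skill's modules: the `LearningState` enum, `advance`, and `can_promote` are hypothetical names, and the "one rung at a time" rule is an assumption about how the ladder is meant to be used.

```python
from enum import IntEnum


class LearningState(IntEnum):
    """The six states from the comparison table, in ladder order."""
    RECORDED = 0
    UNDERSTOOD = 1
    PRACTICED = 2
    PASSED = 3
    GENERALIZED = 4
    PROMOTED = 5


def advance(state: LearningState, evidence_ok: bool) -> LearningState:
    """Move up exactly one rung, and only when evidence for it exists.

    Assumption: rungs cannot be skipped and PROMOTED is terminal.
    """
    if not evidence_ok or state == LearningState.PROMOTED:
        return state
    return LearningState(state + 1)


def can_promote(state: LearningState) -> bool:
    """Promotion is allowed only after transfer (GENERALIZED) was shown."""
    return state == LearningState.GENERALIZED
```

Under these assumptions, a lesson that has only been written down (`RECORDED`) is four evidence-backed steps away from being eligible for promotion, which is the distinction between logging and mastery that the table draws.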

🚀 What Makes This Different

🧭 Learning agenda: keeps only 1-3 high-leverage capabilities active at a time
🗺️ Capability map: tracks level, evidence, limits, failure modes, and upgrade conditions
🔬 Diagnosis layer: turns incidents into capability-level root-cause analysis
🏋️ Curriculum layer: generates drills, pass criteria, and transfer scenarios
✅ Evaluation ladder: separates writing something down from actually learning it
🔒 Promotion gate: prevents brittle one-off rules from polluting long-term behavior
🤝 Memory retention: still preserves classic logging for errors, learnings, and feature requests
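
As a rough illustration of the first two items above, a capability map entry and the 1-3 item agenda limit might be modeled as follows. This is a sketch under stated assumptions, not the skill's actual schema: `CapabilityEntry`, `select_agenda`, the integer `level` scale, and the externally supplied `leverage` scores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityEntry:
    """One capability map row; fields mirror the bullet list above."""
    name: str
    level: int  # hypothetical scale, e.g. 0 (novice) upward
    evidence: list[str] = field(default_factory=list)
    limits: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    upgrade_conditions: list[str] = field(default_factory=list)


MAX_ACTIVE = 3  # the agenda keeps at most three capabilities active


def select_agenda(
    entries: list[CapabilityEntry], leverage: dict[str, float]
) -> list[str]:
    """Return up to MAX_ACTIVE capability names, highest leverage first.

    `leverage` is an assumed score per capability name; names without
    a score are treated as zero leverage.
    """
    ranked = sorted(
        entries, key=lambda e: leverage.get(e.name, 0.0), reverse=True
    )
    return [e.name for e in ranked[:MAX_ACTIVE]]
```

Capping the agenda in code rather than by convention makes it harder for every logged incident to become an "active priority".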

🧱 Architecture

flowchart TD
    A["Task Starts"] --> B["Retrieve Memory"]
    B --> C["Pre-Task Risk Diagnosis"]
    C --> D["Choose Execution Strategy"]
    D --> E["Perform Task"]
    E --> F["Post-Task Reflection"]
    F --> G["Capability Update"]
    G --> H["Training Decision"]
    H --> I["Evaluation State Update"]
    I --> J["Promotion Decision"]

    K["Learning Agenda Review"] --> B
    K --> G
    H --> K
    I --> K

🔁 Closed Loop

For every meaningful cycle, the skill runs this loop:

Classify the task
Retrieve relevant learnings and capabilities
Run a pre-task risk diagnosis
Choose an execution strategy
Perform the task
Reflect after completion
Update the capability map
Generate or revise training
Evaluate learning progress
Promote only validated strategies

Outside the task loop, it also runs a learning agenda review when priorities should change.
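
The ten steps above can be sketched as an ordered pipeline. The step identifiers, the `run_cycle` function, and the handler signature below are hypothetical names chosen for illustration; the skill itself describes these steps in its markdown modules rather than in a Python API.

```python
from typing import Callable

Context = dict[str, object]
Step = Callable[[Context], None]

# Step names paraphrase the closed-loop list, in the same order.
LOOP_STEPS = [
    "classify_task",
    "retrieve_memory",
    "pre_task_risk_diagnosis",
    "choose_strategy",
    "perform_task",
    "reflect",
    "update_capability_map",
    "revise_training",
    "evaluate_progress",
    "promote_validated",
]


def run_cycle(handlers: dict[str, Step], ctx: Context) -> list[str]:
    """Run every step in order and return the names that executed.

    A missing handler raises instead of being skipped, so a cycle
    cannot jump from performing the task straight to promotion.
    """
    executed: list[str] = []
    for name in LOOP_STEPS:
        if name not in handlers:
            raise KeyError(f"no handler registered for step {name!r}")
        handlers[name](ctx)
        executed.append(name)
    return executed
```

The learning agenda review is deliberately left out of `LOOP_STEPS`, since the text describes it as running outside the task loop.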

🧩 What It Keeps From self-improving-agent

Error logging
Learning capture
Feature request logging
Recurring pattern detection
Review of past learnings before major work
Promotion into durable workspace context
Hook-friendly operation

Those strengths remain, but only as the memory layer, not the whole system.

🔄 Migration From self-improving-agent

The most common conflict is not data loss. It is double activation: the old and new hooks both running in the same workspace.

If a user already has self-improving-agent, the safe migration path is:

Install self-evolving-agent without deleting the old skill.
Bootstrap .evolution/ and import the old .learnings/ directory.
Keep the imported logs in .evolution/legacy-self-improving/ as read-only history.
Disable the old self-improvement hook after verifying the import.
Gradually normalize only the legacy items that become active evidence for diagnosis, agenda review, evaluation, or promotion.

This keeps prior experience intact without forcing a lossy one-shot conversion into the new schema.

Example:

~/.openclaw/skills/self-evo-agent/scripts/bootstrap-workspace.sh \
  ~/.openclaw/workspace/.evolution \
  --migrate-from ~/.openclaw/workspace/.learnings
openclaw hooks disable self-improvement
openclaw hooks enable self-evolving-agent

🎯 Best Fit

Use this skill when you want an agent that should:

improve across sessions
become safer on unfamiliar work
convert repeated failures into deliberate practice
distinguish recording from mastery
prove transfer before promotion

⚖️ Light Loop vs Full Loop

The full capability-evolution pipeline is intentionally not the default for every tiny mistake.

Use the light loop when the task is familiar, low-consequence, short-horizon, and no deeper weakness appeared. In that mode, retrieve only the top few relevant memories, state one risk and one verification check, do the work, and log only unusually reusable lessons.

Escalate into the full loop when the task is mixed or unfamiliar, consequence matters, an active agenda item is involved, a failure pattern repeats, the user had to rescue the task, transfer failed, or the lesson may deserve training, evaluation, or promotion.
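
One way to read the two paragraphs above is as a single escalation check. The sketch below is an interpretation, not the skill's own logic: `TaskSignals`, its field names, and `choose_loop` are hypothetical, and treating a long-horizon task as an escalation trigger is an assumption drawn from the light-loop conditions.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Signals named in the text; defaults describe a routine task."""
    familiar: bool = True
    low_consequence: bool = True
    short_horizon: bool = True
    active_agenda_item: bool = False
    repeated_failure: bool = False
    user_rescue: bool = False
    transfer_failed: bool = False
    lesson_needs_training: bool = False


def choose_loop(s: TaskSignals) -> str:
    """Return "full" if any escalation trigger fires, else "light"."""
    escalate = (
        not s.familiar
        or not s.low_consequence
        or not s.short_horizon
        or s.active_agenda_item
        or s.repeated_failure
        or s.user_rescue
        or s.transfer_failed
        or s.lesson_needs_training
    )
    return "full" if escalate else "light"
```

For example, `choose_loop(TaskSignals(user_rescue=True))` selects the full loop even when the task was otherwise familiar and low-consequence.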

📁 Repository Layout

self-evolving-agent/
├── SKILL.md
├── README.md
├── README.zh-CN.md
├── install.md
├── agents/
│   └── openai.yaml
├── benchmarks/
│   ├── suite.json
│   └── schemas/
│       └── judge-output.schema.json
├── system/
│   └── coordinator.md
├── modules/
│   ├── capability-map.md
│   ├── curriculum.md
│   ├── diagnose.md
│   ├── evaluator.md
│   ├── learning-agenda.md
│   ├── promotion.md
│   └── reflection.md
├── assets/
│   ├── CAPABILITIES.md
│   ├── ERRORS.md
│   ├── EVALUATIONS.md
│   ├── FEATURE_REQUESTS.md
│   ├── LEARNING_AGENDA.md
│   ├── LEARNINGS.md
│   └── TRAINING_UNITS.md
├── evals/
│   └── evals.json
├── demos/
│   ├── demo-1-diagnosis.md
│   ├── demo-2-training-loop.md
│   ├── demo-3-promotion-and-transfer.md
│   ├── demo-4-agenda-review.md
│   └── demo-5-pre-task-risk-diagnosis.md
├── hooks/
│   └── openclaw/
│       ├── HOOK.md
│       └── handler.ts
└── scripts/
    ├── activator.sh
    ├── bootstrap-workspace.sh
    ├── error-detector.sh
    ├── run-benchmark.py
    └── run-evals.py

⚡ Quick Start

Install the skill into your OpenClaw skills directory.
Bootstrap a persistent .evolution workspace.
Review the learning agenda before difficult tasks.
Let the task loop update memory, diagnosis, training, and evaluation artifacts.
Run the benchmark suite to see how the skill performs in model-in-the-loop conditions.

cp -r self-evolving-agent ~/.openclaw/skills/self-evo-agent
~/.openclaw/skills/self-evo-agent/scripts/bootstrap-workspace.sh ~/.openclaw/workspace/.evolution
python3 ~/.openclaw/skills/self-evo-agent/scripts/run-evals.py ~/.openclaw/skills/self-evo-agent
python3 ~/.openclaw/skills/self-evo-agent/scripts/run-benchmark.py --skill-dir ~/.openclaw/skills/self-evo-agent

More setup details are in install.md.

📦 Installation Options

Option A: Install from ClawHub

Use this when you want the simplest registry-based install into your current OpenClaw workspace.

npm i -g clawhub
# or
pnpm add -g clawhub

clawhub install RangeKing/self-evo-agent

Then start a new OpenClaw session so the skill is loaded from your workspace skills/ folder. The registry slug and local directory are self-evo-agent; the skill and hook name stay self-evolving-agent. If you are migrating from self-improving-agent, import .learnings/ before you disable the old hook.

Option B: Let OpenClaw install it from GitHub

If you prefer to have your agent fetch the GitHub repository directly, you can tell OpenClaw something like:

Install the OpenClaw skill from https://github.com/RangeKing/self-evolving-agent into ~/.openclaw/skills/self-evo-agent, inspect the scripts before enabling hooks, and then bootstrap ~/.openclaw/workspace/.evolution.

This works well when you want the skill installed as a shared managed skill under ~/.openclaw/skills.

Option C: Manual Git clone

git clone https://github.com/RangeKing/self-evolving-agent.git ~/.openclaw/skills/self-evo-agent
~/.openclaw/skills/self-evo-agent/scripts/bootstrap-workspace.sh ~/.openclaw/workspace/.evolution

If you already have ~/.openclaw/workspace/.learnings, use:

~/.openclaw/skills/self-evo-agent/scripts/bootstrap-workspace.sh \
  ~/.openclaw/workspace/.evolution \
  --migrate-from ~/.openclaw/workspace/.learnings

Safety Note

ClawHub is a public registry and skills are effectively trusted local code. Review the repository or installed files before enabling hooks or running benchmark scripts.

🤝 Project Health

Contribution guide: CONTRIBUTING.md
Changelog: CHANGELOG.md
Security policy: SECURITY.md
License: MIT

🧪 Benchmarking

This repository includes two evaluation modes:

scripts/run-evals.py
- Structural compliance checks for files, modules, and benchmark assets
scripts/run-benchmark.py
- Real model-in-the-loop execution using codex exec
- Captures candidate prompt, raw events, final output, judge output, and report

Example smoke run:

python3 scripts/run-benchmark.py \
  --skill-dir . \
  --candidate-model gpt-5.4-mini \
  --judge-model gpt-5.4-mini \
  --max-scenarios 1 \
  --timeout-seconds 90

🧭 Use Cases

Upgrading a self-correcting agent into a self-training agent
Running postmortems that produce training, not just notes
Building skill memory systems that do not confuse logging with mastery
Evaluating whether an agent can transfer strategies across task families
Designing agent curricula for research, coding, verification, or operations workflows

🛣️ Roadmap

Memory, diagnosis, curriculum, evaluator, reflection, promotion modules
Capability bootstrap map and proactive learning agenda
Model-in-the-loop benchmark harness
More benchmark scenarios for coding, research, and long-horizon execution
Optional benchmark trend summaries across repeated runs
Example workspace packs for different agent domains
