The objective is to define three concepts:
- Skills: Operational capabilities (what an agent can do). Inspired by coding agents (code gen, debugging, tool use).
- Competencies: Roles and objectives (what an agent should do/be).
- Meta-Level Reasoning: Logic to select the right competency for a context.
Constraint: Minimalistic technological solutions.
- Anthropic's Distinction:
  - Tools: Deterministic function calls (the hands/eyes), e.g., `getWeather`, `queryDatabase`.
  - Skills: Procedural knowledge (the "how-to"). A "Skill" is a folder of instructions/resources that teaches an agent how to use tools to solve a specific problem class (e.g., a "Data Analysis Skill" teaches how to use SQL + Python tools to answer a business question).
- Emerging Coding Practices:
- Tool Use: The ability to format inputs for and parse outputs from external tools (LSP, grep, linters).
- Context Management: Managing window context, retrieval (RAG).
- Chain-of-Thought: Breaking down complex tasks.
- Interoperability Standards:
- MCP (Model Context Protocol): Standardizes connections to data/tools.
- OASF: Standardized schemas.
- Safety Levels (ASL): Categorizing risk.
- Red Teaming: Adversarial testing.
- Evaluator-Optimizer Loop: Using one model to critique/grade the output of another based on a "Rubric".
- Model-Generated Evaluations: Using strong models to generate test cases.
- Human Competency Frameworks:
- SFIA (Skills Framework for the Information Age): Levels of responsibility (Autonomy, Influence, Complexity, Business Skills).
- AI Fluency: Delegation, Description, Discernment.
- Agent Roles:
- Planner vs. Executor: Common separation (e.g., BabyAGI, AutoGPT).
- Role-Based Access: Defining agents by what they are allowed to do (Guardrails).
- Reflection: "Thinking about thinking".
- Dynamic Selection:
- Router Pattern: Using a classifier to decide which agent/tool to use.
- Supervisor Pattern: A meta-agent managing state and delegating to workers (LangGraph).
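The Router pattern above can be sketched minimally. A production router would use an LLM or a trained classifier; here a keyword heuristic (route names and keywords are illustrative, not from the source) makes the control flow concrete:

```python
# Minimal Router pattern: a classifier decides which agent/skill handles
# a request. The keyword table stands in for an LLM-based classifier.
ROUTES = {
    "navigate": ["move", "go to", "path", "route"],
    "classify": ["what is", "identify", "label"],
}

def route(user_intent: str) -> str:
    """Return the name of the skill/agent that should handle the intent."""
    text = user_intent.lower()
    for skill, keywords in ROUTES.items():
        if any(kw in text for kw in keywords):
            return skill
    return "fallback"  # a Supervisor would escalate or ask for clarification
```

A Supervisor pattern wraps the same decision in a loop that also tracks shared state and delegates results back to workers.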
- Skills: Abstract "Coding Tools" to "Interface Actions".
- Coding: Write File -> General: Modify Resource.
- Coding: Read Linter -> General: Check Constraint.
- Competencies: Abstract "Developer" to "Creator", "Reviewer" to "Auditor".
- Core Concept: "Teaching" is Context Engineering. We are not training weights; we are structuring the context window to instruct behavior.
- Instructional Components:
- System Prompts: The "Lecture" or "Textbook" (The Procedure).
- Few-Shot Examples: The "Exercises" (Demonstrating correct Interpretation).
- Rubrics: The "Exam" (Self-Correction criteria).
- Interpretation:
- Mapping: Explicitly instructing the agent: "When you see X, it implies Y." (e.g., "Error code 500 means retry, 404 means stop").
- Reference: Linking operational steps to human documentation (e.g., "Follow the style guide at [link]").
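As a sketch, the "when you see X, it implies Y" mapping above can be an explicit lookup the agent consults instead of guessing (the `ask_human` fallback is an assumption, not from the source):

```python
# Interpretation mapping: raw tool output -> agent-level action.
# Codes mirror the example above ("Error code 500 means retry, 404 means stop").
STATUS_RULES = {
    500: "retry",  # transient server error: try again
    404: "stop",   # resource missing: retrying is pointless
}

def interpret_status(code: int) -> str:
    # Unknown codes escalate rather than silently proceeding.
    return STATUS_RULES.get(code, "ask_human")
```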
- Embodied Reasoning (ER): High-level "brain" model that breaks abstract goals ("sort laundry") into sub-tasks.
- Vision-Language-Action (VLA): Low-level "doer" model.
- Key Insight: Robots "Think before Acting" by generating an internal Chain-of-Thought (CoT) in natural language.
- Application to Agents:
- Skill = High-Level Reasoning Policy.
- Tool = Low-Level Actuation.
- Instruction: Teach the "Reasoning" model how to decompose tasks for the "Actuation" model.
- Structure:
  - Metadata in YAML frontmatter or `metadata.json`.
  - Content in `.md` (human readable + machine parseable via LLM).
  - Versioning: Git provides history for "Concept Drift" in skills.
  - Indexing: Tools like MarkdownDB can query this flat structure.
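A minimal sketch of that file layout, assuming `---`-delimited frontmatter with flat `key: value` metadata (a real pipeline would use PyYAML and an indexer such as MarkdownDB):

```python
# Split a skill file into (metadata, markdown body). Stdlib only.
def parse_skill_file(text: str) -> tuple[dict, str]:
    """Split '---'-delimited YAML frontmatter from the Markdown body."""
    _, frontmatter, body = text.split("---\n", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

doc = """---
id: navigate_grid_2d
name: 2D Grid Navigation
---
# Precision Navigation Protocol
1. Scan surroundings.
"""
meta, body = parse_skill_file(doc)
```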
- Definition: The ability of an agent to "think about its own thinking" to select the right Skill.
- Function:
  - Skill Identification: Matching user intent ("Sort this") to Skill capabilities (`classify_object`).
  - Self-Monitoring: "Am I stuck? Do I possess the skill to solve this?"
- Implementation: A "Router" or "Governor" layer that evaluates available Skills against the current Context.
- Definition: A higher-order role (e.g., "Warehouse Manager") that combines multiple Skills.
- Architecture:
  - Hierarchical Task Networks (HTN): Decomposing "Clean Warehouse" -> `navigate` + `classify` + `move_object`.
  - Behavior Trees: Logic for switching between Skills (e.g., `Sequence: [Scan -> Classify -> Act]`).
- Example: A `WarehouseBot` competency combines `navigate_grid_2d` and `classify_object`.
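A toy HTN decomposition consistent with the "Clean Warehouse" example above (the method table and primitive set are illustrative):

```python
# HTN sketch: a compound task expands into ordered subtasks until only
# primitive skills remain.
METHODS = {
    "clean_warehouse": ["navigate", "classify", "move_object"],
}
PRIMITIVES = {"navigate", "classify", "move_object"}

def decompose(task: str) -> list[str]:
    """Recursively expand a task into a flat, ordered list of primitives."""
    if task in PRIMITIVES:
        return [task]
    plan = []
    for subtask in METHODS[task]:
        plan.extend(decompose(subtask))
    return plan
```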
- Structured Output: Never rely on free text for ops. Enforce JSON Schemas (via Zod) for all agent decisions.
- Evaluation-Driven Development (EDD):
- Unit Tests for Prompts: Define "Golden Datasets" (Input -> Expected Output).
- Metric: "Did the agent select the correct tool?" "Did it extract the right SKU?"
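A minimal EDD harness, assuming the golden cases below and a stubbed decision function (in practice `decide_tool` is the LLM's tool choice):

```python
# Evaluation-Driven Development: run a "Golden Dataset" of
# (input -> expected tool) pairs and report accuracy.
GOLDEN = [
    {"input": "Go to shelf B4", "expected_tool": "navigate_grid_2d"},
    {"input": "What is this item?", "expected_tool": "classify_object"},
]

def decide_tool(prompt: str) -> str:
    # Placeholder policy standing in for the agent's real tool selection.
    return "navigate_grid_2d" if "go to" in prompt.lower() else "classify_object"

def evaluate(dataset) -> float:
    """Fraction of cases where the agent selected the correct tool."""
    hits = sum(decide_tool(case["input"]) == case["expected_tool"] for case in dataset)
    return hits / len(dataset)
```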
- Guardrails:
- Runtime Checks: Validators that run after LLM output but before Tool execution.
  - Example: "If `confidence < 0.8`, force `RequestHumanReview`."
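That runtime check can be sketched as a validator that sits between LLM output and tool execution (decision shape and threshold mirror the example above):

```python
# Runtime guardrail: runs after the LLM emits a decision but before the
# tool executes. Low confidence forces human review.
CONFIDENCE_THRESHOLD = 0.8

def guard(decision: dict) -> dict:
    """Rewrite the decision if it fails a runtime check."""
    if decision.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return {"action": "RequestHumanReview", "original": decision}
    return decision
```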
- Standards:
  - Schema.org: Use `DefinedTerm` for Skills/Competencies (linked to a `DefinedTermSet`).
  - CaSS/ESCO: Use hierarchical relations (`narrower`, `broader`, `requires`).
- GitHub Repository Structure (inspired by CaSS/W3C):
  - `/frameworks/`: High-level domains (e.g., "SoftwareEngineering").
  - `/competencies/`: Groupings of skills (e.g., "Debugging").
  - `/skills/`: Atomic Instructional Modules.
  - `/tools/`: Atomic Capabilities (e.g., "Camera", "Linter").
  - `/concepts/`: Shared vocabulary (`DefinedTerm` definitions for "LAVA", "CLIFF").
- Data Format: JSON-LD is the canonical format for all entities to enable semantic validation.
- Source of Truth: Markdown files (for human authoring).
- Compiled Reality: JSON-LD files (generated by the Interpreter).
- Role: Converts human-friendly "Cognitive Markdown" into machine-readable JSON-LD.
- Validation Functions:
- Syntactic: "Is this valid Markdown?"
- Semantic: "Does this Action map to a known Tool?" (Ambiguity Check).
- Gaps: "Is there a decision node without a YES/NO branch?"
- Output: A valid RDF Graph representing the skill.
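The three validation functions can be approximated over a parsed workflow tree; the node schema (`kind`, `tool`, `yes`/`no` branches) and tool names here are assumptions for illustration, and the RDF emission step is omitted:

```python
# Interpreter sketch: syntactic, semantic, and gap checks over a
# Cognitive Workflow parsed into a node tree.
KNOWN_TOOLS = {"scan_surroundings", "move_forward", "get_position"}
KEYWORDS = {"ACTION", "DECISION", "CONTEXT", "CRITICAL"}

def validate(node: dict) -> list[str]:
    """Recursively validate a workflow node tree; return error messages."""
    errors = []
    kind = node.get("kind")
    if kind not in KEYWORDS:  # syntactic: only known keywords allowed
        errors.append(f"unknown keyword: {kind}")
    if kind == "ACTION" and "tool" in node and node["tool"] not in KNOWN_TOOLS:
        errors.append(f"unmapped tool: {node['tool']}")  # semantic (ambiguity) check
    if kind == "DECISION" and not ("yes" in node and "no" in node):
        errors.append("decision node missing YES/NO branch")  # gap check
    for branch in ("yes", "no"):
        for child in node.get(branch, []):
            errors.extend(validate(child))
    return errors
```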
- Discovery Portal (Public):
- Goal: "Schema.org for Agents". Browsing, searching, and learning.
- Access: Read-only. No login required.
- Features: Graph visualization, semantic search.
- Authoring Studio (Private/Auth'd):
- Goal: "IDE for Competencies". Creating and committing new skills.
- Access: Requires GitHub Login (OAuth).
- Workflow:
- User logs in -> Clones Repo in Browser -> Edits MD -> Interpreter Validates -> User Commits/PRs.
A Skill is a package of Instructions designed to be injected into an agent's context.
- 1. Instructional Context (The "Lesson"):
  - `id`: unique_key
  - `name`: string
  - `objective`: "You will learn to..."
  - `required_context`: list[doc_references] (e.g., "Read file X before starting").
- 2. Operational Procedure (The "Textbook"):
  - Format: `INSTRUCTIONS.md`. A System Prompt formatted as a procedure.
  - Examples: "Good" vs "Bad" execution traces (Few-Shot).
- 3. Cognitive Workflow (The "Mental Model" in Markdown):
- Goal: Define "Thinking Patterns" without using code or graphs.
- Format: Nested Markdown Lists with Keywords.
  - Keywords:
    - `> ACTION:` Do something.
    - `? DECISION:` Ask a question.
    - `@ CONTEXT:` Check environment.
    - `! CRITICAL:` Safety check.
  - Example:
    - @ CONTEXT: Check if 'LAVA' is nearby.
      - ? DECISION: Is distance < 1m?
        - YES:
          - ! CRITICAL: Stop immediately.
          - > ACTION: Plan detour.
        - NO:
          - > ACTION: Move forward.
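Because the keywords follow a fixed grammar, the workflow can be parsed mechanically. A sketch, assuming two-space indentation and the sigils defined in this section (plus the `^ LOOP` sigil used in the examples later):

```python
import re

# Parse "Cognitive Markdown" keyword lines into (keyword, text, depth).
SIGILS = {">": "ACTION", "?": "DECISION", "@": "CONTEXT", "!": "CRITICAL", "^": "LOOP"}
LINE = re.compile(r"^(\s*)- ([>?@!^]) (\w+): (.*)$")

def parse(markdown: str) -> list[tuple[str, str, int]]:
    nodes = []
    for raw in markdown.splitlines():
        m = LINE.match(raw)
        if m:
            indent, sigil, keyword, text = m.groups()
            assert SIGILS[sigil] == keyword  # sigil and keyword must agree
            nodes.append((keyword, text, len(indent) // 2))
    return nodes

sample = "- @ CONTEXT: Check if 'LAVA' is nearby.\n  - ? DECISION: Is distance < 1m?"
nodes = parse(sample)
```

Branch labels (`YES:` / `NO:`) are plain list items and are skipped here; a fuller parser would attach them as edges.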
- 4. Interpretation Rules (The "Glossary"):
- Concept: Explicitly mapping Tool Output -> Agent Understanding.
- Format: Markdown table or YAML dictionary.
- Structure:
  - `Pattern`: Regex or value match (e.g., `status: "OK"`).
  - `Meaning`: Semantic concept (e.g., "Operation Successful").
  - `NextAction`: Recommended edge traversal (e.g., "Proceed to Analysis").
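A minimal rule engine over that structure (the rules echo examples elsewhere in this document; the `Escalate` default is an assumption):

```python
import re

# Interpretation rules: Pattern -> Meaning -> NextAction.
RULES = [
    {"pattern": r'status:\s*"OK"', "meaning": "Operation Successful",
     "next_action": "Proceed to Analysis"},
    {"pattern": r"slippage:\s*true", "meaning": "Movement Error",
     "next_action": "Re-calibrate position"},
]

def interpret_output(tool_output: str) -> dict:
    """Map raw tool output to a semantic meaning and a recommended next step."""
    for rule in RULES:
        if re.search(rule["pattern"], tool_output):
            return {"meaning": rule["meaning"], "next_action": rule["next_action"]}
    return {"meaning": "Unknown", "next_action": "Escalate"}
```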
- 5. Quality Assurance (The "Exam"):
  - Golden Dataset: List of `{input, expected_action}` pairs.
  - Guardrails: Runtime rules (e.g., "Never delete > 5 files").
  - Semantic Validation (RDF/OWL):
    - Syntactic: Use SHACL (Shapes Constraint Language) to validate the JSON-LD structure against the ontology.
    - Semantic: Use an OWL Reasoner (e.g., HermiT, Pellet) to check for logical inconsistencies (e.g., a "Skill" cannot be a subclass of "Tool").
- 6. Ontology Alignment (The "Label"):
  - `alignment`: List of external IDs (e.g., ESCO URI, WikiData ID).
  - `type`: Schema.org `DefinedTerm`.
- 7. Tools (The "Hands"):
- Definition: Atomic, deterministic capabilities (Functions/APIs).
- Schema:
  - `name`: Function name.
  - `description`: What it does.
  - `parameters`: JSON Schema of inputs.
  - `source`: Path to implementation.
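A sketch of a tool registry using those four fields (the example tool, its parameters, and the source path are illustrative):

```python
# Tool registry: name -> schema, so the meta-level layer can check that a
# Skill's actions are grounded in known capabilities.
TOOL_REGISTRY = {}

def register_tool(name: str, description: str, parameters: dict, source: str) -> None:
    TOOL_REGISTRY[name] = {
        "name": name,
        "description": description,
        "parameters": parameters,  # JSON Schema of inputs
        "source": source,          # path to implementation
    }

register_tool(
    name="move_forward",
    description="Advance the robot one cell in its current heading.",
    parameters={"type": "object", "properties": {"steps": {"type": "integer"}}},
    source="tools/move_forward.py",
)
```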
### A. Skill: `navigate_grid_2d`
#### 1. Instructional Context
```yaml
id: navigate_grid_2d
name: "2D Grid Navigation"
objective: "Learn to interpret 2D coordinate objects and plan a safe path avoiding obstacles."
context_docs:
  - "docs/coordinates_system_v1.md"
tools:
  - "tools/scan_surroundings"
  - "tools/move_forward"
  - "tools/get_position"
```
#### 2. Ontology Alignment
```json
{
  "type": "DefinedTerm",
  "termCode": "S1234",
  "inDefinedTermSet": "https://esco.ec.europa.eu/en/classification/skills",
  "name": "Navigate Unstructured Environments",
  "url": "https://schema.org/Skill"
}
```
#### 3. Operational Procedure
```markdown
# Precision Navigation Protocol
1. **Scan**: Always check your current position and immediate surroundings using `scan_surroundings()`.
2. **Plan**: Before moving, calculate the entire path to the target.
   - PREFER paths with high "safety_score".
   - AVOID paths that pass within 1 unit of "LAVA".
3. **Move**: Execute movement one step at a time.
4. **Verify**: After moving, check `get_position()` to confirm you are where you think you are.
```
#### 4. Cognitive Workflow
```markdown
# Mental Model: Safe Navigation
- @ CONTEXT: Scan surroundings using `scan_surroundings()`.
- ? DECISION: Is an obstacle detected?
  - YES:
    - > ACTION: Calculate alternative path.
    - > ACTION: Log "Obstacle Avoidance" event.
  - NO:
    - > ACTION: Move forward using `move_forward()`.
- ! CRITICAL: Verify position with `get_position()`.
- ? DECISION: Are we at the goal?
  - YES:
    - > ACTION: Signal "Success".
  - NO:
    - ^ LOOP: Restart from "Scan".
```
#### 5. Interpretation Rules
```yaml
rules:
  - tool: scan_surroundings
    output_pattern: "type: 'void'"
    meaning: "Cliff Edge - Fatal Hazard"
    instruction: "Mark coordinate as INVALID_MOVE."
  - tool: get_position
    output_pattern: "slippage: true"
    meaning: "Movement Error"
    instruction: "Re-calibrate position before next move."
```
#### 6. Quality Assurance
```yaml
golden_dataset:
  - input: "Goal is at (5,5). LAVA is at (4,5)."
    expected_workflow: ["Scan", "Plan Detour", "Move"]
    assert_not_touched: ["(4,5)"]
guardrails:
  - rule: "max_consecutive_moves"
    limit: 5
    action: "Force Re-Scan"
```
#### 7. JSON-LD Compilation (Generated by Interpreter)
```json
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "2D Grid Navigation",
"step": [
{
"@type": "HowToStep",
"name": "Scan",
"itemListElement": {
"@type": "HowToDirection",
"text": "Scan surroundings using scan_surroundings()",
"instrument": { "@id": "http://library.io/tools/scan_surroundings" }
}
}
]
}
```
### B. Skill: `classify_object` (Visual Analysis)
#### 1. Instructional Context
* **Objective**: "Identify object type and fragility from visual data."
* **Tools**: `camera.capture`, `camera.flash_on`.
#### 2. Ontology Alignment
* **Schema.org**: `DefinedTerm: Visual Inspection`.
#### 3. Operational Procedure
"Always ensure lighting is > 500 lumens. If dark, use flash. Analyze shape first, then texture."
#### 4. Cognitive Workflow
```markdown
# Mental Model: Object Classification
- > ACTION: Capture high-res image `camera.capture()`.
- ? DECISION: Is lighting sufficient?
- NO:
- > ACTION: Enable flash `camera.flash_on()`.
- ^ LOOP: Retry Capture.
- YES:
- > ACTION: Analyze features (Shape, Color, Texture).
- ? DECISION: Match confidence > 90%?
- YES:
- > ACTION: Return Label.
- NO:
- > ACTION: Request human verification.
```

#### 5. Interpretation Rules
- `confidence < 0.5`: "Ambiguous Object" -> Request Zoom.
- `color: "red" AND shape: "octagon"`: "Stop Sign".
#### 6. Quality Assurance
- Golden Dataset: Image of "Cracked Vase" -> Output: "Fragile".
- Guardrails: "Never output PII labels (names/faces)."
- Role: "Sort implementation items in a grid warehouse."
- Skills: `navigate_grid_2d`, `classify_object`.
- Behavior Tree (Orchestration):
  - Search: Use `navigate_grid_2d` in "Explore Mode".
  - Found Item: Trigger `classify_object`.
  - Act:
    - If `Fragile` -> `HandleWithCare`.
    - If `Heavy` -> `RequestForklift`.
  - Repeat.
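The orchestration above can be sketched as a behavior-tree Sequence node; the skill bodies are stubs (an actual run would invoke the real tools), and the state keys are illustrative:

```python
# Behavior-tree sketch of the WarehouseBot competency:
# Sequence runs children in order and stops at the first failure.
def sequence(*children):
    def run(state: dict) -> bool:
        return all(child(state) for child in children)
    return run

def search(state: dict) -> bool:
    state["found"] = True  # stub: navigate_grid_2d in "Explore Mode"
    return True

def classify(state: dict) -> bool:
    state["label"] = "Fragile"  # stub: classify_object result
    return state["found"]

def act(state: dict) -> bool:
    state["handler"] = "HandleWithCare" if state["label"] == "Fragile" else "RequestForklift"
    return True

warehouse_bot = sequence(search, classify, act)
state = {}
warehouse_bot(state)
```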
- App: Vite+React App that treats GitHub as a Headless CMS (using Octokit).
- Validation Pipeline:
- Zod: Runtime TypeScript validation.
- SHACL: RDF Structure validation.
- Evaluations: LLM-based logical checks.