Merged
Deploying everyskill with
| Latest commit: | b800d4e |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://cc5a0e90.everyskill.pages.dev |
| Branch Preview URL: | https://skill-skill-generalizer-1774-yost.everyskill.pages.dev |
Frontier Model Security Review
Final Verdict: WARN

Claude Opus 4.6 Analysis
{"verdict": "PASS", "issues": [], "reasoning": "This skill is a meta-skill that instructs an LLM how to transform team-specific skills into generalized, onboardable ones. The review found no security issues: (1) No prompt injection — there are no hidden instructions, 'ignore previous' patterns, or encoded commands. The skill's instructions are straightforward and transparent about their purpose. (2) No dangerous tool usage — the skill does not invoke Bash, execute shell commands, or access sensitive system files. It only reads and writes SKILL.md files and a memory/.onboarded-skills.md state file, which are within the expected skill workspace. (3) No data exfiltration — there are no network calls to external URLs, no reading of credentials or secrets, and no instructions to send data anywhere outside the system. (4) No social engineering — the skill does not attempt to manipulate users or other skills into bypassing security controls. It explicitly instructs not to ask users for technical details like API keys or JSON schemas during onboarding. (5) No overly permissive tool usage — no tools are declared or invoked. The skill operates purely as LLM instructions for content transformation. The file write operations (updating SKILL.md and memory/.onboarded-skills.md) are standard for the Plus One skill framework and scoped to the skill's own directory. The onboarding behavior rules about not showing internal dialogue are good security hygiene, not obfuscation."}

GPT-5.2 Analysis
{"verdict":"WARN","issues":[{"category":"Prompt injection","severity":"medium","evidence":"Skill contains strong meta-instructions that can override typical assistant safety/UX constraints (e.g., \"Your job is...\", \"Work through these steps\", \"Never show internal dialogue\"). While not classic 'ignore previous' injection, it is a coercive instruction set that could conflict with platform policies or auditing expectations."},{"category":"Dangerous tool usage / state mutation","severity":"medium","evidence":"Explicitly instructs writing to and creating files: \"Add `{skill-name}` to `memory/.onboarded-skills.md` (create the file if it doesn't exist)\" and \"Fill in the user's answers directly in the relevant sections below\" and \"update this SKILL.md file directly\". This is persistent state modification and could be abused if the agent has filesystem write tools."},{"category":"Overly permissive operational behavior","severity":"medium","evidence":"Instructs to \"Create the cron job\" and \"update or recreate the cron\" based on user input without specifying validation, least privilege, sandboxing, or constraints. Scheduling plus file writes can be leveraged for persistence/abuse if paired with code execution tools."},{"category":"Data exfiltration risk (indirect)","severity":"low","evidence":"The onboarding/test steps encourage fetching content and running skills; depending on the generalized skill being processed, this could lead to network calls. This skill itself does not specify external endpoints, but it encourages generating skills that may fetch from arbitrary user-provided sources (e.g., RSS URLs), which can become SSRF-like if not constrained at the platform level."}],"reasoning":"No explicit malicious payloads, encoded commands, credential theft, or direct instructions to access sensitive files are present. However, the skill promotes persistent file edits, onboarding state tracking via local memory files, and cron creation/update without guardrails. These capabilities are not inherently malicious but increase attack surface and could be abused if the agent has write/scheduler tools. Recommend adding constraints/validation (allowed paths, denylist sensitive files, no arbitrary command execution, cron schedule validation, network egress restrictions/SSRF protections) and clarifying that any filesystem/scheduler changes must be performed through controlled, audited platform APIs rather than raw shell/file tools."}

Frontier model review complete. Human approval still required.
32deba9 to b800d4e
Frontier Model Security Review
Final Verdict: WARN

Claude Opus 4.6 Analysis
{"verdict": "PASS", "issues": [], "reasoning": "This skill is a meta-skill that instructs an LLM how to transform team-specific skills into generalized, onboardable ones. The review finds no security issues: (1) No prompt injection - there are no hidden instructions, 'ignore previous' directives, or encoded commands. The skill straightforwardly describes a generalization process. (2) No dangerous tool usage - the skill does not invoke Bash, execute shell commands, or access sensitive system files. It only reads/writes to its own skill markdown files and a memory state file (memory/.onboarded-skills.md), which are within its expected operational scope. (3) No data exfiltration - there are no network calls to external URLs, no reading of credentials, API keys, or tokens. The skill explicitly instructs to REMOVE hardcoded API keys and credentials from source skills during generalization. (4) No social engineering - the skill is transparent about what it does and does not attempt to manipulate users or other systems. (5) No overly permissive tools - the skill does not declare or request any tool permissions. It operates purely as prompt/instruction content for an LLM. The skill's purpose is benign: it's a template/process guide for converting specific skills into general-purpose ones, with sensible onboarding flows. File writes are limited to the skill's own SKILL.md and a state tracking file in the memory directory, which is standard for the Plus One bot framework."}

GPT-5.2 Analysis
{"verdict":"WARN","issues":[{"category":"Prompt injection","severity":"low","evidence":"The skill contains imperative meta-instructions about how the assistant should behave and what to output (e.g., enforcing onboarding behavior, file-writing steps). Not covert, but could conflict with platform policies if interpreted as higher-priority than system/developer instructions."},{"category":"Dangerous tool usage / persistence","severity":"medium","evidence":"Instructs the agent to 'update this SKILL.md file directly', 'fill in the user's answers directly', 'add {skill-name} to memory/.onboarded-skills.md (create the file if it doesn't exist)', and 'Create the cron job' / 'update or recreate the cron'. These are operational changes and persistence mechanisms that could be risky if tools are available without safeguards."},{"category":"Sensitive file access guidance","severity":"low","evidence":"References internal state file paths (memory/.onboarded-skills.md, memory/*.json). While it also says not to show these to the user, it still directs reading/writing internal files, which should be tightly permissioned."},{"category":"Overly permissive allowed-tools (implicit)","severity":"medium","evidence":"The spec assumes capabilities like filesystem writes and cron management but does not constrain them (no mention of sandboxing, path allowlists, cron restrictions, or validation). If paired with broad shell/tool access, this becomes a meaningful risk surface."}],"reasoning":"No clear hidden prompt-injection payloads (no encoded/obfuscated commands, no 'ignore previous' style attacks), no explicit exfiltration instructions, and no direct harmful shell commands. However, the skill explicitly directs persistent modifications (writing files, creating/updating cron jobs) and assumes privileged operational tooling without guardrails. This is not inherently malicious, but it increases risk if executed in an environment with powerful tools and insufficient policy enforcement. Recommended mitigations: restrict write locations to the skill’s own directory and a dedicated state store, require user confirmation before persistence/cron changes, enforce cron allowlists/rate limits, and ensure tool permissions prevent reading secrets or arbitrary files."}

Frontier model review complete. Human approval still required.
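One of the mitigations recommended in the reviews above, restricting writes to the skill's own directory and a dedicated state store, could be approximated with a path allowlist check. The following is only a minimal sketch under stated assumptions: the function name `is_write_allowed` and the directory names `skills/skill-generalizer` and `memory` are hypothetical, chosen for illustration, and are not part of the Plus One framework or of this PR.

```python
from pathlib import Path


def is_write_allowed(target: str, skill_dir: str, state_dir: str) -> bool:
    """Return True only if `target` resolves inside `skill_dir` or `state_dir`.

    Hypothetical helper: resolving the path first collapses any `..`
    segments, so traversal attempts are checked against their real target.
    """
    resolved = Path(target).resolve()
    allowed_roots = (Path(skill_dir).resolve(), Path(state_dir).resolve())
    for root in allowed_roots:
        # Accept the root itself or anything nested beneath it.
        if resolved == root or root in resolved.parents:
            return True
    return False


# Hypothetical layout, assumed only for this example.
SKILL_DIR = "skills/skill-generalizer"
STATE_DIR = "memory"

print(is_write_allowed("skills/skill-generalizer/SKILL.md", SKILL_DIR, STATE_DIR))
print(is_write_allowed("memory/.onboarded-skills.md", SKILL_DIR, STATE_DIR))
print(is_write_allowed("skills/skill-generalizer/../../etc/passwd", SKILL_DIR, STATE_DIR))
```

A check like this would only be meaningful if it were enforced by the platform's file-writing tool rather than by the skill's own instructions, which is the point the second review makes about controlled, audited APIs.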
New Skill Submission
Skill: skill-generalizer
Submitted by: Brandon Gell
Reason: Transforms team-built, use-case-specific skills into generalized, onboardable skills that any Plus One bot can adopt. Updated to use separated onboarding state (memory/.onboarded-skills.md) and preserve message formatting during generalization.
This PR was auto-generated from skills.every.to (agent-api).
AI security review will run automatically.