
feat(autoskills): markdown scanner + LLM-driven mode (--from-spec, --scan-docs, subcommands)#104

Open
krlosflipdev wants to merge 18 commits into midudev:main from krlosflipdev:feat/markdown-scanner-llm-mode

@krlosflipdev

What Changed

Adds an opt-in markdown scanner and an LLM-driven workflow to autoskills, alongside machine-readable JSON output and new subcommands for agent integration.

New CLI flags

  • --from-spec <path>: parse a spec file (.md/.mdx/.markdown/.txt) and install detected skills.
  • --scan-docs: auto-scan CLAUDE.md / AGENTS.md / README.md at project root.
  • --json: structured output for every mode (list / dry-run / install) with typed error envelopes.
  • --show-specgen-prompt / --copy-specgen-prompt: print the spec-generator prompt to stdout / copy it to the clipboard, for use with an LLM.

New subcommands

  • autoskills list [id] [--filter <alias>] [--json]: list catalog or inspect one skill (alias-aware).
  • autoskills install --only <ids> [-y] [--json]: install comma-separated ids in parallel via installSkill; unknown ids get fuzzy suggestions (Levenshtein distance ≤ 2).
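A minimal sketch of the fuzzy-suggestion step, assuming a plain Levenshtein distance over lowercased ids and a flat list of catalog ids. The names levenshtein, suggestIds, and MAX_SUGGESTION_DISTANCE are hypothetical, not the PR's code. Ties at the same distance keep catalog order, matching the "catalog-order tiebreak" noted in the commit log.

```typescript
// Hypothetical sketch: names and the lowercase comparison are assumptions.
const MAX_SUGGESTION_DISTANCE = 2;

// Two-row dynamic-programming Levenshtein distance (insert, delete, substitute all cost 1).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Catalog ids within the distance limit, closest first; equal distances keep catalog order.
function suggestIds(unknownId: string, catalogIds: readonly string[]): string[] {
  const needle = unknownId.toLowerCase();
  return catalogIds
    .map((id, index) => ({ id, index, distance: levenshtein(needle, id.toLowerCase()) }))
    .filter((c) => c.distance <= MAX_SUGGESTION_DISTANCE)
    .sort((x, y) => x.distance - y.distance || x.index - y.index)
    .map((c) => c.id);
}
```

For example, suggestIds("nextj", ["react", "nextjs", "nuxt"]) returns ["nextjs", "nuxt"] (distances 1 and 2); "react" is too far away to be suggested.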

Markdown scanner (markdown-scanner.ts, 332 lines)

  • Fenced code blocks: json (deps/devDeps), bash/sh/shell/zsh (npm/pnpm/yarn/bun add/install), yaml/toml (configFileContent patterns), ruby/gemfile (gem '<name>').
  • Stack headings (h1–h3): bullets, numbered lists (1. or 1) markers), comma-separated inline lists, and GFM tables with a heuristic tech-column picker (column whose header matches tech/framework/library/package/dependency/name/stack; falls back to column 1).
  • Heading normalization: strips numbering, emoji, bold/italic, brackets, colon, trailing paren; matches keyword list case-insensitively.
  • Table cell normalization: links, images, bold/italic, backticks, emoji; multi-tech split on comma.
  • Alias matching via Technology.aliases?: string[]; seeded on nextjs, vue, svelte, tailwind, typescript, node.
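A rough sketch of the shell-fence rule, assuming the scanner works line by line: find an npm/pnpm/yarn/bun add/install command behind a \b anchor, keep only the arguments before the first shell operator (the "shell-operator chain stop" from the commit notes), drop flags, and strip version suffixes. The regexes and extractInstallPackages are hypothetical illustrations, not the scanner's actual code; matching the names against the skills map is left out.

```typescript
// Hypothetical sketch; INSTALL_CMD, SHELL_OPERATORS and extractInstallPackages are illustrative.

// \b rejects "mynpm", and since there is no word boundary inside "pnpm", "npm" cannot match there.
const INSTALL_CMD = /\b(?:npm|pnpm|yarn|bun)\s+(?:add|install)\b(.*)$/;

// Operators that start a chained command or a redirection.
const SHELL_OPERATORS = /&&|\|\||[;|&<>]/;

// "react@18" -> "react", "@types/node@20" -> "@types/node", "@scope/pkg" unchanged.
function stripVersion(spec: string): string {
  const at = spec.lastIndexOf("@");
  return at > 0 ? spec.slice(0, at) : spec;
}

function extractInstallPackages(line: string): string[] {
  const match = INSTALL_CMD.exec(line);
  if (!match) return [];
  // Keep only the arguments before the first shell operator.
  const args = match[1].split(SHELL_OPERATORS)[0].trim();
  if (args === "") return [];
  return args
    .split(/\s+/)
    .filter((token) => !token.startsWith("-")) // drop flags such as -D or --save-dev
    .map(stripVersion);
}
```

With this sketch, extractInstallPackages("pnpm add -D typescript @types/node@20 && pnpm test") yields ["typescript", "@types/node"]. Backslash line continuations are not handled.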

Clipboard (clipboard.ts, 84 lines)

  • Zero-dependency cross-platform copy via spawn: pbcopy (macOS), wl-copy with xclip as fallback (Linux), clip.exe (Windows/WSL).
  • Distinguishes ENOENT / exit-code / error states.
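A minimal sketch of the spawn-based approach, assuming Node's child_process with the text piped to stdin. The candidate order (pbcopy; wl-copy then xclip; clip.exe) follows the description above, while the caller-supplied WSL flag, the helper names, and the CopyResult shape are assumptions. Only a missing binary (ENOENT) falls through to the next candidate; a non-zero exit or any other error is reported as-is.

```typescript
import { spawn } from "node:child_process";

interface ClipboardCommand { cmd: string; args: string[] }

type CopyResult =
  | { ok: true; cmd: string }
  | { ok: false; reason: "not-found" | "exit-code" | "error"; detail: string };

// Candidate commands per platform; WSL detection is left to the caller (assumption).
function clipboardCandidates(platform: string, isWsl: boolean): ClipboardCommand[] {
  if (platform === "darwin") return [{ cmd: "pbcopy", args: [] }];
  if (platform === "win32" || isWsl) return [{ cmd: "clip.exe", args: [] }];
  if (platform === "linux") {
    return [
      { cmd: "wl-copy", args: [] }, // Wayland
      { cmd: "xclip", args: ["-selection", "clipboard"] }, // X11 fallback
    ];
  }
  return [];
}

// Runs one command with an argv array (no shell) and writes the text to its stdin.
function runCopy({ cmd, args }: ClipboardCommand, text: string): Promise<CopyResult> {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { stdio: ["pipe", "ignore", "ignore"] });
    // A failed spawn can also surface as a write error on stdin; the child's own
    // "error" event already reports the failure, so this one is ignored.
    child.stdin?.on("error", () => {});
    child.on("error", (err) => {
      const code = (err as NodeJS.ErrnoException).code;
      resolve({ ok: false, reason: code === "ENOENT" ? "not-found" : "error", detail: err.message });
    });
    child.on("close", (exitCode) => {
      // resolve() is a no-op if the "error" handler already settled the promise.
      if (exitCode === 0) resolve({ ok: true, cmd });
      else resolve({ ok: false, reason: "exit-code", detail: `${cmd} exited with code ${exitCode}` });
    });
    child.stdin?.end(text);
  });
}

// Tries candidates in order; only a missing binary moves on to the next one.
async function copyToClipboard(text: string, platform: string = process.platform, isWsl = false): Promise<CopyResult> {
  let last: CopyResult = { ok: false, reason: "not-found", detail: `no clipboard command for ${platform}` };
  for (const candidate of clipboardCandidates(platform, isWsl)) {
    last = await runCopy(candidate, text);
    if (last.ok || last.reason !== "not-found") return last;
  }
  return last;
}
```

Because args go in as an array and no shell option is set, the copied text never becomes part of a command string. A caller could then print the content plus a warning when the result is not ok, as the commit notes describe for the prompt-copy fallback.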

Spec-generator prompt

  • prompts/spec-generator-prompt.md: instructs the LLM to generate docs/specs-initial.md from the user's requirements, then to tell the user to run --from-spec themselves (the LLM does not install anything).

JSON envelopes (cli-json.ts, 104 lines)

  • serializeList / serializeDryRun / serializeInstall / serializeError.
  • Typed error codes: cli-arg-invalid, spec-file-missing, install-missing-only, install-empty-only, install-unknown-id, prompt-file-missing, unknown-subcommand, internal-error.
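One way such typed envelopes could look, as a sketch: the eight codes are the ones listed above, but the envelope fields (ok, error.code, error.message, error.details) and the helpers buildErrorEnvelope / isErrorCode are hypothetical and not the actual output of serializeError.

```typescript
// The eight codes come from the PR description; everything else is a hypothetical shape.
const ERROR_CODES = [
  "cli-arg-invalid",
  "spec-file-missing",
  "install-missing-only",
  "install-empty-only",
  "install-unknown-id",
  "prompt-file-missing",
  "unknown-subcommand",
  "internal-error",
] as const;

type ErrorCode = (typeof ERROR_CODES)[number];

interface ErrorEnvelope {
  ok: false;
  error: { code: ErrorCode; message: string; details?: Record<string, unknown> };
}

// Serializes an error so agents can branch on error.code instead of parsing prose.
function buildErrorEnvelope(code: ErrorCode, message: string, details?: Record<string, unknown>): string {
  const envelope: ErrorEnvelope = { ok: false, error: details ? { code, message, details } : { code, message } };
  return JSON.stringify(envelope);
}

// Runtime guard for consumers reading the JSON back.
function isErrorCode(value: unknown): value is ErrorCode {
  return typeof value === "string" && (ERROR_CODES as readonly string[]).includes(value);
}
```

Using a const tuple keeps the TypeScript union and the runtime list in one place, so adding a code updates both.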

Breaking change

  • autoskills prompt subcommand and --copy-prompt flag removed. Use --show-specgen-prompt / --copy-specgen-prompt instead.

Why This Change

Previously, autoskills only detected tech via package.json, lockfiles, and framework config files. That misses projects whose stack is declared in docs (CLAUDE.md, AGENTS.md, README.md, spec files) before any code exists, which is the common greenfield / LLM-assisted workflow.

Goals:

  1. Detect skills from prose and code fences in markdown, not just dependency files — covers pre-code planning phase.
  2. Give LLM agents machine-readable output (--json) + a structured prompt (--copy-specgen-prompt) so Claude/Codex/other agents can drive skill selection end-to-end.
  3. Keep core behavior unchanged: scanner is opt-in (--from-spec / --scan-docs). Default autoskills still ignores markdown.

Testing Done

  • Manual testing completed
  • Automated tests pass locally (495/495)
  • Edge cases considered and tested

Test additions:

  • markdown-scanner.test.ts (341 lines): JSON / shell / yaml / toml / ruby fences, stack headings (bullets / numbered / inline / tables), heading decorations, alias matching, unicode-safe truncation, edge cases (empty, unterminated fence, 100 KB, cross-fence dedup).
  • cli-from-spec.test.ts, cli-json.test.ts, dry-run-json.test.ts, errors.test.ts, install-only.test.ts, load-md-sources.test.ts, merge-md.test.ts, subcommands.test.ts, subcommands-list-prompt.test.ts, clipboard.test.ts, copy-prompt.test.ts, helpers.test.ts, logging.test.ts.
  • Fixtures: tests/fixtures/specs/ (well-formed, only-fences, only-heading, empty, malformed) + tests/fixtures/projects/ (with-claude-md, with-agents-md, with-both).

Verification commands:

cd packages/autoskills
pnpm build    # tsc clean
pnpm test     # 495/495 pass

Backwards compat: core test suite unchanged (332 pre-existing tests still pass); REGRESSION guard ensures default run ignores CLAUDE.md.

Type of Change

  • fix: Bug fix
  • feat: New feature
  • refactor: Code refactoring
  • docs: Documentation
  • test: Tests
  • chore: Maintenance/tooling

Contains one BREAKING commit (7452cc7): the prompt subcommand and the --copy-prompt flag are removed, replaced by --show-specgen-prompt / --copy-specgen-prompt.

Security & Quality Checklist

  • No secrets or API keys committed
  • Follows the project's coding standards
  • No sensitive data exposed in logs or output

Notes:

  • clipboard.ts uses spawn with argv arrays (no shell string interpolation) — no command-injection surface.
  • Scanner reads user-supplied markdown as data only; no eval / dynamic import / shell exec of parsed content.
  • --from-spec throws spec-file-missing on non-existent paths; no silent read from unexpected locations.

Documentation

  • Updated relevant documentation
  • Added comments for complex logic (minimal — code is self-documenting per project convention)
  • README updated

Doc changes:

  • README.md (root): Options block extended with new flags; added LLM-driven mode section linking package README.
  • packages/autoskills/README.md: full Options table; new sections Markdown scanner (opt-in) + Subcommands (for LLM integration); supported heading formats (4 content formats: dash/numbered bullets, GFM tables, comma-inline); normalization rules; --from-spec extension flexibility; complete error codes list.
  • AGENTS.md: fixed stale .mjs references to .ts; expanded shared helpers list.
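A sketch of the stack-heading check described in the scanner notes (h1-h3 only; numbering, emoji, bold/italic markers, brackets, colon, and trailing parenthetical stripped; case-insensitive keyword match). The keyword set is a made-up stand-in because the real STACK_KEYWORDS list is not shown here, and the regexes are illustrative rather than the PR's normalizeHeadingTitle.

```typescript
// Hypothetical stand-in: the real STACK_KEYWORDS list is not reproduced here.
const STACK_KEYWORDS = new Set(["stack", "tech stack", "technologies", "technology", "dependencies"]);

// Built via the RegExp constructor so older TS targets accept the \p{...} escape.
const EMOJI = new RegExp("\\p{Extended_Pictographic}\\uFE0F?", "gu");

function normalizeHeadingTitle(title: string): string {
  return title
    .replace(EMOJI, "")              // emoji such as "🛠️"
    .replace(/[*_]+/g, "")           // bold / italic markers
    .replace(/[\[\]]/g, "")          // square brackets
    .trim()
    .replace(/^\d+[.)]\s*/, "")      // leading "1." or "1)" numbering
    .replace(/:\s*$/, "")            // trailing colon
    .replace(/\s*\([^)]*\)\s*$/, "") // trailing parenthetical, e.g. "(2024)"
    .trim()
    .toLowerCase();
}

function isStackHeading(line: string): boolean {
  // Only h1-h3: "#### Stack" fails because the fourth "#" is not whitespace.
  const m = /^#{1,3}\s+(.+)$/.exec(line.trim());
  return m !== null && STACK_KEYWORDS.has(normalizeHeadingTitle(m[1]));
}
```

Normalizing first and comparing against a fixed set afterwards keeps the keyword list free of decorated variants; for example, "## 1. **Tech Stack** (2024):" normalizes to "tech stack".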

Commits (16 feature + 1 merge)

b1d4261 feat(autoskills): foundation for opt-in markdown scanner + LLM mode
146948c feat(scanner): add markdown-scanner with JSON fence support + test helpers
ab1a711 feat(scanner): add shell/yaml/toml/ruby fences + stack headings with alias matching
b0d0fa0 feat(lib): add markdown source loader + merger, scanner edge-case tests
8d0c234 feat(cli): wire --from-spec/--scan-docs + add cli-json serializers
1276881 feat(cli): add list+prompt subcommands, ship skill-selection.md
cff1b15 feat(subcommands): add install --only with fuzzy suggest + DI installer
afb8faa feat(cli): subcommand dispatch + structured JSON error envelopes
03f9e76 docs: document markdown scanner + LLM subcommands + refresh helpers/refs
e9be97e feat: Updated format examples for doc parsing
56994c9 feat(scanner): flexible headings + numbered/comma/tables under stack
6b6c34a feat(scan-docs): include README.md in auto-scan whitelist
375460a refactor(scanner): drop Tecnologías keyword + document flexible formats
986fd99 feat(cli): add --copy-prompt + rewrite prompt for spec-doc workflow
7452cc7 feat(cli)!: rename copy/show prompt flags + drop `prompt` subcommand
8731171 refactor: Updated commands for LLM in package README.md
4ba0fa5 Merge branch 'main' into feat/markdown-scanner-llm-mode

Diff summary

40 files changed, 3106 insertions(+), 38 deletions(-)

New files: markdown-scanner.ts, cli-json.ts, clipboard.ts, subcommands.ts, prompts/spec-generator-prompt.md, 13 new test files, 10 fixture files.

  - widen Technology interface with optional aliases?: string[] and description?: string
  - add tests/fixtures/specs/: well-formed, only-fences, only-heading, empty, malformed, non-english
  - add tests/fixtures/projects/: with-claude-md, with-agents-md, with-both (package.json + CLAUDE.md/AGENTS.md)
  - no existing SKILLS_MAP entries changed; 332 existing tests still pass; tsc clean
…lpers

  - add scanMarkdown(content, skillsMap) -> MarkdownMatch[] with JSON fence branch (deps + devDeps)
  - extract fences, skip malformed JSON, skip no-language fences, dedupe by techId
  - add test helpers: writeMarkdown, readFixtureSpec, parseJsonOutput, buildMarkdownFromParts, mockInstaller
  - parseJsonOutput uses backward line-walk for nested JSON and top-level arrays
  - unicode-safe evidence truncation in scanner
  - 344 tests pass, tsc clean
…alias matching

  - shell fences (bash/sh/shell/zsh): extract from npm/pnpm/yarn/bun add/install with \b anchor + shell-operator chain stop
  - yaml/toml fences: match against detect.configFileContent patterns
  - ruby/gemfile fences: match `gem '<name>'` against detect.gems
  - stack headings (h1-h3, EN + Tecnologias): parse bullets, normalize, match via name + aliases
  - seed aliases on nextjs, vue, svelte, tailwind, typescript, node
  - 363 tests pass, tsc clean
  - add loadMarkdownSources({fromSpec,scanDocs,projectDir}) to lib.ts: reads --from-spec, auto-discovers CLAUDE.md/AGENTS.md, dedupes by absolute path, throws "spec file not found" on missing fromSpec
  - add mergeMarkdownDetections(coreIds,matches): union preserving core order then scanner order
  - scanner: 5 edge-case tests (precedence, empty, 100KB, unterminated fence, cross-fence dedup); no scanner logic changes
  - 382 tests pass, tsc clean
  - main.ts: parse --from-spec <path> and --scan-docs; gate loadMarkdownSources+scanMarkdown+mergeMarkdownDetections behind opt-in; shadow detected/combos/isFrontend defaulting to core.*; parseArgs validates --from-spec arg
  - add cli-json.ts: serializeList/DryRun/Install/Error; ListJson hides detect; aliases defensively copied
  - REGRESSION guard: default autoskills ignores CLAUDE.md
  - 400 tests pass, tsc clean
  - subcommands.ts: runList (json + human + alias filter) and runPrompt (reads shipped prompts/skill-selection.md)
  - resolvePromptPath tries dev + dist layouts (source: pkg/prompts, built: pkg/dist/../prompts)
  - promptPath DI for testability + ENOENT try/catch -> prompt-file-missing JSON envelope
  - ship prompts/skill-selection.md (LLM selection guide, v1)
  - package.json files: add "prompts"; version unchanged (owner releases)
  - 409 tests pass, tsc clean
  - runInstall: parse comma-separated ids, dedupe, validate against SKILLS_MAP
  - Levenshtein <=2 fuzzy suggestions (catalog-order tiebreak) for unknown ids
  - install-missing-only / install-empty-only / install-unknown-id JSON envelopes
  - parallel installSkill via Promise.all with mockInstaller DI hook for tests
  - human + json output modes; exit 1 on any failure or validation error
  - autoYes reserved for T17 interactive prompt
  - 420 tests pass, tsc clean
…low; --dry-run --json emits serializeDryRun; banner/printDetected gated on !args.json
  - parseArgs: positional subcommand + --only/--filter/--json/--path; consumed-index tracking; implicit --filter for `list <id>`
  - ArgError class -> structured cli-arg-invalid / internal-error envelopes on --json
  - --json without subcommand or --dry-run rejected with envelope
  - unknown-subcommand envelope
  - installer: AUTOSKILLS_MOCK_INSTALL @internal test hook
  - 448 tests pass, tsc clean, REGRESSION intact
  - packages/autoskills/README.md: Options table (+--from-spec, --scan-docs, --json), new sections Markdown scanner (opt-in) + Subcommands (for LLM integration); fix -a syntax example; complete error codes list
  - root README.md: Options block +new flags, new LLM-driven mode section linking package README
  - AGENTS.md: fix stale .mjs refs -> .ts; expand shared helpers list with writeMarkdown/readFixtureSpec/parseJsonOutput/buildMarkdownFromParts/mockInstaller
  - normalizeHeadingTitle: strip numbering, emoji, bold/italic, brackets, colon, trailing paren
  - extractStackBlocks: isStackHeading(line) via normalize-then-compare
  - accept numbered bullets (1./1)), comma-separated inline, GFM tables
  - table cell normalization: links, images, bold/italic, backticks, emoji; multi-tech split on comma
  - pickTechColumn heuristic (tech/framework/library/package/dependency/name/stack) + fallback col 1
  - MarkdownMatch API unchanged: all new detections reuse source: "stack-heading"
  - 32 new tests (14 helper + 7 heading decoration + 2 numbered + 3 inline + 6 tables); 480/480 pass
  - loadMarkdownSources: add README.md alongside CLAUDE.md / AGENTS.md (order preserved)
  - warning text updated: "no CLAUDE.md, AGENTS.md, or README.md found"
  - tests: new README.md-only and three-file cases in load-md-sources
  - cli-from-spec: update warning assertion to match new string
  - STACK_KEYWORDS: remove "tecnologías" / "tecnologias" — keep keyword set language-agnostic; accepted headings self-evident from listed English keywords
  - tests: update heading-keywords test title + array; delete orphan fixture tests/fixtures/specs/non-english.md (no test imported it)
  - packages/autoskills/README.md: rewrite "Bullet format" → "Supported formats under stack headings" with heading-shape examples, 4 content formats (dash/numbered bullets, GFM tables with header heuristic, comma-inline), normalization rules
  - packages/autoskills/README.md: document --from-spec extension flexibility (.md/.mdx/.markdown/.txt); note "Also:" prefixes not stripped; heuristic keyword list includes plurals; stack-headings bullet points to new subsection
  - root README.md: sync Options + expand blockquote (fences, keywords, content formats, decorated headings, tables-outside-heading exclusion)
  - clipboard.ts: cross-platform copy via spawn (pbcopy/wl-copy→xclip/clip.exe), zero deps; ENOENT/exit/error all distinguished
  - subcommands.ts: runCopyPrompt reuses resolvePromptPath; success → ✓ msg + Cmd/Ctrl+V hint; failure → prompt to stdout + warning to stderr (exit 0)
  - main.ts: --copy-prompt early-exit before subcommand dispatch (mirrors --help)
  - prompts/skill-selection.md → spec-generator-prompt.md: LLM now generates docs/specs-initial.md from user requirement (heading + dash bullets), instructs user to run --from-spec themselves (no install)
  - 13 new tests (8 clipboard + 5 copy-prompt); 497/497 pass
  BREAKING: `autoskills prompt` and `--copy-prompt` removed.
  Use `--show-specgen-prompt` or `--copy-specgen-prompt` instead.

  - new clipboard.ts: cross-platform spawn copy (pbcopy/wl-copy→xclip/clip.exe), zero deps; ENOENT/exit/error distinguished, fallback prints content + warning
  - new --show-specgen-prompt + --copy-specgen-prompt: top-level flags, early-exit before dispatch (mirrors --help); show wins if both passed
  - drop prompt subcommand + --path flag; KNOWN now {list, install}
  - READMEs: document spec-doc flow with two chat variants (bash-tool first, paste second); root example reframed as problem statement
  - tests: +13, -3; 495/495 pass
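The merge rule in the commit notes above (a union that keeps core detections in their original order, then appends scanner-only hits) can be sketched as an order-preserving dedupe. MarkdownMatchLike and mergeDetectionIds are hypothetical stand-ins for MarkdownMatch and mergeMarkdownDetections; only the technology id is used.

```typescript
// Hypothetical stand-in for MarkdownMatch; only techId is used for merging.
interface MarkdownMatchLike {
  techId: string;
  source: string;
}

// Core detections first (original order), then scanner-only ids in match order; duplicates dropped.
function mergeDetectionIds(coreIds: readonly string[], matches: readonly MarkdownMatchLike[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const id of [...coreIds, ...matches.map((m) => m.techId)]) {
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(id);
    }
  }
  return merged;
}
```

Putting core ids first means the opt-in scanner can only append to what the default run already detects, never reorder it.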