📝 docs: Consolidate specs directory, validate ACPX spec, and fix Gateway Async Init #2
Merged
…ts, Admin health
## EventStore Core (EVT-001~006)
- Add events table schema (id/session_id/seq/event_type/payload_json)
- MessageStore interface + SQLiteMessageStore implementation
- Async batch writer (channel 1024, 100ms/50-event flush)
- Gateway integration via Bridge.Append on done events
- HOTPLEX_EVENT_STORE=disabled toggle for optional persistence
- owner_id migration on sessions table
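The async batch writer described above (channel of 1024, flush every 100ms or every 50 events) could look roughly like the following. This is a minimal sketch, not the repository's code: `Event`, `BatchWriter`, `NewBatchWriter`, and the `flush` callback are hypothetical names, and the non-blocking drop-on-full behaviour of `Append` is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Event is a hypothetical stand-in for one row of the events table.
type Event struct {
	SessionID string
	Seq       int64
	Type      string
	Payload   string
}

// BatchWriter buffers events on a channel and hands them to flush in batches.
type BatchWriter struct {
	ch    chan Event
	flush func([]Event) // e.g. one SQLite transaction per batch (assumption)
	wg    sync.WaitGroup
}

func NewBatchWriter(flush func([]Event)) *BatchWriter {
	w := &BatchWriter{ch: make(chan Event, 1024), flush: flush}
	w.wg.Add(1)
	go w.loop(50, 100*time.Millisecond)
	return w
}

// Append enqueues without blocking and reports false when the buffer is full.
func (w *BatchWriter) Append(e Event) bool {
	select {
	case w.ch <- e:
		return true
	default:
		return false
	}
}

// Close flushes what is left and waits for the loop to exit.
// Append must not be called after Close.
func (w *BatchWriter) Close() {
	close(w.ch)
	w.wg.Wait()
}

func (w *BatchWriter) loop(maxBatch int, interval time.Duration) {
	defer w.wg.Done()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	batch := make([]Event, 0, maxBatch)
	emit := func() {
		if len(batch) > 0 {
			w.flush(batch)
			batch = make([]Event, 0, maxBatch)
		}
	}
	for {
		select {
		case e, ok := <-w.ch:
			if !ok { // channel closed: final flush
				emit()
				return
			}
			batch = append(batch, e)
			if len(batch) >= maxBatch {
				emit()
			}
		case <-ticker.C:
			emit()
		}
	}
}

func main() {
	total := 0
	w := NewBatchWriter(func(b []Event) { total += len(b) })
	for i := 1; i <= 120; i++ {
		w.Append(Event{SessionID: "s1", Seq: int64(i), Type: "done"})
	}
	w.Close()
	fmt.Println("persisted:", total)
}
```

Flushing on whichever limit is hit first keeps write latency bounded under light load while still amortizing transactions under heavy load.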
## Metrics Inc/Set Wiring (OBS-004~005)
- Wire all 13 metric vectors across hub/conn/manager/pool
- SessionsActive/Total/Terminated/Deleted
- WorkersRunning/StartsTotal/ExecDuration
- GatewayConnectionsOpen/MessagesTotal/DeltasDropped/ErrorsTotal
- PoolAcquireTotal/Utilization
## Test Infrastructure (TEST-001~003,005~007)
- GitHub Actions CI (go vet + test -race + coverage)
- security/session/gateway table-driven tests with testify/require
- mockStore with testify/mock
- WebSocket mock server (detached goroutine handler)
- All 5 test packages passing
## Worker Process Limits (WK-009, RES-005)
- bufio.Scanner 64KB init / 1MB cap per line
- ReadLine() with panic-recover for ErrTooLong
- RLIMIT_AS 512MB via syscall.Setrlimit
- WorkerHealth struct + Health() interface
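A bounded line reader with a 64KB initial buffer and a 1MB per-line cap can be sketched as below. The names `LineReader`, `NewLineReader`, and `ReadLines` are hypothetical. Note one deliberate difference: the notes above mention panic-recover, whereas this sketch relies on the standard behaviour of `bufio.Scanner`, which reports `bufio.ErrTooLong` through `Err()`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

const (
	scannerInitSize = 64 * 1024   // initial buffer capacity
	scannerMaxSize  = 1024 * 1024 // per-line cap
)

// LineReader wraps a bufio.Scanner with bounded buffering.
type LineReader struct{ sc *bufio.Scanner }

func NewLineReader(r io.Reader) *LineReader {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, scannerInitSize), scannerMaxSize)
	return &LineReader{sc: sc}
}

// ReadLine returns the next line, io.EOF at end of input, or a
// descriptive error when a line exceeds the cap.
func (l *LineReader) ReadLine() (string, error) {
	if l.sc.Scan() {
		return l.sc.Text(), nil
	}
	if err := l.sc.Err(); err != nil {
		if errors.Is(err, bufio.ErrTooLong) {
			return "", fmt.Errorf("worker output line exceeds %d bytes: %w", scannerMaxSize, err)
		}
		return "", err
	}
	return "", io.EOF
}

// ReadLines drains input and returns the lines read before the first error.
func ReadLines(input string) ([]string, error) {
	lr := NewLineReader(strings.NewReader(input))
	var lines []string
	for {
		line, err := lr.ReadLine()
		if err == io.EOF {
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
		lines = append(lines, line)
	}
}

func main() {
	lines, err := ReadLines("first\nsecond\n")
	fmt.Println(len(lines), err)
}
```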
## Admin Health Endpoints (ADMIN-006~007)
- /admin/health: unauthenticated (moved before admin mux)
- /admin/health/ready: new readiness probe
- WorkerHealthStatuses() real probing, 503 when unhealthy
## Bug Fixes
- Fix fork bomb regex pattern (:\(\)\s*\{\s*:\|)
- Fix ValidateInit typed nil (use err==nil direct check)
- Fix newTestWSServer synchronous handler deadlock (detach goroutine)
- Fix TestSafePathJoin non-existent paths (create files first)
- Fix TestExpandEnv HOME env var (use TEST_MY_HOME)
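For illustration, the fork bomb pattern quoted above can be exercised with Go's `regexp` package. The wrapper name `LooksLikeForkBomb` is hypothetical; only the pattern itself comes from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

// forkBombPattern is the pattern quoted in the bug-fix list above.
var forkBombPattern = regexp.MustCompile(`:\(\)\s*\{\s*:\|`)

// LooksLikeForkBomb reports whether cmd contains the classic ":(){ :|:& };:" shape.
func LooksLikeForkBomb(cmd string) bool {
	return forkBombPattern.MatchString(cmd)
}

func main() {
	fmt.Println(LooksLikeForkBomb(":(){ :|:& };:")) // true
	fmt.Println(LooksLikeForkBomb("echo hello"))    // false
}
```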
## OTel Tracing (OBS-006)
- Add internal/tracing/tracing.go: Init/Shutdown/Attr utilities
- Graceful degradation: no-op tracer if OTEL_SDK_DISABLED=true or no endpoint
- OTEL_EXPORTER_OTLP_ENDPOINT env var for exporter configuration
- Spans: hub.broadcast, conn.recv, conn.init
- Span attributes: session_id, event_type, seq, priority
## Config Hot Reload (CONFIG-006~008)
- Add internal/config/watcher.go: fsnotify file watcher
- 500ms debounce to prevent rapid-fire reloads
- HotReloadableFields: gateway.addr/pool.max_size/gc_scan_interval etc.
- StaticFields: security.api_keys/db.path require restart
- ConfigChange audit log with timestamp/field/old/new/hot
- Hot reload callback wired in main.go run()
## Dependencies
- Add go.opentelemetry.io/otel/* (OTLP SDK + stdout exporter)
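The 500ms debounce mentioned for config hot reload can be approximated with a timer that is reset on every file event. This is a sketch using only the standard library; `Debouncer`, `Trigger`, and `SimulateBurst` are hypothetical names, and the real watcher presumably feeds fsnotify events into something similar.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Debouncer coalesces a burst of Trigger calls into one callback,
// fired once no new trigger has arrived for the configured delay.
type Debouncer struct {
	mu    sync.Mutex
	delay time.Duration
	timer *time.Timer
	fn    func()
}

func NewDebouncer(delay time.Duration, fn func()) *Debouncer {
	return &Debouncer{delay: delay, fn: fn}
}

// Trigger restarts the quiet-period timer.
func (d *Debouncer) Trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.delay, d.fn)
}

// SimulateBurst fires n triggers gapMs apart against a delayMs debouncer
// and returns how many reloads actually ran.
func SimulateBurst(n, gapMs, delayMs int) int {
	var reloads atomic.Int32
	d := NewDebouncer(time.Duration(delayMs)*time.Millisecond, func() { reloads.Add(1) })
	for i := 0; i < n; i++ {
		d.Trigger()
		time.Sleep(time.Duration(gapMs) * time.Millisecond)
	}
	time.Sleep(time.Duration(2*delayMs) * time.Millisecond)
	return int(reloads.Load())
}

func main() {
	// Five rapid writes to the config file should produce a single reload.
	fmt.Println("reloads:", SimulateBurst(5, 10, 100))
}
```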
## Security Package (SEC-001~045)
- internal/security/env.go: BaseEnvWhitelist, ProtectedEnvVars, Sensitive detection
- internal/security/env_builder.go: BuildEnv, AddWorkerType, AddHotPlexVar
- internal/security/jwt.go: ES256 JWT validation, JTI blacklist, claims
- internal/security/limits.go: MaxEnvelopeBytes/MaxSessionBytes/MaxLineBytes
- internal/security/model.go: AllowedModels whitelist
- internal/security/path.go: SafePathJoin, BaseDir validation
- internal/security/ssrf.go: URL validation, blocked CIDRs, DNS rebinding protection
- internal/security/tool.go: AllowedTools, BuildAllowedToolsArgs
## SPEC Documentation
- docs/SPECS/Acceptance-Criteria.md: Full AC spec (20 categories, 157 items)
- docs/SPECS/AC-Tracking-Matrix.csv: CSV tracking format
- docs/SPECS/AC-Tracking-Matrix.md: Detailed tracking matrix
- docs/SPECS/README.md: SPEC directory overview
- .github/workflows/pr-checks.yml: PR checks workflow
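As an illustration of what a helper like SafePathJoin guards against, here is a lexical-only sketch. The signature and the `ErrPathEscape` sentinel are assumptions, and this version does not resolve symlinks, which the real implementation may well do (the test fix that creates files first hints at that).

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ErrPathEscape is a hypothetical sentinel for traversal attempts.
var ErrPathEscape = errors.New("path escapes base directory")

// SafePathJoin joins rel onto baseDir and rejects results outside baseDir.
// The check is purely lexical; symlinks are not resolved in this sketch.
func SafePathJoin(baseDir, rel string) (string, error) {
	base, err := filepath.Abs(baseDir)
	if err != nil {
		return "", err
	}
	joined := filepath.Join(base, rel)
	relToBase, err := filepath.Rel(base, joined)
	if err != nil {
		return "", err
	}
	if relToBase == ".." || strings.HasPrefix(relToBase, ".."+string(filepath.Separator)) {
		return "", ErrPathEscape
	}
	return joined, nil
}

func main() {
	_, err := SafePathJoin("/srv/work", "../../etc/passwd")
	fmt.Println(err)
	p, err := SafePathJoin("/srv/work", "logs/run.txt")
	fmt.Println(p, err)
}
```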
…EVENT_STORE env var)
Upgrade CI workflow with current best practices:
- actions/checkout@v6, actions/setup-go@v6, go-version '1.26'
- golangci-lint-action@v9 with latest version
- Path filter (dorny/paths-filter) to skip CI on doc-only changes
- concurrency group with cancel-in-progress
- Minimal permissions (contents: read, pull-requests: write)
- setup-go cache for faster dependency resolution
- codecov-action@v5 upload with explicit slug
- Add PR checks workflow (branch naming + issue link validation)
- Add standard PR template
BREAKING CHANGE: CI now requires CODECOV_TOKEN secret for coverage upload
…odes
Gateway init protocol improvements:
- Add InitAuth struct to carry Bearer token from client
- Add InitConfig parsing (model, system_prompt, allowed_tools, disallowed_tools, max_turns, work_dir)
- Add InitAuth to InitData and wire through ValidateInit
- Add ERR_CODE_VERSION_MISMATCH for clearer version errors (replaces ad-hoc PROTOCOL_VIOLATION on version mismatch)
- Add ServerCaps.MaxTurns and ServerCaps.Modalities fields
- Add OwnerID field to Envelope for authenticated user tracking
- Add 12 new error codes: WORKER_OOM, SESSION_EXPIRED/TERMINATED/INVALIDATED, AUTH_REQUIRED, VERSION_MISMATCH, CONFIG_INVALID, GATEWAY_OVERLOAD, EXECUTION_TIMEOUT, RECONNECT_REQUIRED, WORKER_OUTPUT_LIMIT
- Fix golangci-lint errors: remove unused initDataFromMap helper and fix unchecked pool.Acquire error returns in tests
Restructure agent rules from 3 monolithic files into 6 focused modules:
- go125.md + golang-style.md → golang.md (merged, Go 1.26 aligned)
- go126.md → removed (superseded by golang.md)
- New aep.md: AEP v1 protocol spec (envelope/codec/routing/backpressure)
- New security.md: JWT/SSRF/Env isolation/command whitelist/AllowedTools
- New session.md: 5-state machine/TransitionWithInput/SESSION_BUSY/GC strategy/mutex spec/PoolManager/SQLite WAL
- New metrics.md: Prometheus naming/OTel Span/SLO definition
- worker-proc.md: update paths filter from pool/ to session/
These reports documented implementation gaps from a previous review cycle that have since been addressed or superseded by the current SPECS. Keeping them risks spreading outdated information.
Add prominent header noting that EventStore/MessageStore/AuditLog are NOT implemented in v1.0 — rationale: Worker itself handles persistence (Claude Code ~/.claude/projects/, OpenCode server-side state); Gateway scope is control-plane only. Roadmap table updated to mark all items as ❌ not implemented with note that v1.0 defers to Worker-layer persistence.
Update references to point to new modular rule files:
- golang.md, aep.md, security.md, session.md, metrics.md, worker-proc.md, testing.md
Remove obsolete go125.md/go126.md/golang-style.md references.
Add cross-references to session.md for detailed state machine docs.
…ion tests
Config system:
- Watcher.NewWatcher: add SecretsProvider param to support loading sensitive values from external secret stores
- Config.Load: add SecretsProvider field and pass to Watcher
- cmd/gateway/main.go: wire JWT secret from cfg.Security.JWTSecret (loaded via config secrets provider) instead of os.Getenv
- Add codecov.yml with standard configuration
Test coverage:
- Add dbginline_test.go for inline debug config validation
- Add directcheck_test.go for AEP init envelope direct validation
- Add validatecheck_test.go for InitData.ValidateInit unit tests
- Add validatecheck2_test.go for InitData edge case coverage
Acquire previously returned *PoolError, causing typed-nil issues with require.NoError in tests. Changed to return the error interface; callers that need PoolError fields use a type assertion.
- pool.go: Acquire returns error instead of *PoolError
- manager.go: type-assert to *PoolError for Kind field access
- pool_test.go: type-assert in GlobalLimit and UserQuotaLimit tests
Replace unsafe blank-identifier type assertion with errors.As, preventing nil-pointer panic if Acquire ever returns a non-PoolError error type. Also remove unused cfg variables from pool tests.
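The typed-nil problem and the errors.As fix from the two commits above can be reproduced in a few lines. `PoolError` and its `Kind` field come from the commit messages; the function shapes (`acquireTyped`, `Acquire`, `KindOf`) are simplified assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// PoolError mirrors the error type named in the commit; fields are assumed.
type PoolError struct{ Kind string }

func (e *PoolError) Error() string { return "pool: " + e.Kind }

// acquireTyped has the old shape: a concrete pointer return type.
func acquireTyped(ok bool) *PoolError {
	if ok {
		return nil
	}
	return &PoolError{Kind: "global_limit"}
}

// Acquire has the new shape: it returns the error interface.
func Acquire(ok bool) error {
	if ok {
		return nil
	}
	return &PoolError{Kind: "global_limit"}
}

// KindOf extracts Kind via errors.As, so a non-PoolError cannot cause a nil dereference.
func KindOf(err error) string {
	var pe *PoolError
	if errors.As(err, &pe) {
		return pe.Kind
	}
	return ""
}

func main() {
	var err error = acquireTyped(true) // nil pointer stored in a non-nil interface
	fmt.Println(err == nil)            // false: this is what tripped require.NoError
	fmt.Println(Acquire(true) == nil)  // true
	fmt.Println(KindOf(Acquire(false)))
	fmt.Println(KindOf(errors.New("other")) == "")
}
```

A blank-identifier assertion such as `pe, _ := err.(*PoolError)` leaves `pe` nil for any other error type, so reading `pe.Kind` would panic; errors.As makes the failure case explicit.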
- Add .agent/settings.local.json to .gitignore
- Restructure codecov coverage targets to align with project module organization (security/protocol/session/worker as separate targets, each with appropriate thresholds)
Verified against live codebase:
- CONFIG-006~008: 🟢 PASS (fsnotify watcher + debounce + audit log implemented)
- CONFIG-010: 🟢 PASS (Viper merge + LoadOptions chain)
- EVT-001: 🟢 PASS (MessageStore interface + SQLiteMessageStore wired)
- TEST-001~002,004,006~007: 🟢 TODO → 🟡 IN_PROGRESS (10 test files, testify/require, WS mock server, codecov.yml all present; GitHub Actions and E2E/Playwright still missing)
- Summary: 130/157 PASS (83%), P1 86%, recalculated milestones
Phase 1 — EventStore:
- EVT-002~006 already verified PASS
Phase 2 — Worker robustness:
- AEP-020: worker crash mapped to synthetic failure done event via Wait()
- WK-009: Bridge sends synthetic failure done on worker non-zero exit
- WK-010: anti-pollution turn counting with ErrMaxTurnsReached + auto-kill
- WK-011: LastIO() added to Worker interface for GC zombie detection
- SEC-045: AllowedTools wired into managedSession (impl pending real adapters)
Phase 3 — Admin API:
- ADMIN-008: Hub.LogHandler callback + ring buffer for event capture
- ADMIN-009: POST /api/v1/config/validate with JSON body parsing
- ADMIN-010: GET /api/v1/debug/sessions/{id} exposes mutex/mu/turn_count/worker_health
- GW-006: Hub.Shutdown drains broadcast queue before closing connections
Also:
- managedSession fields exported (Worker, Mu, TurnCount) for debug access
- fmt import added to hub.go
- Worker interface updated: LastIO() time.Time method added
- NoopWorker implements LastIO()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ess PR
- hub.go: add SeqGen.Peek() for read-only seq access; add Hub.NextSeqPeek()
- session/manager.go: add DebugSnapshot() to safely expose ms fields under lock; callers no longer acquire ms.Mu directly (deadlock guard)
- main.go: HandleDebugSession uses DebugSnapshot + NextSeqPeek (no longer reads ms.Worker/ms.Mu directly from outside session pkg)
- main.go: remove size*1000 modulo hack. Go 1.26: head>=size always, subtraction is non-negative, extra multiplier is dead code
- conn.go: forwardEvents wraps Wait() with 2s timeout goroutine to prevent indefinite block if worker is in a zombie state
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
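Wrapping a blocking Wait() in a goroutine with a 2s timeout, as described for forwardEvents above, might be sketched like this. `WaitOrTimeout` and `ErrWaitTimeout` are hypothetical names; the real code may use a context instead.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitTimeout matches the 2s bound mentioned in the commit message.
const waitTimeout = 2 * time.Second

// ErrWaitTimeout is a hypothetical sentinel for a worker that never exits.
var ErrWaitTimeout = errors.New("worker wait timed out")

// WaitOrTimeout runs wait in its own goroutine and gives up after timeout.
func WaitOrTimeout(wait func() error, timeout time.Duration) error {
	done := make(chan error, 1) // buffered so the goroutine can finish after a timeout
	go func() { done <- wait() }()
	select {
	case err := <-done:
		return err
	case <-time.After(timeout):
		return ErrWaitTimeout
	}
}

func main() {
	fmt.Println(WaitOrTimeout(func() error { return nil }, waitTimeout))

	stuck := make(chan struct{})
	fmt.Println(WaitOrTimeout(func() error { <-stuck; return nil }, 50*time.Millisecond))
	close(stuck) // let the simulated zombie goroutine finish
}
```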
…d GetManagedSessionDebug
managedSession.Worker and managedSession.Mu were exported unnecessarily, allowing external callers to bypass AttachWorker/DetachWorker (pool quota invariant) and violate Manager lock ordering. GetManagedSessionDebug was dead code after DebugSnapshot was added.
- managedSession.Worker → managedSession.worker (unexported)
- managedSession.Mu → managedSession.mu (unexported)
- Remove GetManagedSessionDebug (no callers remain)
- All internal references updated; build clean
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RES-008 (per-user max_total_memory_mb):
- PoolConfig.MaxMemoryPerUser (int64, default 2 GB)
- PoolManager.AcquireMemory/ReleaseMemory/UserMemory methods
- workerMemoryEstimate = 512 MB (matches RLIMIT_AS cap)
- AttachWorker calls AcquireMemory after slot quota; rollback on failure
- DetachWorker calls ReleaseMemory alongside Release
- ErrMemoryExceeded sentinel error
- 5 new table-driven tests covering limit/unlimited/cross-user/integrated
RES-009 (worker crash rate metrics):
- WorkerCrashesTotal (counter, labels: worker_type, exit_code)
- WorkerMemoryBytes (gauge, labels: worker_type)
- ForwardEvents increments crash counter when exit_code != 0
Matrix corrected: EVT-002~006, RES-005 were already PASS; corrected 9 rows and summary: 150/170 PASS (88%)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
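The per-user memory accounting in RES-008 amounts to reserving a fixed 512 MB estimate per worker against a per-user ceiling. The sketch below reuses the method names from the commit message, but the signatures, the constructor, and the "0 means unlimited" rule are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// workerMemoryEstimate is 512 MB, matching the RLIMIT_AS cap noted above.
const workerMemoryEstimate int64 = 512 << 20

var ErrMemoryExceeded = errors.New("per-user memory limit exceeded")

// PoolManager tracks reserved memory per user.
type PoolManager struct {
	mu         sync.Mutex
	maxPerUser int64            // bytes; 0 is treated as unlimited (assumption)
	used       map[string]int64 // bytes currently reserved per user
}

func NewPoolManager(maxPerUser int64) *PoolManager {
	return &PoolManager{maxPerUser: maxPerUser, used: make(map[string]int64)}
}

// AcquireMemory reserves one worker's estimate or fails with ErrMemoryExceeded.
func (p *PoolManager) AcquireMemory(userID string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.maxPerUser > 0 && p.used[userID]+workerMemoryEstimate > p.maxPerUser {
		return ErrMemoryExceeded
	}
	p.used[userID] += workerMemoryEstimate
	return nil
}

// ReleaseMemory returns one worker's estimate, never going below zero.
func (p *PoolManager) ReleaseMemory(userID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.used[userID] >= workerMemoryEstimate {
		p.used[userID] -= workerMemoryEstimate
	}
}

// UserMemory reports the bytes currently reserved for userID.
func (p *PoolManager) UserMemory(userID string) int64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.used[userID]
}

func main() {
	pm := NewPoolManager(2 << 30) // 2 GB default from the notes above
	for i := 1; i <= 5; i++ {
		fmt.Println(i, pm.AcquireMemory("alice")) // the fifth worker exceeds 2 GB
	}
}
```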
- isReadTimeout, broadcastQueueSize (3 variants), isDroppable
- heartbeat: MarkAlive, MarkMissed (under/at-limit/after-stop), MissedCount, Stop idempotency
- SeqGen: Next (startsAt 1, increments, independent sessions), Peek (zero unknown, does not increment)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
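The SeqGen behaviours covered by these tests (Next starts at 1 and increments per session; Peek returns zero for unknown sessions and never increments) fit in a small mutex-guarded map. This is an illustrative sketch; the real type's fields and signatures are not shown in the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// SeqGen hands out per-session, monotonically increasing sequence numbers.
type SeqGen struct {
	mu   sync.Mutex
	seqs map[string]int64
}

func NewSeqGen() *SeqGen { return &SeqGen{seqs: make(map[string]int64)} }

// Next increments and returns the session's sequence; the first value is 1.
func (g *SeqGen) Next(sessionID string) int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.seqs[sessionID]++
	return g.seqs[sessionID]
}

// Peek returns the last issued sequence without changing it (0 if unknown).
func (g *SeqGen) Peek(sessionID string) int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.seqs[sessionID]
}

func main() {
	g := NewSeqGen()
	fmt.Println(g.Next("a"), g.Next("a"), g.Next("b")) // 1 2 1
	fmt.Println(g.Peek("a"), g.Peek("unknown"))        // 2 0
}
```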
Add all remaining AEP v1 event kinds and data types:
- Message (complete message), Reasoning, Step, PermissionRequest/Response
- MessageData, ReasoningData, StepData, PermissionRequestData, PermissionResponseData
- ToolCall, ToolResult, Ping, Pong already existed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace flat CI with layered pipeline:
- Layer 1 (Gate): vet + build + lint, fast fail
- Layer 2 (Unit Test): per-package matrix with coverage
- Layer 3 (Integration): full suite with race detector + Codecov
- Layer 4 (Coverage Check): merge profiles, threshold gate
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SEC-007 (Multi-bot Isolation):
- Add BotID field to Conn struct, extracted from JWT claims
- Add bot_id mismatch check when joining existing sessions
- Add CreateWithBot method on session Manager
- Add bot_id column + index to sessions SQLite table
SEC-045 (AllowedTools → Worker Proc):
- Add AllowedTools to proc.Manager Opts struct
- Auto-append --allowed-tools args in Start() via BuildAllowedToolsArgs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove t.Parallel() from TestExpandEnv (mutates global env vars)
- Add watcher_test.go (config coverage 33.3% → 77.8%)
- Add events_test.go (events coverage → 100%)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Update scannerMaxSize from 1MB to 10MB to match the AEP-008 specification requirement. Worker stdout lines exceeding 10MB now trigger a bufio.ErrTooLong panic, which ReadLine() recovers and returns as a friendly error message. The error message in ReadLine() is also updated to reflect the new 10MB limit.
Refs: AEP-008
Replace immediate Kill() with graceful Terminate() in state transitions:
- Extract 5s timeout to a constant
- Use parent context instead of context.Background()
- Gracefully send SIGTERM, then escalate to SIGKILL after 5s grace period
The anti-pollution restart continues to use Kill() (intentional for emergency cases).
This implements the AEP-021 specification and provides better worker lifecycle management.
Refs: AEP-021
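The SIGTERM-then-SIGKILL escalation described above can be sketched against a small interface so the logic runs without a real child process. Everything here is hypothetical naming (`signaler`, `Terminate`, `fakeProc`, `Escalated`); the `signaler` methods match those of `*os.Process`, and the demo uses a 50ms grace period instead of the 5s constant.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"syscall"
	"time"
)

// signaler is the subset of *os.Process needed for termination.
type signaler interface {
	Signal(os.Signal) error
	Kill() error
}

// Terminate sends SIGTERM, then escalates to Kill if the process has not
// exited within grace or the parent context is cancelled first.
func Terminate(ctx context.Context, p signaler, exited <-chan struct{}, grace time.Duration) error {
	if err := p.Signal(syscall.SIGTERM); err != nil {
		return p.Kill() // could not deliver SIGTERM: go straight to Kill
	}
	timer := time.NewTimer(grace)
	defer timer.Stop()
	select {
	case <-exited:
		return nil
	case <-timer.C:
	case <-ctx.Done():
	}
	return p.Kill()
}

// fakeProc simulates a worker that either obeys or ignores SIGTERM.
type fakeProc struct {
	exited    chan struct{}
	obeysTerm bool
	killed    bool
}

func (f *fakeProc) Signal(os.Signal) error {
	if f.obeysTerm {
		close(f.exited)
	}
	return nil
}

func (f *fakeProc) Kill() error { f.killed = true; return nil }

// Escalated reports whether Terminate had to fall back to Kill.
func Escalated(obeysTerm bool) bool {
	p := &fakeProc{exited: make(chan struct{}), obeysTerm: obeysTerm}
	_ = Terminate(context.Background(), p, p.exited, 50*time.Millisecond)
	return p.killed
}

func main() {
	fmt.Println(Escalated(true))  // false: worker exited on SIGTERM
	fmt.Println(Escalated(false)) // true: grace period elapsed, Kill used
}
```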
Split long ALTER TABLE statements into separate ExecContext calls for better readability. No functional changes.
Ensure SQLite database files generated during development (gateway.db, gateway.db-shm, gateway.db-wal) are ignored. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove invalid ldflags (buildTime/goVersion) from Dockerfile and Makefile
- Fix healthcheck port 9080→9999 in docker-compose.yml
- Fix backup database path hotplex.db→gateway.db
- Add Prometheus scrape config (configs/prometheus.yml)
- Add Grafana provisioning (dashboards, datasources)
- Remove non-existent volume mounts (grafana dirs, prometheus.yml)
- Remove invalid HOTPLEX_DEV_MODE env var
Reorganize configs directory following infrastructure-as-code best practices:
- Move monitoring configs (prometheus, alerts, slo, otel) to configs/monitoring/
- Consolidate Grafana provisioning under configs/monitoring/grafana/
- Remove duplicate grafana-dashboard.json (now dashboard.json in dashboards/)
- Update docker-compose.yml volume paths to new locations
- Expand README with monitoring stack usage documentation
Standardize database filename to hotplex-worker.db across all files:
- Code default (internal/config/config.go)
- Config files (config.yaml, env.example)
- Docker Compose (backup service)
- Scripts (install.sh, quickstart.sh, docker-build.sh, README.md)
- Docs (User-Manual, Disaster-Recovery, Admin-API-Design, Config-Reference)
- Specs (TRACEABILITY-MATRIX, README)
Also corrects legacy hotplex.db references to hotplex-worker.db.
Add comprehensive design document for Python client example module:
- Target: third-party developers integrating HotPlex Worker
- Architecture: 3-layer (protocol/transport/client)
- Examples: quickstart (5min) + advanced (complete)
- Tech stack: Python 3.10+, websockets, asyncio
- No PyPI release (local package only)
Refs: #python-client-design
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add comprehensive Python client example demonstrating AEP v1 protocol usage:
Architecture:
- 3-layer design: protocol (codec) → transport (connection) → client (session API)
- Pure async/await with Python 3.10+ modern type hints
- Event-driven callbacks for real-time message handling
Features:
- quickstart.py: 5-minute getting started guide
- advanced.py: Complete example with tool calls, permissions, state management
- Full type safety with dataclasses and TypeVar generics
- Custom exception hierarchy for clear error classification
Components:
- protocol.py: NDJSON envelope encoding/decoding (~250 lines)
- transport.py: WebSocket connection management (~150 lines)
- client.py: High-level session API (~250 lines)
- types.py: AEP v1 data models (~150 lines)
- exceptions.py: Exception classes (~50 lines)
Tech stack:
- Python 3.10+ (dataclasses, StrEnum, match/case)
- websockets 12.0+ (pure async WebSocket)
- asyncio (standard library)
No PyPI release (local package only for examples)
Design doc: docs/superpowers/specs/2026-04-02-python-client-design.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix issues identified by code review agents:
Code Quality:
- Remove unused import 're' in protocol.py
- Remove duplicate import in advanced.py
Efficiency:
- Add max_queue_size parameter (default 1000) to prevent unbounded memory growth
- Remove redundant _connected flag, derive state from _ws.open
- Handle both str and bytes WebSocket messages correctly
Issues fixed:
- HIGH: Unbounded message queue could cause memory exhaustion
- MEDIUM: Missing bytes message handling caused decode errors
- MEDIUM: Redundant connection state tracking
- LOW: Unused imports
Based on code review by simplify agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…c API
Extract protocol-level code from internal/ to pkg/ so it can be shared by both the gateway server and future Go clients:
- pkg/aep/: AEP v1 protocol codec (NDJSON encode/decode, init handshake)
- pkg/jwt/: JWT token generation/validation (ES256-only)
- internal/aep/: backward-compat re-exports from pkg/aep
- internal/gateway/init.go: keeps gateway-internal types (worker.WorkerType)
- internal/security/jwt.go: simplified using pkg/jwt internally
This establishes the public pkg/ boundary documented in pkg/README.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pkg/aep/codec.go: remove redundant init() block (nowFunc var is sufficient)
- pkg/jwt/jwt.go: replace hand-rolled uuid formatting with github.com/google/uuid
- internal/security/jwt.go: replace hand-rolled uuid with uuid.New(), remove math/rand fallback (violates security rules)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sts to pkg/
- Delete pkg/jwt/jwt.go: 100% dead code (0 imports in production)
- Delete pkg/aep/init.go: 100% dead code (0 imports)
- Slim internal/aep/codec.go: 82 lines → 18 lines, keep only 5 re-exports actually used by gateway/worker code (NewID, NewSessionID, EncodeJSON, DecodeLine, Encode)
- Move codec tests to pkg/aep/ where the actual implementation lives
- Fix Encode/EncodeChunk to set Timestamp if zero (avoids Validate failures)
- Fix nowFunc to default to real wall-clock time instead of 0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the internal/aep facade (16 files re-exporting 5 symbols from pkg/aep) with direct imports. Aligns with Go best practices:
- pkg/aep: shared AEP v1 protocol code (reusable by future Go clients)
- internal: gateway-specific implementations
Also removes empty pkg/jwt/ directory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- examples: add gateway health and worker count checks to complete.ts
- examples: improve quickstart.ts with gateway ready check
- scripts: generate-test-token.ts with ES256 JWT token generation
- src/client.ts: add error handling for WebSocket connection failures
- src/types.ts: add type definitions for gateway responses
- docker-compose.yml: fix healthcheck endpoint and timeout
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Major changes:
- Update module path from 'hotplex-worker' to 'github.com/hotplex/hotplex-worker' for proper Go module proxying and versioning support
- Fix critical bug in ensureDBDir: remove flawed sync.Once that ignored subsequent database paths (discovered by 3 parallel code review agents)
- Improve normalizePath: gracefully handle missing $HOME in test environments
- Refactor Makefile stop target: use GRACE_PERIOD variable instead of magic number
- Improve Docker healthcheck: use grep pattern instead of fragile exact string match
- Simplify .gitignore: remove excessive decorative comments (75 → 49 lines)
- Add comprehensive client SDK documentation for Python, TypeScript, Go, and Java
This commit includes import path updates across 54 files and improves code quality based on multi-agent code review findings.
Tests: All short tests pass with race detection enabled
- Fix test-integration: remove extraneous dash before -timeout flag
- HotPlexClient: use WebSocketHttpHeaders API, apply try-with-resources
- InteractiveExample: wrap Scanner in try-with-resources for proper cleanup
- QuickStart: remove dead token generation call, clean up unused imports
- Event: remove unused JsonProperty import
- Add bridge parameter to newConn for session lifecycle management
- Call bridge.StartSession in performInit after session creation so worker starts before init_ack is sent (fixes worker never started)
- Remove redundant CREATED→RUNNING transition in performInit; session now stays CREATED until handleInput (prevents running→running error)
- handleInput: skip TransitionWithInput when state is RUNNING, only handle IDLE→RUNNING resume case
- Fix all newConn call sites to pass nil bridge in tests
- Add api_key query param fallback for browser WebSocket clients
- Remove misleading comment from bridge.go (si.AllowedTools is used at line 66)
- Add apiKeyQueryParam const for the browser WS query param fallback
StartSession now takes (botID, allowedTools) and calls CreateWithBot internally. performInit delegates session creation to Bridge, removing the redundant sm.CreateWithBot call and fixing the wasted DB write. The worker now correctly receives AllowedTools from the init handshake instead of always getting nil (since it previously used sm.Create(nil)).
Also updates:
- SessionManager interface: CreateWithBot replaces Create
- BridgeProvider interface in admin: updated StartSession signature
- All mock/test call sites updated
Replace concrete *Bridge field with SessionStarter interface. nil starter is now a semantic no-op (test mode), not a degraded path. Adds compile-time check: var _ SessionStarter = (*Bridge)(nil)
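The interface extraction and compile-time assertion described above follow a common Go pattern. In this sketch the `SessionStarter` and `Bridge` names and the assertion line come from the commit message; the method signature, the `Conn` shape, and the `performInit` body are simplified assumptions.

```go
package main

import "fmt"

// SessionStarter is the narrow interface the connection depends on.
// The parameter list is a guess; the commit only names botID and allowedTools.
type SessionStarter interface {
	StartSession(sessionID, botID string, allowedTools []string) error
}

// Bridge is a stand-in for the production implementation.
type Bridge struct{ started []string }

func (b *Bridge) StartSession(sessionID, botID string, allowedTools []string) error {
	b.started = append(b.started, sessionID)
	return nil
}

// Compile-time check: the build fails if *Bridge stops satisfying SessionStarter.
var _ SessionStarter = (*Bridge)(nil)

// Conn holds the starter; a nil interface value means test mode (no-op).
type Conn struct{ starter SessionStarter }

func (c *Conn) performInit(sessionID, botID string, tools []string) error {
	if c.starter == nil {
		return nil // test mode: nothing to start
	}
	return c.starter.StartSession(sessionID, botID, tools)
}

func main() {
	testConn := &Conn{}
	fmt.Println(testConn.performInit("s1", "bot", nil))

	b := &Bridge{}
	prodConn := &Conn{starter: b}
	_ = prodConn.performInit("s2", "bot", []string{"Read"})
	fmt.Println(b.started)
}
```

One caveat with this design: the no-op path only triggers for an untyped nil. Assigning a nil `*Bridge` to the field yields a non-nil interface value, so tests should leave the field unset rather than pass a nil pointer.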
After Bridge.StartSession creates the session and worker, performInit must fetch the session info before using it. Previously, si was nil after successful StartSession, causing nil pointer dereference when building init_ack (si.State access).
Root cause: StartSession was calling sm.CreateWithBot internally, but performInit never fetched the resulting session object.
Error path:
- c.starter.StartSession(...) succeeds
- → si still points to pre-creation nil value
- → ack := BuildInitAck(..., si.State, ...) panics
Fix:
- si, err = handler.sm.Get(sessionID)
- → fetch the session that StartSession created
- → si.State now valid for init_ack
This bug only affects production mode (starter != nil). Test mode (CreateWithBot directly) was already correct.
Related: S1049
- Add aggregateNumberedEnv to support ADMIN_TOKEN_1...N and API_KEY_1...N
- Fully document 8888 (Gateway) and 9999 (Admin) ports
- Refine technical terminology for professionalism (e.g., 机密 "secrets" -> 安全凭据 "security credentials")
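A helper like aggregateNumberedEnv presumably collects ADMIN_TOKEN_1, ADMIN_TOKEN_2, and so on into a slice. The sketch below is an assumption about its behaviour: the signature, the injected lookup function, and stopping at the first missing or empty index are all illustrative choices.

```go
package main

import (
	"fmt"
	"os"
)

// aggregateNumberedEnv collects PREFIX_1, PREFIX_2, ... until the first
// missing or empty entry. lookup is injected so the logic is testable;
// os.LookupEnv satisfies it.
func aggregateNumberedEnv(prefix string, lookup func(string) (string, bool)) []string {
	var out []string
	for i := 1; ; i++ {
		v, ok := lookup(fmt.Sprintf("%s_%d", prefix, i))
		if !ok || v == "" {
			break
		}
		out = append(out, v)
	}
	return out
}

func main() {
	os.Setenv("API_KEY_1", "k-one")
	os.Setenv("API_KEY_2", "k-two")
	fmt.Println(aggregateNumberedEnv("API_KEY", os.LookupEnv))
}
```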
Fixed 5 critical errors discovered during strict review:
1. Type Error: SetupSession Return Type
   - Wrong: `(*worker.Worker, error)` (pointer to interface)
   - Fixed: `(worker.Worker, error)` (interface value)
   - Reason: Go interfaces should not be pointers
2. API Error: PriorityControl Usage
   - Wrong: `SendToSession(ctx, env, events.PriorityControl)`
   - Fixed: `env.Priority = events.PriorityControl; SendToSession(ctx, env)`
   - Reason: SendToSession doesn't accept priority parameter
3. Unused Variable: CreateWithBot result
   - Wrong: `si, err := b.sm.CreateWithBot(...)` (si unused)
   - Fixed: `_, err := b.sm.CreateWithBot(...)`
   - Reason: SessionInfo not needed in SetupSession
4. Sequence Diagram Error: AttachWorker order
   - Wrong: CreateWithBot → NewWorker → Start → AttachWorker
   - Fixed: CreateWithBot → NewWorker → AttachWorker → Start
   - Reason: Worker attached before start in actual code
5. Blocking Analysis Table Error
   - Same sequence error as #4, reordered rows
Impact:
- Type error would cause compilation failure
- API error would cause runtime panic
- Sequence errors would mislead implementers
All errors now fixed. Spec ready for implementation.
Related: S1056
Reorganize documentation structure:
- Move design specs from docs/superpowers/specs/ to docs/specs/
- Rename specs with descriptive names (e.g., 2026-03-30-foo → Foo-Design.md)
- Add YAML frontmatter to all spec documents with standardized metadata
- Update README to reflect new directory structure
Code improvements alongside reorganization:
- Fix Makefile start target to include -config flag
- Refactor nextjs-chat components into separate files
- Simplify ai-sdk-transport route-handler logic
This consolidates all specifications under a single directory with consistent metadata for better discoverability and tracking.
Validation:
- Add ACPX-Validation-Report.md with 98% confidence validation results
- Test with acpx v0.4.0 CLI: JSON-RPC 2.0, streaming, tool calls, resume
- Verify protocol format 100%, initialization 100%, events 100%, tools 95%
Metadata updates:
- Worker-ACPX-Spec.md: status → review, progress → 30%, confidence → 98%
- Go-Client-Example-Design.md: status → implemented, progress → 100%
- specs/README.md: Update document states and reorganize categories
Tooling:
- Add validate-acpx-spec.sh script for automated validation
- Update scripts/README.md with validation script documentation
Refs: docs/specs/Worker-ACPX-Spec.md, docs/specs/ACPX-Validation-Report.md
…e support
Worker Specs:
- Add Worker-OpenCode-CLI-Spec.md - OpenCode CLI integration specification
- Add Worker-OpenCode-Server-Spec.md - OpenCode Server integration specification
- Both specs marked as implemented (100% progress)
- Update specs/README.md with new OpenCode worker entries
Implementation:
- Add SendUserMessage() to base.Conn for Claude Code's native format
- Update claudecode.Worker.Input() to use user message format
- Fallback to AEP envelope for mock connections in tests
- This aligns with Claude Code's actual stream-json input format
Technical Details:
- SendUserMessage sends {"type":"user","message":{"type":"user","content":[{"type":"text","text":"..."}]}}
- This is the correct format for Claude Code's stdin protocol
- Maintains backward compatibility with test mocks
Refs: docs/specs/Worker-OpenCode-CLI-Spec.md, docs/specs/Worker-OpenCode-Server-Spec.md
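The user-message shape quoted under Technical Details can be produced with plain struct marshalling. The JSON layout below copies the commit message exactly; the Go type names, the `EncodeUserMessage` helper, and the trailing newline (assuming a newline-delimited stdin stream) are illustrative assumptions rather than the actual base.Conn code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textBlock is one content element of the user message.
type textBlock struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// userMessage is the inner "message" object.
type userMessage struct {
	Type    string      `json:"type"`
	Content []textBlock `json:"content"`
}

// userEnvelope is the outer object written to the worker's stdin.
type userEnvelope struct {
	Type    string      `json:"type"`
	Message userMessage `json:"message"`
}

// EncodeUserMessage returns one newline-terminated JSON line for text.
func EncodeUserMessage(text string) ([]byte, error) {
	b, err := json.Marshal(userEnvelope{
		Type: "user",
		Message: userMessage{
			Type:    "user",
			Content: []textBlock{{Type: "text", Text: text}},
		},
	})
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	line, err := EncodeUserMessage("hello")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(line))
}
```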
- Run gofmt -s -w . to fix all formatting issues
- Use embedded BaseWorker fields directly (QF1008)
- Remove unnecessary BaseWorker selector in claudecode worker
This fixes golangci-lint warnings without disabling checks.
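Resolves #1 - Specs directory reorganization, ACPX spec validation, and Gateway async init fix
Overview
This PR consolidates and standardizes the specs directory structure, validates the ACPX Worker integration spec against the acpx CLI, and fixes critical errors in the Gateway async init spec.
Main changes
1. 📂 Specs directory reorganization and standardization
Problem: the specs directory structure was disorganized and lacked a unified metadata standard
Solution:
- type, status, progress, estimated_hours fields
- docs/specs/README.md index, categorized by status and type
Impact: 15+ spec documents now have consistent metadata, making them easier to track and manage
2. ✅ ACPX spec validation (98% confidence)
Problem:
Worker-ACPX-Spec.md was written from the acpx CLI v0.4.0 documentation and had not been validated in practice
Solution:
- ACPX-Validation-Report.md detailed report
- validate-acpx-spec.sh automated validation script
Validation method: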
```bash
# Basic protocol test
acpx --format json claude "What is 2+2?"
# Tool call test
acpx --format json claude "List files in current directory"
# Resume flow test
acpx claude sessions new --name test-resume
echo "My favorite number is 42" | acpx claude -s test-resume
echo "What is my favorite number?" | acpx claude -s test-resume
```
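Validation results:
3. 🔧 Gateway Async Init spec fixes
Problem: the Gateway async init spec contained critical errors
Fixes:
- SendToSession method signature usage
- sessionInfo variable
New tooling
scripts/validate-acpx-spec.sh
Purpose: automatically validate that the ACPX spec is consistent with the actual acpx CLI
Features:
Usage: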
```bash
./scripts/validate-acpx-spec.sh
```
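Documentation changes
New documents
- docs/specs/ACPX-Validation-Report.md - full ACPX spec validation report
Updated documents
- docs/specs/Worker-ACPX-Spec.md - update metadata, add link to the validation report
- docs/specs/Gateway-Async-Init-Spec.md - fix critical errors
- docs/specs/Go-Client-Example-Design.md - update status to implemented
- docs/specs/README.md - reorganize index and categories
- scripts/README.md - add validation script documentation
Metadata updates
Testing
Validation tests
Automated tests
- scripts/validate-acpx-spec.sh - quick validation script
Checklist
Related documents
- docs/specs/Worker-ACPX-Spec.md
- docs/specs/ACPX-Validation-Report.md
- scripts/validate-acpx-spec.sh
Follow-up work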