Merged
86 commits
c490c00
feat: complete P0 milestone — EventStore, Metrics, Tests, Worker limi…
Mar 31, 2026
2bf03a8
docs: update TRACEABILITY-MATRIX after P0 completion — 101→125 PASS, …
Mar 31, 2026
c7fd197
feat: add OTel distributed tracing + config hot reload
Mar 31, 2026
2613487
feat: add security package implementations + SPEC documentation
Mar 31, 2026
9dd399a
feat: add session.event_store_enabled config field (replaces HOTPLEX_…
Mar 31, 2026
d3a62fd
👷 ci: upgrade GitHub Actions CI with lint, concurrency and Codecov
Mar 31, 2026
54a1fb3
feat: enrich init protocol with auth token, config fields and error c…
Mar 31, 2026
d74b5c8
refactor: reorganize .agent/rules into modular topic-specific files
Mar 31, 2026
a145077
chore: remove stale architecture review and scan reports
Mar 31, 2026
ef20dcf
docs: update Message-Persistence with v1.0 deprecation decision
Mar 31, 2026
383ae65
docs: update AGENT.md with new modular rules structure
Mar 31, 2026
2ab5f25
feat: wire JWT secret via config, add codecov config and init validat…
Mar 31, 2026
bf10759
fix: change PoolManager.Acquire return type to error
Mar 31, 2026
c869a1a
fix: use errors.As for safe type assertion in AttachWorker
Mar 31, 2026
5f0ccb8
🔧 ci: update codecov targets to match module boundaries
Mar 31, 2026
c125618
docs: update TRACEABILITY-MATRIX to reflect actual implementation state
Mar 31, 2026
7ffb310
feat: phase 1-3 robustness and admin API — 141/157 ACs (90%)
Mar 31, 2026
88d260d
fix: eliminate seq mutation, deadlock risk, goroutine hang in robustn…
Mar 31, 2026
d2cff71
fix: unexport managedSession.worker and managedSession.mu, remove dea…
Mar 31, 2026
fb12faf
feat: RES-008 per-user memory tracking and RES-009 crash rate metrics
Mar 31, 2026
08db5cb
feat: TEST-006 gateway coverage — hub_test.go 190 lines
Mar 31, 2026
ecf414f
feat: add essential open-source files and refine gateway/session logic
Mar 31, 2026
d6ded7c
style: rename docs/SPECS to docs/specs and update references
Mar 31, 2026
c86375b
feat: AEP-012 Message Event Kind + AEP-011 missing event kinds
Mar 31, 2026
a7e0ede
feat: TEST-007 CI/CD 4-layer test pipeline
Mar 31, 2026
449bf99
feat: SEC-007 multi-bot isolation + SEC-045 allowed-tools passthrough
Mar 31, 2026
ba2c3f9
fix: config test race + add watcher/events tests
Mar 31, 2026
11e14f8
feat(proc): enforce 10MB worker output limit
Mar 31, 2026
a0591ac
feat(session): implement graceful worker termination
Mar 31, 2026
3162b46
refactor(session): improve schema migration formatting
Mar 31, 2026
dc910ce
fix(session): add Terminate mock expectations + store tests
Mar 31, 2026
503f3e1
✅ test: add JWT validator and gateway connection test suites
Mar 31, 2026
05f0898
✅ test: boost gateway coverage 14.7%→36.0% with Hub/Conn tests
Mar 31, 2026
2778317
📝 docs: extract Claude Code Worker specs to separate document
Mar 31, 2026
6679773
🐛 fix(ci): raise coverage threshold from 50% to 80% (TEST-007)
Mar 31, 2026
7994bf3
🟣 feat(gateway): implement AEP-011/AEP-012 event pass-through
Mar 31, 2026
ec87321
🟣 feat(security): extract botID from JWT at HTTP upgrade time (SEC-007)
Mar 31, 2026
18a8dd4
🐛 fix(security): generate ES256 tokens even with []byte HMAC test sec…
Mar 31, 2026
30fb809
🔧 chore: wire JWTValidator into NewAuthenticator and fix noop Resume
Mar 31, 2026
ac433fe
✅ test: add gateway handler, proc manager, worker registry and noop t…
Mar 31, 2026
27025fc
✅ test(gateway): add botID isolation, bridge forwarding and HandleHTT…
Mar 31, 2026
781f758
✨ feat(gateway): AEP-011/012 passthrough + SEC-007 bot_id isolation +…
Mar 31, 2026
bdbf49c
refactor: extract BaseWorker, split conn.go/store.go, move AdminAPI t…
Apr 1, 2026
fa500c6
refactor: simplify gateway/admin, fix claudecode deadlock and data race
Apr 1, 2026
a8a066a
feat(claudecode): complete CLI params, env whitelist, protocol layers…
Apr 1, 2026
e448262
✅ test(claudecode): mock-based unit tests, coverage 55%→70%
Apr 2, 2026
f318886
refactor(claudecode): unify WorkerEvent routing, DRY control protocol…
Apr 2, 2026
50b03e6
test: update AC acceptance matrix v1.1 + fix lint errcheck
Apr 2, 2026
f81fe95
chore: rename cmd/gateway → cmd/worker, binary → hotplex-worker
Apr 2, 2026
ad5c71e
chore(Makefile): add PGO, cross-compile, ldflags, lint-fix, fmt, tidy
Apr 2, 2026
be6496c
chore(Makefile): auto-detect platform, simplify build targets
Apr 2, 2026
88e4e3e
chore(Makefile): make defaults to help
Apr 2, 2026
0e5c686
feat(deploy): add installation scripts and Docker deployment files
Apr 2, 2026
0611cac
feat(scripts): production installation with security hardening
Apr 2, 2026
2db57a5
feat(config): change default ports to 8888/9999
Apr 2, 2026
6ae5baf
feat(config): add complete default configuration files
Apr 2, 2026
5cc4b1f
refactor(makefile): organize by functional groups with process lifecycle
Apr 2, 2026
f87a23d
chore(.gitignore): add gateway.db* pattern
Apr 2, 2026
9d67437
fix(docker): correct docker configs and add monitoring provisioning
Apr 2, 2026
fa218e3
reorganize(configs): group monitoring configs under monitoring/
Apr 2, 2026
ea7945e
rename(db): gateway.db → hotplex-worker.db
Apr 2, 2026
5fde664
docs(spec): add Python client design document
Apr 2, 2026
af07189
feat(examples): add Python client example module
Apr 2, 2026
08dee42
refactor(python-client): fix code review issues
Apr 2, 2026
7149c3e
refactor(go-client): migrate AEP codec and JWT core to pkg/ for publi…
Apr 2, 2026
4f25572
fix: remove redundant code and use stdlib uuid
Apr 2, 2026
ea6548f
refactor(dead-code): remove pkg/jwt, slim internal/aep, move codec te…
Apr 2, 2026
77b3922
refactor: eliminate internal/aep facade, import pkg/aep directly
Apr 2, 2026
b9be26f
✨ feat(examples): improve TypeScript client examples and healthcheck
Apr 2, 2026
10270ba
docs(spec): add Go client example module design
Apr 2, 2026
ed213a7
refactor: migrate to github.com/hotplex/hotplex-worker module path
Apr 3, 2026
80d06a7
🔧 fix(examples): improve Java client SDK and fix Makefile test target
Apr 3, 2026
7321797
refactor: promote go-client to repository root and update documentati…
Apr 3, 2026
49fcbd5
🔧 fix(gateway): fix AEP init handshake and resolve SESSION_BUSY race
Apr 4, 2026
8ab04f9
🔧 fix(gateway): clean up stale comment and add api_key constant
Apr 4, 2026
ce81b7c
🔧 fix(gateway): eliminate double session creation in performInit
Apr 4, 2026
b90ec2f
🔧 refactor(gateway): extract SessionStarter interface on Conn
Apr 4, 2026
358b7c5
feat: init ai-sdk-transport and nextjs-chat; reorganize DB to data/ dir
Apr 4, 2026
3606645
fix(gateway): fetch session info after StartSession in performInit
Apr 4, 2026
f8c16c0
chore: cleanup project structure and ignore .loki and main binary
Apr 4, 2026
cfda457
feat(config): support numbered env vars and overhaul documentation
Apr 4, 2026
2fb1600
docs(spec): fix critical errors in Gateway Async Init spec
Apr 4, 2026
0a32223
📝 docs: consolidate specs directory and standardize metadata
Apr 4, 2026
226d151
📝 docs(specs): validate ACPX spec via acpx CLI and update metadata
Apr 4, 2026
d62b83a
✨ feat(worker): add OpenCode Worker specs and Claude Code user messag…
Apr 4, 2026
2a7da8f
🎨 style: fix code formatting (gofmt) and use embedded fields
Apr 4, 2026
121 changes: 121 additions & 0 deletions .agent/rules/aep.md
@@ -0,0 +1,121 @@
---
paths:
- "**/aep/*.go"
---

# AEP v1 Protocol Specification

> The unified full-duplex WebSocket protocol that hotplex-worker exposes to clients.
> Reference: `docs/specs/Acceptance-Criteria.md` §AEP-001 to §AEP-030

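## Envelope Structure
Every AEP v1 message carries the following fields (all required except `priority`, which has a default):

| Field | Type | Requirement |
|-------|------|-------------|
| `id` | string | Non-empty; unique message identifier |
| `version` | string | Must be `aep/v1`; otherwise `VERSION_MISMATCH` is returned |
| `session_id` | string | Non-empty |
| `seq` | int64 | Starts at 1 and strictly increases; allocated atomically within a session |
| `timestamp` | int64 | Unix time in ms, > 0 |
| `event` | object | Non-null; contains a `type` field |
| `priority` | string | Defaults to `data` when missing; `control` bypasses backpressure |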
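
The field rules in the table can be checked in one place. The following is a minimal sketch, not the project's actual type: the Go field names, the `Validate` method, and the `ErrVersionMismatch` variable are assumptions introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Version is the only protocol version accepted by AEP v1.
const Version = "aep/v1"

// ErrVersionMismatch mirrors the VERSION_MISMATCH error code (hypothetical name).
var ErrVersionMismatch = errors.New("VERSION_MISMATCH")

// Event is reduced to its type here; payload fields are omitted.
type Event struct {
	Type string `json:"type"`
}

// Envelope holds the fields listed in the table. The Go names are assumptions.
type Envelope struct {
	ID        string `json:"id"`
	Version   string `json:"version"`
	SessionID string `json:"session_id"`
	Seq       int64  `json:"seq"`
	Timestamp int64  `json:"timestamp"`
	Event     *Event `json:"event"`
	Priority  string `json:"priority,omitempty"`
}

// Validate checks each rule from the table and fills in the default priority.
func (e *Envelope) Validate() error {
	switch {
	case e.ID == "":
		return errors.New("id must be non-empty")
	case e.Version != Version:
		return fmt.Errorf("%w: got %q", ErrVersionMismatch, e.Version)
	case e.SessionID == "":
		return errors.New("session_id must be non-empty")
	case e.Seq < 1:
		return errors.New("seq must start at 1")
	case e.Timestamp <= 0:
		return errors.New("timestamp must be > 0")
	case e.Event == nil || e.Event.Type == "":
		return errors.New("event must be non-null and carry a type")
	}
	if e.Priority == "" {
		e.Priority = "data" // default when the field is missing
	}
	return nil
}

func main() {
	env := &Envelope{ID: "m1", Version: Version, SessionID: "s1", Seq: 1, Timestamp: 1, Event: &Event{Type: "ping"}}
	fmt.Println(env.Validate(), env.Priority)
}
```

Wrapping `ErrVersionMismatch` with `%w` lets callers branch on it with `errors.Is` while still reporting the offending version string.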

### Encoding and Decoding Constraints
```go
// DecodeLine must validate every required field.
func DecodeLine(line []byte) (*Envelope, error) {
	dec := json.NewDecoder(bytes.NewReader(line))
	dec.DisallowUnknownFields() // reject unknown fields
	// ...
}

// EncodeLine uses json.Encoder to avoid a []byte→string copy.
func EncodeLine(w io.Writer, env *Envelope) error {
	enc := json.NewEncoder(w)
	return enc.Encode(env)
}
```
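
To see what strict decoding rejects in practice, here is a small self-contained sketch. It uses a trimmed-down `envelope` type and a hypothetical `decodeStrict` helper (neither is the project's code), and it adds a trailing-data check that the rule above does not itself require.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// envelope is a two-field stand-in for the real Envelope type.
type envelope struct {
	ID      string `json:"id"`
	Version string `json:"version"`
}

// decodeStrict decodes exactly one JSON object and rejects unknown fields.
func decodeStrict(line []byte) (*envelope, error) {
	dec := json.NewDecoder(bytes.NewReader(line))
	dec.DisallowUnknownFields()

	var env envelope
	if err := dec.Decode(&env); err != nil {
		return nil, fmt.Errorf("decode envelope: %w", err)
	}
	// A second Decode must report io.EOF; anything else means extra data on the line.
	if err := dec.Decode(&struct{}{}); err != io.EOF {
		return nil, errors.New("decode envelope: unexpected data after JSON value")
	}
	return &env, nil
}

func main() {
	_, err := decodeStrict([]byte(`{"id":"m1","version":"aep/v1","extra":true}`))
	fmt.Println(err)
}
```

Field-level validation (non-empty `id`, version check, and so on) would run after this step on the decoded value.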

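## Message Types

### C→S (Client → Server)
- `init`: handshake; must be the first frame after the WS connection is established
- `input`: user task; hard-rejected while the session is busy
- `control`: terminate / delete
- `ping`: heartbeat; answered with pong

### S→C (Server → Client)
- `init_ack`: handshake response
- `state`: state change (created/running/idle/terminated)
- `message.delta`: streamed output (text/code/image)
- `message`: the complete aggregated message, sent when a turn ends
- `tool_call` / `tool_result`: tool invocation notifications (AUTONOMOUS mode)
- `done`: turn terminator
- `error`: error notification
- `pong`: response to ping
- `control`: reconnect / throttle (server-initiated)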

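## Seq Allocation and Deduplication
```go
// hub.NextSeq allocates atomically and guarantees a monotonically increasing seq within a session.
// The first value returned for a session is 1, matching the Envelope rule for seq.
func (g *SeqGen) NextSeq(sessionID string) int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	n := g.seq[sessionID] + 1
	g.seq[sessionID] = n
	return n
}

// A dropped message.delta does not consume a seq.
if !sent {
	// seq is not incremented; sessionDropped[sessionID] = true
}
```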
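
The section title mentions deduplication, but the rule file does not show how a receiver would do it. One possible approach, sketched here under the assumption that a receiver only needs to discard replays, is to remember the highest seq seen per session. The `Deduper` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper remembers the highest seq accepted for each session (hypothetical helper).
type Deduper struct {
	mu   sync.Mutex
	last map[string]int64
}

// NewDeduper returns an empty Deduper ready for use.
func NewDeduper() *Deduper {
	return &Deduper{last: make(map[string]int64)}
}

// Accept reports whether seq is new for the session and records it if so.
// Because seq is strictly increasing, any seq at or below the last one is a replay.
func (d *Deduper) Accept(sessionID string, seq int64) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if seq <= d.last[sessionID] {
		return false
	}
	d.last[sessionID] = seq
	return true
}

func main() {
	d := NewDeduper()
	fmt.Println(d.Accept("s1", 1), d.Accept("s1", 1), d.Accept("s1", 2))
}
```

This sketch tolerates gaps in seq; a stricter receiver could treat a gap as a signal to resynchronize.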

## Backpressure: Bounded Channel and Delta Dropping
```go
// The hub.broadcast channel capacity is set by broadcastQueueSize (default 256).
ch := make(chan *Envelope, cfg.BroadcastQueueSize)

func SendToSession(sessionID string, env *Envelope) error {
	if env.Event.Type == "message.delta" || env.Event.Type == "raw" {
		// Non-blocking select: silently drop when the channel is full.
		select {
		case ch <- env:
			return nil
		default:
			sessionDropped[sessionID] = true
			return nil // no error is returned
		}
	}
	// Critical events must never be dropped.
	ch <- env
	return nil
}
```
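
The drop behavior is easy to observe with a tiny queue. The sketch below is a simplified, single-producer model of the rule above; the `Session` type and its `Send` method are illustrative names, not the hub's real API.

```go
package main

import "fmt"

// Envelope is reduced to the event type for this sketch.
type Envelope struct{ Type string }

// Session owns one bounded outbound queue (hypothetical type).
type Session struct {
	ch      chan Envelope
	dropped bool // set once any droppable event has been discarded
}

// NewSession creates a session whose queue holds at most capacity events.
func NewSession(capacity int) *Session {
	return &Session{ch: make(chan Envelope, capacity)}
}

// Send enqueues env and reports whether it was queued.
// message.delta and raw events are dropped when the queue is full;
// every other event blocks until there is room.
func (s *Session) Send(env Envelope) bool {
	if env.Type == "message.delta" || env.Type == "raw" {
		select {
		case s.ch <- env:
			return true
		default:
			s.dropped = true
			return false
		}
	}
	s.ch <- env
	return true
}

func main() {
	s := NewSession(2)
	for i := 0; i < 3; i++ {
		fmt.Println(s.Send(Envelope{Type: "message.delta"}))
	}
	fmt.Println("dropped:", s.dropped, "queued:", len(s.ch))
}
```

Unlike `SendToSession`, this version returns a boolean so a caller can decide whether to consume a seq for the event.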

## Ordering Constraints
- Turn start: `state(running)` must be the first S→C event (seq=1)
- Turn end: `done` must be the last S→C event
- `error` must precede `done`
- `tool_result.tool_call_id` must match the `id` of the corresponding `tool_call`
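
These ordering rules lend themselves to a test helper that checks a recorded turn. The following sketch assumes a flattened `Event` struct and a hypothetical `ValidateTurn` function; real events would carry more fields.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a flattened stand-in for one S→C event (field names are assumptions).
type Event struct {
	Type       string
	State      string // set when Type == "state"
	ID         string // set when Type == "tool_call"
	ToolCallID string // set when Type == "tool_result"
}

// ValidateTurn checks the ordering constraints for one turn's S→C events.
func ValidateTurn(events []Event) error {
	if len(events) == 0 {
		return errors.New("turn has no events")
	}
	if first := events[0]; first.Type != "state" || first.State != "running" {
		return errors.New("turn must start with state(running)")
	}
	last := len(events) - 1
	if events[last].Type != "done" {
		return errors.New("turn must end with done")
	}
	calls := make(map[string]bool)
	for i, ev := range events[:last] {
		switch ev.Type {
		case "done":
			// An earlier done means something (possibly an error) follows it.
			return fmt.Errorf("done at index %d is not the last event", i)
		case "tool_call":
			calls[ev.ID] = true
		case "tool_result":
			if !calls[ev.ToolCallID] {
				return fmt.Errorf("tool_result at index %d references unknown tool_call %q", i, ev.ToolCallID)
			}
		}
	}
	return nil
}

func main() {
	turn := []Event{
		{Type: "state", State: "running"},
		{Type: "tool_call", ID: "t1"},
		{Type: "tool_result", ToolCallID: "t1"},
		{Type: "done"},
	}
	fmt.Println(ValidateTurn(turn))
}
```

Requiring `done` to be last and unique also enforces that any `error` comes before it.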

## Init Handshake
```go
// performInit must complete within 30s.
ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()

// The first frame must be of type init.
if env.Event.Type != "init" {
	return sendInitError(conn, "PROTOCOL_VIOLATION")
}
```
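
The two handshake rules (a deadline and a mandatory first frame type) can be exercised without a real WebSocket by modelling incoming frames as a channel. This is a sketch only: `Frame`, `awaitInit`, and `ErrProtocolViolation` are hypothetical names, and the context is derived from `context.Background()` for brevity where a real handler would pass the connection's context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrProtocolViolation stands in for the PROTOCOL_VIOLATION error code.
var ErrProtocolViolation = errors.New("PROTOCOL_VIOLATION")

// Frame is a decoded inbound frame reduced to its event type.
type Frame struct{ Type string }

// awaitInit waits for the first frame and requires it to be an init frame.
// It returns the context's error if no frame arrives before the timeout.
func awaitInit(frames <-chan Frame, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	select {
	case f := <-frames:
		if f.Type != "init" {
			return ErrProtocolViolation
		}
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	frames := make(chan Frame, 1)
	frames <- Frame{Type: "input"} // a client that skips the handshake
	fmt.Println(awaitInit(frames, 30*time.Second))
}
```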

## Worker Type Mapping
| Worker | Event mapping |
|--------|---------------|
| Claude Code | tool_use → tool_call |
| OpenCode CLI | step_start → extract sessionID |
| pi-mono (raw stdout) | each stdout line → one message.delta |
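
Two rows of the mapping are simple enough to sketch directly. The code below is an illustration under assumptions: `Event`, `mapClaudeType`, `linesToDeltas`, and `deltasFromString` are hypothetical names, and passing unlisted Claude Code types through unchanged is a guess rather than documented behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Event is a minimal AEP event for this sketch.
type Event struct {
	Type string
	Text string
}

// mapClaudeType renames Claude Code's tool_use to the AEP tool_call type.
// Other types pass through unchanged (an assumption, not from the spec).
func mapClaudeType(t string) string {
	if t == "tool_use" {
		return "tool_call"
	}
	return t
}

// linesToDeltas turns each line of raw worker stdout into one message.delta.
// bufio.Scanner rejects lines longer than its buffer (64 KiB by default).
func linesToDeltas(r io.Reader) ([]Event, error) {
	sc := bufio.NewScanner(r)
	var out []Event
	for sc.Scan() {
		out = append(out, Event{Type: "message.delta", Text: sc.Text()})
	}
	return out, sc.Err()
}

// deltasFromString is a convenience wrapper for tests and examples.
func deltasFromString(s string) ([]Event, error) {
	return linesToDeltas(strings.NewReader(s))
}

func main() {
	events, err := deltasFromString("compiling\nlinking\n")
	fmt.Println(len(events), err, mapClaudeType("tool_use"))
}
```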
23 changes: 0 additions & 23 deletions .agent/rules/go125.md

This file was deleted.

43 changes: 0 additions & 43 deletions .agent/rules/go126.md

This file was deleted.

48 changes: 47 additions & 1 deletion .agent/rules/golang-style.md → .agent/rules/golang.md
@@ -3,7 +3,12 @@ paths:
- "**/*.go"
---

# Uber Go Style Guide
# Go Coding Conventions

> Merged from the former `go126.md` (Go 1.26 language features) and `golang-style.md` (Uber Go Style Guide)
> hotplex-worker uses Go 1.26 and follows the Uber Go Style Guide plus project-specific rules

---

## Formatting
- Soft line-width limit of 99 characters
@@ -50,3 +55,44 @@ paths:
type Option func(*Config)
func WithTimeout(d time.Duration) Option
```

---

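## Go 1.26 Language Features

## Enabled by Default (no code changes needed)
- **Green Tea GC**: GC overhead reduced by 10-40%; long-running processes benefit directly
- **Swiss Table Maps**: `make(map[K]V)` uses the new hash table implementation
- **Container-Aware GOMAXPROCS**: adapts automatically to container CPU limits
- **`io.ReadAll`**: about 2x faster with roughly 50% less allocation (benefits parsing of worker output streams)
- **`fmt.Errorf`**: fewer allocations; for unformatted strings the cost is on par with `errors.New`

## Recommended
- **`log/slog`**: the standard library's structured logging; use it everywhere
- **Generic Interfaces**: parameterize the Worker interface by type (e.g. `Worker[T Event]`)
- **`weak.Pointer`** (created with `weak.Make`, read with `Value()`): LRU cache for session metadata
- **`slices.Clone`**: the standard way to copy a slice at an API boundary
- **`unique.Make`**: string interning
- **Enhanced `new(expr)`**: accepts an expression as the initial value, which simplifies Worker configuration
- **Self-referential generic constraints**: allow more flexible generic designs

## Goroutine Leak Detection (high priority)
- Experimental profile: the `goroutineleak` profile type in `runtime/pprof`
- Enable with `GOEXPERIMENT=goroutineleakprofile`
- Expected to be **enabled by default in Go 1.27**
- **Every goroutine must have a shutdown path** (ctx cancel / channel close / WaitGroup)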
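
A shutdown path usually combines a cancellable context with a `sync.WaitGroup` so the owner can both signal and wait. The sketch below shows one such pattern; the `Pump` type and its methods are hypothetical and not taken from the codebase.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Pump runs one background goroutine that can always be stopped (hypothetical type).
type Pump struct {
	cancel context.CancelFunc
	wg     sync.WaitGroup
	ticks  atomic.Int64
	exited atomic.Bool
}

// StartPump launches the goroutine. interval must be positive.
func StartPump(interval time.Duration) *Pump {
	ctx, cancel := context.WithCancel(context.Background())
	p := &Pump{cancel: cancel}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()          // runs last: releases Stop
		defer p.exited.Store(true) // runs before Done, so Stop observes it
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done(): // shutdown path
				return
			case <-t.C:
				p.ticks.Add(1)
			}
		}
	}()
	return p
}

// Stop cancels the goroutine and blocks until it has returned. It is safe to call twice.
func (p *Pump) Stop() {
	p.cancel()
	p.wg.Wait()
}

func main() {
	p := StartPump(time.Millisecond)
	time.Sleep(10 * time.Millisecond)
	p.Stop()
	fmt.Println("exited:", p.exited.Load())
}
```

Because `Stop` waits on the WaitGroup, a caller that returns from `Stop` knows the goroutine is gone, which is the property the leak profile checks for.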

## Build Optimization
```bash
go build -pgo=auto ./cmd/gateway # Profile-Guided Optimization
```

## Modernization Tooling
```bash
go fix ./... # automatically modernize code (go fix was completely rewritten, with dozens of fixers)
```

## Flight Recorder (production diagnostics)
```bash
go tool trace trace.out
```
135 changes: 135 additions & 0 deletions .agent/rules/metrics.md
@@ -0,0 +1,135 @@
---
paths:
- "**/metrics/*.go"
---

# Observability Specification

> Prometheus metrics, OTel traces, and log format
> Reference: `docs/specs/Acceptance-Criteria.md` §OBS-001 to §OBS-010

## Log Format (compatible with the OTel Log Data Model)

```go
// All logs must be emitted as JSON.
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

logger.Info("session created",
	"service.name", "hotplex-gateway",
	"session_id", sessionID,
	"trace_id", traceID, // when a trace context exists
)
```

### Required Fields
| Field | Description |
|-------|-------------|
| `timestamp` | ISO 8601 / Unix ms |
| `level` | DEBUG / INFO / WARN / ERROR / FATAL |
| `message` | Human-readable message |
| `service.name` | Always `hotplex-gateway` |
| `trace_id` | When a trace context exists |

### Log Level Rules
- **ERROR**: always logged in full, never sampled
- **Normal logs**: sampled according to the configured `sample_rate` (e.g. 10%)
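
One way to apply these two rules with `log/slog` is a wrapping handler that passes ERROR records through and samples everything below. This is a sketch under assumptions: `samplingHandler`, `newSampledLogger`, and `countingWriter` are hypothetical names, and the random source is injected so the behavior can be tested deterministically.

```go
package main

import (
	"context"
	"io"
	"log/slog"
	"math/rand"
	"os"
)

// samplingHandler keeps every record at ERROR or above and a fraction of the rest.
type samplingHandler struct {
	slog.Handler
	rate float64        // fraction of non-error records to keep, e.g. 0.10
	rnd  func() float64 // returns a value in [0, 1)
}

func (h *samplingHandler) Handle(ctx context.Context, r slog.Record) error {
	if r.Level < slog.LevelError && h.rnd() >= h.rate {
		return nil // sampled out
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so sampling survives logger.With.
func (h *samplingHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &samplingHandler{Handler: h.Handler.WithAttrs(attrs), rate: h.rate, rnd: h.rnd}
}

func (h *samplingHandler) WithGroup(name string) slog.Handler {
	return &samplingHandler{Handler: h.Handler.WithGroup(name), rate: h.rate, rnd: h.rnd}
}

// newSampledLogger builds a JSON logger with sampling applied.
func newSampledLogger(w io.Writer, rate float64, rnd func() float64) *slog.Logger {
	return slog.New(&samplingHandler{Handler: slog.NewJSONHandler(w, nil), rate: rate, rnd: rnd})
}

// countingWriter is a test double that counts how many records were written.
type countingWriter struct{ writes int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return len(p), nil
}

func main() {
	logger := newSampledLogger(os.Stdout, 0.10, rand.Float64)
	logger.Error("worker crashed", "session_id", "s1") // always emitted
	logger.Info("session created", "session_id", "s1") // emitted about 10% of the time
}
```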

---

## Prometheus Metrics

### Naming Convention
```
<app_prefix>_<group>_<metric>_<unit_suffix>
The prefix is always hotplex_
```
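
A naming rule like this can be enforced in a unit test or lint step. The following sketch checks only what the convention states plus the common Prometheus practice that counters end in `_total`; `CheckMetricName` and its regular expression are assumptions, not project code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// metricNameRE requires the hotplex_ prefix followed by lowercase snake_case segments.
var metricNameRE = regexp.MustCompile(`^hotplex(_[a-z0-9]+)+$`)

// CheckMetricName validates a metric name against the naming convention.
func CheckMetricName(name string, isCounter bool) error {
	if !metricNameRE.MatchString(name) {
		return fmt.Errorf("%q: want hotplex_ prefix and lowercase snake_case", name)
	}
	if isCounter && !strings.HasSuffix(name, "_total") {
		return fmt.Errorf("%q: counter names should end in _total", name)
	}
	return nil
}

func main() {
	fmt.Println(CheckMetricName("hotplex_requests_total", true))
	fmt.Println(CheckMetricName("requests_total", true))
}
```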

### Core Metrics (API layer, RED method)
| Metric | Type | Description |
|--------|------|-------------|
| `hotplex_requests_total` | Counter | Total requests |
| `hotplex_request_duration_seconds` | Histogram | Request latency |
| `hotplex_request_errors_total` | Counter | Total errors; label: error_code |

### Core Metrics (infrastructure layer, USE method)
| Metric | Type | Description |
|--------|------|-------------|
| `hotplex_sessions_active` | Gauge | Currently active sessions |
| `hotplex_sessions_created_total` | Counter | Cumulative sessions created |
| `hotplex_worker_crashes_total` | Counter | Worker crashes; labels: worker_type, reason |
| `hotplex_worker_memory_bytes` | Gauge | Worker memory usage; label: worker_type |
| `hotplex_errors_total` | Counter | All errors; labels: component, error_code |
| `hotplex_worker_duration_seconds` | Histogram | Worker execution time |

### Auxiliary Metrics
```go
// Backpressure
hotplex_broadcast_queue_capacity // capacity limit
hotplex_broadcast_queue_depth // current depth
hotplex_messages_dropped_total // messages dropped

// Pool
hotplex_pool_total // total capacity of the global pool
hotplex_pool_used // slots in use
hotplex_user_pool_used // per-user pool; label: user_id
```

---

## OTel Span Creation and Context Injection

### One Span per AEP Event
```go
func handleEvent(ctx context.Context, env *aep.Envelope) {
	ctx, span := otel.Tracer("hotplex-gateway").Start(ctx, "aep."+env.Event.Type)
	defer span.End()

	span.SetAttributes(
		attribute.String("session_id", env.SessionID),
		attribute.Int64("seq", env.Seq),
	)

	// Inject the trace context into the event metadata.
	// This assumes env.Metadata is a non-nil map; writing to a nil map panics.
	if spanCtx := trace.SpanContextFromContext(ctx); spanCtx.IsValid() {
		env.Metadata["trace_id"] = spanCtx.TraceID().String()
		env.Metadata["span_id"] = spanCtx.SpanID().String()
	}

	handle(ctx, env) // propagate downstream
}
```

### Span Naming Format
```
aep.init → init handshake
aep.input → user input
aep.message.delta → streamed output fragment
aep.done → end of turn
aep.error → error event
```

### Tail-based Sampling Policy
- **ERROR traces**: 100% retained
- **latency > 5s**: retained with priority
- **Normal traces**: 1% sampled
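
The policy reduces to a small decision function evaluated once a trace is complete. The sketch below is illustrative: `keepTrace` is a hypothetical name, and it interprets "retained with priority" as "always keep", which is an assumption since the rule does not give a percentage for slow traces.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	slowThreshold = 5 * time.Second
	normalRate    = 0.01 // 1% of normal traces
)

// keepTrace decides whether a finished trace is retained.
// r is a random value in [0, 1), passed in so the decision is testable.
func keepTrace(hasError bool, latency time.Duration, r float64) bool {
	switch {
	case hasError:
		return true // ERROR traces: 100% retained
	case latency > slowThreshold:
		return true // assumption: slow traces are always kept
	default:
		return r < normalRate
	}
}

func main() {
	fmt.Println(keepTrace(true, 0, rand.Float64()))
	fmt.Println(keepTrace(false, 6*time.Second, rand.Float64()))
}
```

In a real deployment this decision typically lives in the collector rather than in the service, since it needs the whole trace.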

---

## SLO Definitions

| SLO | Metric | Target |
|-----|--------|--------|
| Session creation success rate | `hotplex_sessions_created_total` | ≥ 99.5% |
| P99 latency | `hotplex_request_duration_seconds` P99 | < 5s |
| Worker availability | `1 - hotplex_worker_crashes_total/...` | ≥ 99% |
| WAF accuracy | rejections / total requests | > 99.9% |

---

## Alerting Principles
- **Alert on symptoms** (not on root causes)
- `HighSessionCreationFailureRate`: failure rate > 1% sustained for 5min
- `HighWorkerCrashRate`: crash rate > 1%
- `HighLatency`: P99 > 5s sustained for 5min
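
The "sustained for 5min" conditions amount to checking that every sample in the window breaches the threshold. In practice this would be expressed as an alerting rule in the monitoring system; the Go sketch below only illustrates the arithmetic, and `failureRate` and `sustainedAbove` are hypothetical helpers.

```go
package main

import "fmt"

// failureRate returns failed/total, or 0 when there were no attempts.
func failureRate(failed, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return failed / total
}

// sustainedAbove reports whether every sample in the window exceeds the threshold.
// An empty window never fires.
func sustainedAbove(rates []float64, threshold float64) bool {
	if len(rates) == 0 {
		return false
	}
	for _, r := range rates {
		if r <= threshold {
			return false
		}
	}
	return true
}

func main() {
	// Five one-minute samples of (failed, total) session creations.
	window := []float64{
		failureRate(3, 100),
		failureRate(2, 120),
		failureRate(4, 90),
		failureRate(2, 110),
		failureRate(5, 100),
	}
	fmt.Println("HighSessionCreationFailureRate:", sustainedAbove(window, 0.01))
}
```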