AetherBus-Tachyon is a high-performance, lightweight message broker designed for the AetherBus ecosystem. It serves as a central routing point for events, ensuring efficient and reliable delivery from producers to consumers.
This project is currently under active development and aims to be a foundational component for building scalable, event-driven architectures.
- High-Performance Routing: Utilizes an Adaptive Radix Tree for fast and efficient topic-based routing, ensuring low-latency message delivery even with a large number of routes.
- Extensible Media Handling: Supports pluggable codecs and compressors to optimize message payloads.
- Codec: Defaulting to
JSONfor structured data. - Compressor: Defaulting to
LZ4for high-speed compression and decompression.
- Codec: Defaulting to
- ZeroMQ Integration: Built on top of ZeroMQ (using
pebbe/zmq4), leveraging its powerful and battle-tested messaging patterns (ROUTER-DEALER, PUB-SUB). - Clean Architecture: Organized with a clear separation of concerns (domain, use case, delivery, repository, media, app runtime) for maintainability and testability.
- Polyglot Runtime Components: Includes a Go broker core, a FastAPI operational/control surface, and a Rust fast-path sidecar scaffold.
- Continuous Integration: Uses GitHub Actions jobs for Go module recovery validation, API gateway tests, Rust crate tests, and drift export workflows.
This repository is organized into runtime components plus shared domain/transport modules:
.
├── cmd/ # Go executables (tachyon broker, benchmark harness, adapters)
├── internal/ # Broker application, domain, delivery, persistence, media internals
├── pkg/ # Reusable public Go packages (client, transport, encoding, errors)
├── config/ # Go configuration loading and tests
├── api/ # protobuf contracts
├── api_gateway/ # FastAPI admin/control-surface service + pytest suite
├── rust/tachyon-fastpath/ # Rust sidecar scaffold for fast-path operations
├── tools/contracts/ # Contract validation utilities
├── scripts/ # Recovery and benchmark helper scripts
├── docs/ # Architecture, protocol, performance, and roadmap documentation
└── .github/workflows/ # CI workflows
GitHub Actions (.github/workflows/go.yml) executes:
go-offline-sanity: offline-safe Go repository and package checks (scripts/go_mod_recovery.sh check)go-full-recovery: online module recovery + full Go build/test (scripts/go_mod_recovery.sh recover)api-gateway-tests: FastAPI gateway tests withpytest -q api_gateway/testsrust-fastpath-tests: Rust sidecar tests withcargo test --lockedpr-healing-drift: pull-request simulation drift artifact exportnightly-healing-drift: scheduled/workflow-dispatch healing drift execution with artifact export
On Debian/Ubuntu, you can install ZeroMQ development libraries with:
sudo apt-get update && sudo apt-get install -y libzmq3-dev-
Clone the repository:
git clone https://github.com/aetherbus/aetherbus-tachyon.git cd aetherbus-tachyon -
Install dependencies:
go mod tidy
-
Run the server:
go run ./cmd/tachyon
The server will start and bind to the addresses specified in the configuration (defaults to tcp://127.0.0.1:5555 for the ROUTER and tcp://127.0.0.1:5556 for the PUB socket).
Optional direct-delivery durability can be enabled with:
WAL_ENABLED=trueWAL_PATH=./data/direct_delivery.wal
When enabled, direct messages that require ACK are appended to an append-only WAL before dispatch, ACK marks entries committed, terminal outcomes are marked dead-lettered, and remaining unfinalized records are replayed when matching consumers reconnect after restart.
Dead-letter records are now materialized in a structured DLQ store at WAL_PATH.dlq, while broker-scheduled replays are written to WAL_PATH.scheduled. Administrative mutations are recorded in a separate append-only audit chain at WAL_PATH.audit, so compliance retention can differ from hot-path dispatch durability. Operators can browse and inspect DLQ entries, then replay or purge them with explicit confirmation and exact target matching so replay cannot silently change the original consumer/topic boundary.
# Browse dead letters
go run ./cmd/tachyon dlq list --consumer worker-1
# Inspect a single record
go run ./cmd/tachyon dlq inspect --id msg-123
# Replay only when the original target is restated exactly
go run ./cmd/tachyon dlq replay --ids msg-123 --target-consumer worker-1 --target-topic orders.created --actor ops@example.com --reason "customer-approved replay" --confirm REPLAY
# Manually quarantine a message into the dead-letter store
go run ./cmd/tachyon dlq dead-letter --id msg-123 --consumer worker-1 --topic orders.created --payload "raw-body" --actor ops@example.com --reason "manual quarantine"
# Purge an acknowledged bad record
go run ./cmd/tachyon dlq purge --ids msg-123 --actor ops@example.com --mutation-reason "retention cleanup" --confirm PURGE
# Query immutable audit history by message, actor, or time window
go run ./cmd/tachyon dlq audit --id msg-123 --actor ops@example.com --start 2026-03-21T00:00:00Z --end 2026-03-22T00:00:00ZThe demo control-surface gateway exposes matching admin endpoints under /api/admin/dlq/* plus audit queries at /api/admin/audit/events. Set ADMIN_TOKEN to require the X-Admin-Token header for browse, inspect, replay, manual dead-letter, purge, and audit requests. Replay and purge responses include requested/replayed-or-purged counts plus per-record failure details.
WAL_PATH.auditis intentionally separate fromWAL_PATH,WAL_PATH.dlq, andWAL_PATH.scheduledso compliance retention can be longer than dispatch/replay retention.- Each audit line stores actor, timestamp, operation, target message IDs, requested reason, prior state, resulting state, the previous record hash, and the current record hash.
- The
prev_hash→hashchain is meant to make offline tampering detectable during export or forensic review; it is not a substitute for WORM/object-lock storage. - Operationally, treat the audit log as append-only, rotate it with retention tooling that preserves line order, and export it to immutable storage when regulatory retention exceeds local disk policy.
Direct-delivery admission control defaults are intentionally conservative and can be tuned with:
MAX_INFLIGHT_PER_CONSUMER(default1024)MAX_PER_TOPIC_QUEUE(default256)MAX_QUEUED_DIRECT(default4096)MAX_GLOBAL_INGRESS(default8192)TENANT_QUOTAS_JSON(optional per-tenant quota overrides)
Example TENANT_QUOTAS_JSON:
{
"tenant-a": { "max_inflight": 256, "max_queued": 2048, "max_ingress": 4096 },
"tenant-b": { "max_queued": 512 }
}Each field is optional per tenant. Any configured positive value overrides broker defaults for that tenant only.
When limits are reached, direct messages are deferred or dropped with explicit broker counters (deferred, throttled, dropped).
This repository may require external Go module resolution to complete full recovery of
go.mod / go.sum and to run go test ./....
To make troubleshooting easier, use the recovery helper:
Use this mode when your environment cannot reach external Go module infrastructure:
bash scripts/go_mod_recovery.sh checkThis mode is useful for:
- validating repository structure
- checking command entrypoints
- running package-level tests for explicitly selected offline-safe packages
By default, it tests:
go test ./cmd/aetherbusUse this mode on a machine or CI runner with module download access:
bash scripts/go_mod_recovery.sh recoverThis runs:
go mod downloadgo mod tidygo build ./...go test ./...
When you need to rehearse policy/ruleset behavior without mutating module files, use
--simulate-policy:
bash scripts/go_mod_recovery.sh --simulate-policy --policy-ruleset pr-healing recoverThis prints the planned recovery commands and exits without running write-capable steps.
To emit trend bundles consumable by dashboard tooling:
bash scripts/go_mod_recovery.sh --drift-export-dir artifacts/recovery recoverThe command writes:
trend_bundle.jsontrend_bundle.csv
Both include UTC timestamp, mode, status, policy-ruleset label, simulation status, and
go.mod/go.sum hashes so nightly and PR healing jobs can track drift over time.
If post-fix verification (go test ./...) fails, you can opt into a rollback command:
bash scripts/go_mod_recovery.sh \
--auto-rollback \
--auto-rollback-cmd "git checkout -- go.mod go.sum" \
recoverRollback execution is disabled by default and only runs when explicitly enabled.
To inspect the current Go environment:
bash scripts/go_mod_recovery.sh doctorSome failures are caused by local source issues, while others are caused by incomplete
module metadata (go.sum) that cannot be repaired without downloading or verifying
dependencies.
In restricted-network environments, the offline-safe path helps confirm whether a failure is local to the codebase or caused by module resolution limits.
If recover fails with module download/verification errors in restricted environments,
treat that as an environment limitation first (not an automatic source regression).
A first-class benchmark harness is available via cmd/tachyon-bench:
# direct mode with ACK
go run ./cmd/tachyon-bench harness --mode direct-ack --payload-class small --compress=true --duration 20s
# fanout benchmark
go run ./cmd/tachyon-bench harness --mode fanout --fanout-subs 8 --payload-class medium --compress=false --duration 20s
# mixed topic distribution
go run ./cmd/tachyon-bench harness --mode mixed --mixed-topics 8 --payload-class medium --compress=true --duration 30s
# CI-friendly matrix
go run ./cmd/tachyon-bench matrix --duration 10s --connections 2The harness reports p50/p95/p99 latency, throughput, CPU usage, memory RSS, and allocations/op. See docs/PERFORMANCE.md for full interpretation guidance and comparison workflow.
เป้าหมาย: ทำให้ภาพสถาปัตยกรรมอ้างอิง “โครงสร้างข้อมูลที่ persist จริง” และ “เส้นทางควบคุมระหว่างโมดูล” เพื่อไม่ปะปนกับรายการงานที่ปิดแล้ว
erDiagram
ROUTE_CATALOG_SNAPSHOT {
int version PK
}
ROUTE_ENTRY {
string pattern PK
string destination_id PK
string route_type
int priority
bool enabled
string tenant
}
SESSION_SNAPSHOT {
string session_id PK
string consumer_id PK
string tenant_id
string subscriptions_json
datetime last_heartbeat
int max_inflight
bool supports_ack
bool resumable
}
WAL_RECORD {
string message_id PK
string type
string consumer_id
string session_id
string tenant_id
string topic
uint64 enqueue_sequence
int attempt
}
SCHEDULED_MESSAGE {
uint64 sequence PK
string message_id
string tenant_id
string topic
string destination_id
string route_type
uint64 enqueue_sequence
int delivery_attempt
datetime deliver_at
string reason
}
DLQ_RECORD {
string message_id PK
string consumer_id
string session_id
string tenant_id
string topic
uint64 enqueue_sequence
int attempt
string reason
datetime dead_lettered_at
int replay_count
}
AUDIT_EVENT {
string event_id PK
string actor
datetime timestamp
string operation
string target_message_ids_json
string requested_reason
string prev_hash
string hash
}
ROUTE_CATALOG_SNAPSHOT ||--o{ ROUTE_ENTRY : contains
SESSION_SNAPSHOT }o--o{ WAL_RECORD : replay_join
WAL_RECORD ||--o{ SCHEDULED_MESSAGE : retry_schedule
WAL_RECORD ||--o| DLQ_RECORD : terminal_failure
DLQ_RECORD ||--o{ AUDIT_EVENT : mutation_trail
flowchart LR
U[Voice / Intent / App Request] --> G[Genesis
Intent -> Visual Plan]
G --> M[Manifest
Intent+Visual+Scene Contract]
E[Environment Sensing] --> B[BioVision
Day/Night/Fog/Rain/Motion]
B --> GV[Governor
Brightness/Curfew/Geo-fence]
B --> P[PRGX
Policy+Safety Gate+Audit]
M --> GV
M --> P
GV --> T[Tachyon Runtime
Realtime Stream + Time Sync]
P --> T
T --> X[Edge/WASM Runtime]
X --> O1[AR/VR Glasses]
X --> O2[Projector / Building Facade]
X --> O3[Screen / Legacy OS Surface]
- Producers publish multipart frames to the ZeroMQ ROUTER.
delivery/zmq.Routervalidates frame shape, decompresses/decodes payloads via the media layer, and forwards routing work into the application flow.usecase.EventRouterresolves topic matches through the route store (ART) for fanout delivery.- Consumer registration and heartbeat traffic updates the consumer session table, which tracks active direct-delivery capability.
- Direct deliveries create or update inflight delivery records so ACK/NACK, retry, timeout, and dead-letter behavior can be evaluated.
- When ACK durability is required, the broker appends dispatch state to segmented WAL files, snapshots resumable sessions, and persists scheduled retries for restart recovery.
- Terminal failures are materialized into the DLQ store, while replay/purge/manual dead-letter mutations are chained into the admin audit log for forensic review.
- The transport layer emits the final topic payload or direct-delivery frame back to subscribers / workers.
This version of the diagram is aligned with the current logical storage model described below, so the architecture view now reflects both the runtime components and the broker-managed data structures.
The broker currently uses a hybrid in-memory + append-only WAL model instead of a full relational database. The logical data structures are:
- Purpose: topic-to-destination lookup for routing decisions
- Shape: adaptive radix tree in memory plus a versioned JSON route catalog on disk
- Lifecycle: loaded from
ROUTE_CATALOG_PATHon startup, mutated in memory during runtime, persisted after route changes
| Field | Type | Description |
|---|---|---|
topic |
string | Topic key used for route lookup |
destination |
string | Target consumer/node identifier |
- Purpose: active consumer capability/session tracking for direct delivery
- Shape: map keyed by
consumer_id - Lifecycle: active state lives in memory; resumable metadata can be restored from WAL-backed session snapshots
| Field | Type | Description |
|---|---|---|
consumer_id |
string | Stable consumer identity |
session_id |
string | Active session identifier |
socket_identity |
bytes | ZeroMQ ROUTER identity for direct send |
supports_ack |
bool | Whether consumer participates in ACK flow |
subscriptions |
set[string] | Topics subscribed for direct delivery |
max_inflight |
int | Consumer inflight window cap |
inflight_count |
int | Current number of inflight messages |
last_heartbeat |
timestamp | Last heartbeat seen from consumer |
- Purpose: ACK/NACK, retry, timeout, dead-letter control, and delayed delivery scheduling for direct mode
- Shape: maps keyed by
message_idplus an ordered scheduled queue keyed bydeliver_at - Lifecycle: inflight state lives in memory; retry/delayed queue ordering can be restored from WAL-backed scheduled entries
| Field | Type | Description |
|---|---|---|
message_id |
string | Message identity used for ACK/NACK correlation |
consumer_id |
string | Target consumer for this attempt |
session_id |
string | Session that received the dispatch |
topic |
string | Routed topic |
payload |
bytes | Original payload bytes |
attempt |
int | Delivery attempt count |
dispatched_at |
timestamp | Dispatch time used for timeout evaluation |
status |
enum | dispatched / acked / nacked / expired / retry_scheduled / dead_lettered |
- Purpose: durability for direct messages requiring ACK
- Storage: JSON-line append log (default path
./data/direct_delivery.wal) - Recovery: uncommitted dispatch records are replayed when matching consumers re-register
| Field | Type | Description |
|---|---|---|
type |
enum | dispatched, committed, or dead_lettered |
message_id |
string | Message identity |
consumer |
string | Consumer identity for dispatched records |
session_id |
string | Session ID for dispatched records |
topic |
string | Topic for dispatched records |
payload |
bytes | Payload for dispatched records |
attempt |
int | Attempt number for dispatched records |
Note: if you need SQL/NoSQL persistence in the future, this model can be mapped directly to tables/collections (
routes,consumer_sessions,inflight_messages,delivery_wal) while preserving existing runtime semantics.
Guarantees (when WAL_ENABLED=true):
- Direct deliveries that require ACK are written to WAL before broker send.
- ACK and terminal dead-letter outcomes finalize WAL records, preventing replay.
- On restart, only unfinalized direct deliveries are replayed, preserving
message_id,consumer_id, topic, payload, and attempt counter.
Non-goals / current limitations:
- WAL is local append-only file storage (single-node durability, no replication or consensus).
- WAL replay is scoped to consumers that re-register; replay is not global fanout recovery.
- Dispatch WAL compaction/retention is not implemented in this version.
- Audit retention is operator-managed and can be longer than WAL retention because the audit chain is stored separately in
WAL_PATH.audit.
- Geo-redundant Durability: Replicate WAL, route catalog, and delayed queue state to a standby node or object storage target.
- SLO-driven Autoscaling Signals: Emit broker pressure indicators that can feed orchestration or capacity planning automation.
- Geo-redundant Durability: ทำสำเนา WAL, route catalog และสถานะ delayed queue ไปยัง standby node หรือ object storage
- SLO-driven Autoscaling Signals: ปล่อยสัญญาณแรงกดดันของ broker เพื่อนำไปใช้กับระบบ orchestration หรือ automation ด้าน capacity planning
To move AetherBus-Tachyon toward a production-grade broker spec, the repository now defines deeper system contracts in dedicated documents:
- Protocol Specification v1 (draft)
- Routing Semantics (ART)
- Delivery Semantics (ACK/Retry/Backpressure/DLQ)
- Performance Model and Benchmarking
- Rust Fast-path Sidecar Scaffold
- Intent Graph Algorithm Specification
- Intent Core Phase 1 (single-node scaffold)
Direct-delivery ACK tracking supports timeout-driven retries. Configure via:
DELIVERY_TIMEOUT_MS(default:30000)
If an inflight direct message is not ACKed before this timeout, the broker treats it as retryable, retries within the direct retry budget, and dead-letters it once retries are exhausted.
These docs lock down the key areas that must be explicit for production evolution:
- Protocol envelope and control messages (register/ack/nack)
- Topic grammar and wildcard matching precedence
- Delivery guarantees and retry/dead-letter behavior
- Operational model (backpressure, failure handling, observability)
The repository includes a scaffolded Rust sidecar (rust/tachyon-fastpath) and a narrow Go adapter boundary (internal/fastpath).
- Default runtime mode remains Go-only for backward-compatible behavior.
- Rust sidecar is an explicit opt-in integration path for large payload framing/compression offload.
- The first iteration intentionally uses a process boundary (Unix socket sidecar) to minimize risk to broker delivery semantics.
Fast-path sidecar configuration knobs are available for explicit developer testing:
FASTPATH_SIDECAR_ENABLED(defaultfalse)FASTPATH_SOCKET_PATH(default/tmp/tachyon-fastpath.sock)FASTPATH_CUTOVER_BYTES(default262144)FASTPATH_REQUIRE(defaultfalse)FASTPATH_FALLBACK_TO_GO(defaulttrue)
See docs/FASTPATH_SIDECAR.md for architecture, activation criteria, and measurable migration candidates.