
Per-user token auth for Vault #18

@jh-lee-cryptolab

Problem

Vault currently uses a single shared token (VAULT_TOKENS env var) for the entire team. Any team member (or compromised agent) with the token has full access to all decryption operations. There is no way to:

  • Identify which user made a request
  • Limit a user's access scope
  • Revoke a single user's access without rotating the shared token

User Flows

Flow 1: Initial Setup (Fresh Install)

install.sh creates default role/token config files, adds the user to the docker group, and installs the runevault CLI alias.

install.sh
  ├── create vault-roles.yml (default roles: admin, agent)
  ├── create vault-tokens.yml (empty: tokens: [])
  ├── usermod -aG docker $SUDO_USER
  ├── add 'runevault' alias to user's shell profile
  ├── configure docker-compose volumes
  └── print instructions: "Run 'runevault token issue' to create your first token"
# After install, admin issues the first token:
$ runevault token issue --user alice --role agent --expires 90d

The admin's auth is SSH access to the host. No admin token — if you can SSH in and run runevault, you are the admin.

Flow 2: Issue Token for Team Member

$ runevault token issue --user alice --role agent --expires 90d

Token issued for 'alice':
  Role:    agent
  Scope:   get_public_key, decrypt_scores, decrypt_metadata
  Top-K:   5
  Expires: 2026-06-18

  Token: evt_7f3a9c...

  ⚠ This token will NOT be shown again. Share it securely with the user.

What happens internally:

  1. runevault alias runs docker exec rune-vault python /app/vault_admin.py token issue ...
  2. vault_admin.py inside the container sends HTTP POST to internal unix socket
  3. Vault admin server validates request, generates evt_ + secrets.token_hex(16)
  4. Updates in-memory token store immediately
  5. Async writes updated state to vault-tokens.yml
  6. Returns token to CLI — printed once, not stored in logs
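
Steps 3 and 4 above can be sketched as follows. This is a minimal illustration, not the Vault implementation: `TokenStore` and `TokenRecord` are hypothetical names, and the sketch assumes one active token per user, with a new issue replacing any earlier record.

```python
import secrets
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

TOKEN_PREFIX = "evt_"  # prefix shown in the example output


@dataclass
class TokenRecord:
    user: str
    token: str
    role: str
    created: date
    expires: date


class TokenStore:
    """Hypothetical in-memory store, keyed by username (assumes one token per user)."""

    def __init__(self) -> None:
        self._by_user: Dict[str, TokenRecord] = {}

    def issue(self, user: str, role: str, ttl_days: int,
              today: Optional[date] = None) -> str:
        issued = today or date.today()
        # secrets.token_hex(16) draws 16 random bytes and returns 32 hex characters.
        token = TOKEN_PREFIX + secrets.token_hex(16)
        self._by_user[user] = TokenRecord(
            user=user, token=token, role=role,
            created=issued, expires=issued + timedelta(days=ttl_days),
        )
        # A real server would queue the async write to vault-tokens.yml here.
        return token  # handed back to the CLI once; never written to logs

    def get(self, user: str) -> Optional[TokenRecord]:
        return self._by_user.get(user)
```

Because the store is updated before the file write is scheduled, the new token is usable as soon as `issue()` returns.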

Flow 3: Team Member Configures Client

After receiving the token from the admin:

# In their Claude Desktop client config
{
  "servers": {
    "rune-vault": {
      "url": "https://vault.example.com:50051",
      "token": "evt_7f3a9c..."
    }
  }
}

No change to the client-side UX — same single token field.

Flow 4: API Request with Per-User Token

Client (alice) → gRPC: DecryptScores(token=evt_7f3a9c..., top_k=3)
  │
  ├── validate_token()
  │     ├── lookup token → found: user=alice, role=agent
  │     ├── check expiry → OK (2026-06-18)
  │     ├── check top_k → request(3) ≤ role_limit(5) → OK
  │     └── check rate_limit → 12/30 in window → OK
  │
  ├── derive agent_id from user ("alice") instead of token hash
  │     └── agent_dek = HMAC-SHA256(master_key, "alice")
  │
  └── return DecryptScoresResponse(results=[...])
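
The check order in this flow (lookup, expiry, top_k, rate limit) could look roughly like the sketch below. `AuthError`, `Role`, `TokenRecord` and the `rate_check` callback are hypothetical names introduced for illustration. Status names are plain strings here so the example runs without the grpc package, and the detail strings follow the error table in this issue. Two points are assumptions: a token is treated as valid through its expiry date, and a token whose role has been deleted is rejected as UNAUTHENTICATED.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict


class AuthError(Exception):
    """Carries a gRPC status name and a detail string (illustrative only)."""

    def __init__(self, status: str, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


@dataclass
class Role:
    name: str
    top_k: int


@dataclass
class TokenRecord:
    user: str
    role: str
    expires: date


def validate_token(token: str, top_k: int,
                   tokens: Dict[str, TokenRecord], roles: Dict[str, Role],
                   today: date,
                   rate_check: Callable[[str], int] = lambda user: 0) -> TokenRecord:
    """Apply the checks in flow order; rate_check returns seconds to wait (0 = allowed)."""
    record = tokens.get(token)
    if record is None:
        raise AuthError("UNAUTHENTICATED", "Invalid authentication token")
    if today > record.expires:  # assumption: valid through the expiry date itself
        raise AuthError("UNAUTHENTICATED", f"Token expired for user '{record.user}'")
    role = roles.get(record.role)
    if role is None:  # assumption: the issue does not name a status for deleted roles
        raise AuthError("UNAUTHENTICATED", "Invalid authentication token")
    if top_k > role.top_k:
        raise AuthError(
            "INVALID_ARGUMENT",
            f"top_k {top_k} exceeds limit {role.top_k} for role '{role.name}'",
        )
    wait_s = rate_check(record.user)  # keyed by username, not by token string
    if wait_s > 0:
        raise AuthError("RESOURCE_EXHAUSTED", f"Rate limit exceeded. Retry after {wait_s}s")
    return record
```

Returning the record gives the gRPC handler the username it needs for agent_id derivation and audit context.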

Flow 5: Rate Limit Hit

Client (alice) → 31st request in 60s window
  │
  ├── validate_token()
  │     ├── lookup token → found: user=alice, role=agent
  │     ├── check rate_limit → agent: 30/60s
  │     └── ✗ 31 > 30 in current window
  │
  └── return RESOURCE_EXHAUSTED: "Rate limit exceeded. Retry after 23s"
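
The issue does not say whether the 60s window is fixed or sliding. The sketch below assumes a sliding window over accepted requests, keyed by username. `SlidingWindowLimiter` is a hypothetical name, and the caller passes `now` explicitly so the behaviour is easy to test.

```python
import math
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` accepted requests per user in any `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # username -> timestamps of accepted requests

    def check(self, user: str, now: float) -> int:
        """Return 0 if the request is allowed, else whole seconds until a slot frees up."""
        hits = self._hits[user]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return 0
        # The oldest accepted request leaves the window at hits[0] + window_s.
        return max(1, math.ceil(hits[0] + self.window_s - now))
```

With the agent limits (30/60s), 30 requests accepted at seconds 0 through 29 followed by a 31st at second 37 yield a retry hint of 23 seconds, since the oldest request leaves the window at second 60.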

Flow 6: Token Expiry

Client (alice, expired) → gRPC: GetPublicKey(token=evt_7f3a9c...)
  │
  ├── validate_token()
  │     ├── lookup token → found: user=alice
  │     └── ✗ expires=2026-06-18, now=2026-07-01
  │
  └── return UNAUTHENTICATED: "Token expired for user 'alice'"

Admin must reissue:

$ runevault token revoke --user alice
$ runevault token issue --user alice --role agent --expires 90d
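
The `--expires 90d` argument has to be turned into a concrete expiry date on each issue or reissue. A minimal sketch, assuming only a day suffix is supported (the issue shows no other units); `parse_duration_days` and `expiry_date` are hypothetical helper names.

```python
import re
from datetime import date, timedelta


def parse_duration_days(text: str) -> int:
    """Parse a duration such as '90d' into a number of days."""
    match = re.fullmatch(r"(\d+)d", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r} (expected e.g. '90d')")
    return int(match.group(1))


def expiry_date(issued: date, duration: str) -> date:
    """Expiry is the issue date plus the requested number of days."""
    return issued + timedelta(days=parse_duration_days(duration))
```

With the creation date 2026-03-20 from the sample vault-tokens.yml, `expiry_date(date(2026, 3, 20), "90d")` gives 2026-06-18, which matches the expiry shown in the examples.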

Flow 7: Revoke Token (Team Member Leaves)

$ runevault token revoke --user alice

Revoked token for 'alice'.
# Takes effect immediately — no restart needed.

What happens:

  1. docker exec runs vault_admin.py which sends HTTP DELETE to internal unix socket
  2. Vault removes alice from in-memory token store immediately
  3. Async writes updated state to vault-tokens.yml
  4. Keys derived from Alice's agent_id remain valid for previously encrypted metadata: the data stays accessible, but her revoked token can no longer authorize new operations

Flow 8: List Tokens

$ runevault token list

USER      ROLE      TOP_K  RATE     EXPIRES
alice     agent     5      30/60s   2026-06-18
bob       agent     5      30/60s   2026-09-01

Token values are never shown in list output.

Flow 9: Role Management

Roles are managed via the runevault role subcommand.

# List roles
$ runevault role list

ROLE       SCOPE                                              TOP_K  RATE
admin      get_public_key,decrypt_scores,decrypt_metadata      10     60/60s
agent      get_public_key,decrypt_scores,decrypt_metadata       5     30/60s

# Create a custom role
$ runevault role create --name researcher --scope get_public_key,decrypt_scores --top-k 3 --rate-limit 10/60s

Role 'researcher' created.

# Update a role
$ runevault role update --name agent --top-k 8

Role 'agent' updated. Changes take effect immediately for all tokens with this role.

# Delete a role
$ runevault role delete --name researcher

Role 'researcher' deleted.
⚠ Tokens assigned to this role will fail validation until reassigned.

What happens internally:

  1. vault_admin.py sends HTTP request to internal unix socket
  2. Admin server updates in-memory role store immediately
  3. Async writes updated state to vault-roles.yml
  4. Role changes take effect immediately for all tokens assigned to that role

Default roles (admin, agent) are created at install time and can be modified but not deleted.
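
One way to get the "immediate effect" and "defaults cannot be deleted" behaviour is to have tokens store only a role name and resolve limits on every request. The sketch below assumes that design; `RoleStore` and its method names are hypothetical.

```python
from typing import Any, Dict, Optional

DEFAULT_ROLES = frozenset({"admin", "agent"})  # created at install time


class RoleStore:
    """Hypothetical in-memory role store; tokens reference roles by name only."""

    def __init__(self, roles: Dict[str, Dict[str, Any]]) -> None:
        self._roles = {name: dict(cfg) for name, cfg in roles.items()}

    def create(self, name: str, cfg: Dict[str, Any]) -> None:
        if name in self._roles:
            raise ValueError(f"role '{name}' already exists")
        self._roles[name] = dict(cfg)

    def update(self, name: str, **changes: Any) -> None:
        # Mutates the shared entry, so the next request from any token
        # holding this role sees the new limits.
        self._roles[name].update(changes)

    def delete(self, name: str) -> None:
        if name in DEFAULT_ROLES:
            raise ValueError(f"default role '{name}' cannot be deleted")
        del self._roles[name]

    def get(self, name: str) -> Optional[Dict[str, Any]]:
        # None signals a deleted role; token validation should then fail.
        return self._roles.get(name)
```

Because limits are never copied into token records, `update("agent", top_k=8)` needs no per-token migration.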


Design Decisions

Memory-first token and role management

Both token and role changes take effect immediately in memory. The files (vault-tokens.yml, vault-roles.yml) serve as the single source of truth (SSOT) at startup and recovery, but runtime changes flow through the Admin HTTP API → in-memory store → async file persist. This removes the need for container restarts, which is critical for security incident response (immediate revocation).
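
The "in-memory store → async file persist" step could be implemented with a single background writer, which keeps snapshots in submission order so an older state never overwrites a newer one. This is a sketch under that assumption. `AsyncPersister` is a hypothetical name, and the snapshot is passed in as already-serialized text because the standard library has no YAML emitter.

```python
import os
import queue
import threading


class AsyncPersister:
    """Hypothetical helper: one background thread writes snapshots in order."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._pending: "queue.Queue[str]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, snapshot: str) -> None:
        """Call after the in-memory store has been updated; returns at once."""
        self._pending.put(snapshot)

    def flush(self) -> None:
        """Block until every submitted snapshot is on disk (useful at shutdown)."""
        self._pending.join()

    def _run(self) -> None:
        while True:
            snapshot = self._pending.get()
            try:
                # Mode 0o600 applies only when the file is created; it matches
                # the permissions this issue specifies for the token file.
                fd = os.open(self._path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
                with os.fdopen(fd, "w", encoding="utf-8") as handle:
                    handle.write(snapshot)
            finally:
                self._pending.task_done()
```

The sketch writes the file in place. Writing to a temporary file and renaming it would make each update atomic, but that is only an option if the deployment mounts a directory rather than the individual config files.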

Admin API: container-internal unix socket

The admin HTTP server listens on a unix socket inside the container only (/var/run/vault-admin.sock). It is NOT mounted to the host. Access path:

SSH → host shell → runevault alias → docker exec → vault_admin.py → HTTP over unix socket (internal)

Why no admin token:

  • SSH access to the host is the authentication boundary
  • docker exec requires docker group membership (set up by install.sh)
  • The unix socket is not exposed outside the container
  • Three layers of protection (SSH + docker group + container isolation) make an admin token redundant

runevault CLI alias

install.sh adds a shell alias to the admin user's shell profile (~/.bashrc or ~/.zshrc):

alias runevault='docker exec rune-vault python /app/vault_admin.py'

  • No host Python dependency — Python runs inside the container
  • No host-side files to manage — vault_admin.py lives in the container image
  • docker exec requires docker group membership, which install.sh configures via usermod -aG docker $SUDO_USER

Config files as SSOT for persistence

Token and role storage uses YAML config files (vault-tokens.yml, vault-roles.yml), not SQLite or other DB. Reasons:

  • Human-readable, diff-able, git-trackable (with token values excluded)
  • Docker-native (volume mount)
  • Vault stays as stateless as possible — only FHE keys and these configs are persistent state
  • Token count is small (<50), change frequency is low (monthly)
  • Files are loaded at startup to populate in-memory stores; async-written after each change

Two default roles: admin and agent

Rune plugins need encrypt + decrypt capabilities together — the plugin captures organizational context (encrypt with public key) and retrieves it (decrypt scores + metadata). An "encrypt-only" or "score-only" role has no practical use case. The meaningful access boundary is:

  • admin: System management + all Vault operations
  • agent: Standard Vault operations (all 3 gRPC methods, with lower top_k and rate limits)
  • Custom roles can be created via runevault role create for specific needs

No token hashing at rest

Tokens are stored as plaintext in the config file. File permissions (600) protect them at rest. TLS protects them in transit. The Vault host is assumed to be an admin-only zone — if an attacker can read the config file, they already have host-level access.

Agent ID derived from username, not token

Currently agent_id = sha256(token)[:32]. After this change, agent_id = sha256(username)[:32]. This means:

  • Token rotation doesn't change agent_id (metadata DEK stays consistent)
  • User identity is stable across token reissues
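
A minimal sketch of the new derivation, assuming that `sha256(username)[:32]` means the first 32 characters of the hex digest of the UTF-8 encoded username; `agent_id_for` is a hypothetical function name.

```python
import hashlib


def agent_id_for(username: str) -> str:
    """Derive a stable agent_id from the username (not from the token)."""
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    return digest[:32]  # 32 hex characters, i.e. the first 128 bits
```

Since the token value is not an input, revoking and reissuing a token for the same user yields the same agent_id.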

Architecture

Vault host machine
├── vault-roles.yml                  ← async-persisted by server
├── vault-tokens.yml                 ← async-persisted by server
├── .env                             ← TLS config (mode 600)
├── docker-compose.yml
└── rune-vault container
    ├── /app/vault_admin.py          ← CLI, called via docker exec
    ├── /app/vault-roles.yml         ← volume mount (read-write)
    ├── /app/vault-tokens.yml        ← volume mount (read-write)
    ├── /var/run/vault-admin.sock    ← internal unix socket (NOT mounted)
    ├── gRPC server (0.0.0.0:50051)  ← client-facing
    └── Admin HTTP server            ← unix socket only (container-internal)
        ├── POST /tokens             (issue)
        ├── DELETE /tokens/{user}    (revoke)
        ├── GET /tokens              (list)
        ├── POST /roles              (create)
        ├── PUT /roles/{name}        (update)
        ├── DELETE /roles/{name}     (delete)
        └── GET /roles               (list)

Admin access flow:

Admin (SSH) → runevault token issue --user alice --role agent
  → docker exec rune-vault python /app/vault_admin.py token issue --user alice --role agent
  → vault_admin.py: HTTP POST /tokens {...} over unix socket /var/run/vault-admin.sock
  → Admin HTTP handler: generate token, update memory, async persist
  → return token to stdout

Config File Formats

vault-roles.yml

roles:
  admin:
    scope: [get_public_key, decrypt_scores, decrypt_metadata, manage_tokens]
    top_k: 10
    rate_limit: 60/60s
  agent:
    scope: [get_public_key, decrypt_scores, decrypt_metadata]
    top_k: 5
    rate_limit: 30/60s

vault-tokens.yml

tokens:
  - user: alice
    token: evt_7f3a9c1e2b4d6f8a0c2e4b6d8f0a1c2e
    role: agent
    created: 2026-03-20
    expires: 2026-06-18
  - user: bob
    token: evt_def456789abc012def456789abc012de
    role: agent
    created: 2026-03-20
    expires: 2026-09-01
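
Startup validation of these two files might look like the sketch below. It works on already-parsed dictionaries, because YAML parsing needs a third-party library that the issue does not name. `parse_rate_limit` and `validate_config` are hypothetical helpers, and the specific checks (rate format, positive top_k, known role, one entry per user) are assumptions about what "validate" should cover.

```python
import re
from typing import Any, Dict, List, Tuple


def parse_rate_limit(text: str) -> Tuple[int, int]:
    """Parse '30/60s' into (max_requests, window_seconds)."""
    match = re.fullmatch(r"(\d+)/(\d+)s", text)
    if match is None:
        raise ValueError(f"invalid rate_limit: {text!r}")
    return int(match.group(1)), int(match.group(2))


def validate_config(roles_doc: Dict[str, Any], tokens_doc: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    problems: List[str] = []
    roles = roles_doc.get("roles") or {}
    for name, cfg in roles.items():
        try:
            parse_rate_limit(str(cfg.get("rate_limit", "")))
        except ValueError as err:
            problems.append(f"role {name}: {err}")
        if not isinstance(cfg.get("top_k"), int) or cfg["top_k"] < 1:
            problems.append(f"role {name}: top_k must be a positive integer")
    seen_users = set()
    for entry in tokens_doc.get("tokens") or []:
        user = entry.get("user")
        if user in seen_users:
            problems.append(f"duplicate token entry for user {user}")
        seen_users.add(user)
        if entry.get("role") not in roles:
            problems.append(f"user {user}: unknown role {entry.get('role')!r}")
    return problems
```

Collecting all problems instead of failing on the first one lets the server report every misconfiguration in a single startup log.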

Requirements

runevault CLI setup

  • install.sh: usermod -aG docker $SUDO_USER for docker group access
  • install.sh: Add alias runevault='docker exec rune-vault python /app/vault_admin.py' to user's shell profile (~/.bashrc or ~/.zshrc)
  • Print post-install instructions confirming runevault command availability

In-container admin utility (vault_admin.py)

Token management:

  • runevault token issue --user <name> --role <role> [--expires <duration>]
    • Sends HTTP POST via internal unix socket
    • Prints issued token once
  • runevault token revoke --user <name>
    • Sends HTTP DELETE via internal unix socket
    • Takes effect immediately (no restart)
  • runevault token list
    • Sends HTTP GET via internal unix socket
    • Displays user, role, top_k, rate_limit, expiry (not token values)

Role management:

  • runevault role list
    • Displays all roles with scope, top_k, rate_limit
  • runevault role create --name <name> --scope <scopes> --top-k <n> --rate-limit <rate>
    • Creates a new role
  • runevault role update --name <name> [--scope <scopes>] [--top-k <n>] [--rate-limit <rate>]
    • Updates an existing role; changes take effect immediately for assigned tokens
  • runevault role delete --name <name>
    • Deletes a role; warns about tokens that will lose their role
  • Python CLI using argparse + urllib with unix socket support
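
The standard library has no ready-made HTTP client for unix sockets. One common approach is to subclass `http.client.HTTPConnection`, which urllib is built on, and override `connect()`. The sketch below assumes JSON request and response bodies; `UnixHTTPConnection` and `admin_request` are hypothetical names.

```python
import http.client
import json
import socket
from typing import Any, Optional, Tuple


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over an AF_UNIX stream socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        # The host is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def admin_request(socket_path: str, method: str, path: str,
                  payload: Optional[dict] = None) -> Tuple[int, Any]:
    """Send one request to the admin server and return (status, decoded JSON body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        body = json.dumps(payload).encode("utf-8") if payload is not None else None
        headers = {"Content-Type": "application/json"} if body else {}
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, json.loads(response.read() or b"null")
    finally:
        conn.close()
```

Under these assumptions, `runevault token list` would reduce to `admin_request("/var/run/vault-admin.sock", "GET", "/tokens")` plus table formatting.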

Vault server-side: Admin HTTP API

  • HTTP server on internal unix socket (/var/run/vault-admin.sock)
    • Token endpoints:
      • POST /tokens: issue new token
      • DELETE /tokens/{user}: revoke token
      • GET /tokens: list tokens (no token values)
    • Role endpoints:
      • POST /roles: create new role
      • PUT /roles/{name}: update existing role
      • DELETE /roles/{name}: delete role (reject if default role)
      • GET /roles: list all roles
  • No admin token required: access is protected by SSH + docker group + container isolation
  • In-memory token and role stores with async file persistence
  • Python stdlib http.server based (no external dependencies)
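
A stdlib-only admin server on a unix socket can pair `socketserver.UnixStreamServer` with `http.server.BaseHTTPRequestHandler`. The sketch below covers only listing and revoking tokens. `TOKENS`, `AdminHandler` and `make_admin_server` are hypothetical names, and the JSON response shapes are assumptions.

```python
import json
import os
import socketserver
from http.server import BaseHTTPRequestHandler

TOKENS = {}  # hypothetical in-memory store: username -> record dict


class AdminHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        # Unix-socket peers have no (host, port) address for the default
        # access log, and admin traffic may involve secrets, so log nothing.
        pass

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/tokens":
            # Never include token values in list output.
            listing = [{k: v for k, v in rec.items() if k != "token"}
                       for rec in TOKENS.values()]
            self._send_json(200, {"tokens": listing})
        else:
            self._send_json(404, {"error": "not found"})

    def do_DELETE(self):
        prefix = "/tokens/"
        user = self.path[len(prefix):] if self.path.startswith(prefix) else ""
        if user and TOKENS.pop(user, None) is not None:
            self._send_json(200, {"revoked": user})  # effective immediately
        else:
            self._send_json(404, {"error": "not found"})


def make_admin_server(socket_path):
    if os.path.exists(socket_path):
        os.unlink(socket_path)  # remove a stale socket from a previous run
    server = socketserver.UnixStreamServer(socket_path, AdminHandler)
    os.chmod(socket_path, 0o600)
    return server
```

`UnixStreamServer` handles one request at a time, so the shared store needs no locking within the admin server itself. Access from the gRPC threads would still need its own synchronization.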

Vault server-side: Auth changes

  • Config file loader: read and validate both vault-roles.yml and vault-tokens.yml at startup → populate in-memory stores
  • Replace VAULT_TOKENS env var with in-memory token store as token source
  • validate_token() checks: token lookup → expiry → top_k → rate_limit
  • Per-user rate limiting (keyed by username, not token string)
  • Agent ID derived from username: sha256(username)[:32]
  • Pass user identity to gRPC context for monitoring/audit
  • Backward compatibility: if VAULT_TOKENS env var exists and no config files, use legacy mode with deprecation warning
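
The backward-compatibility rule could be decided once at startup. The sketch below reads "no config files" as "neither file exists" and falls back to config mode with empty stores when VAULT_TOKENS is also unset; both readings are assumptions. `select_auth_mode` is a hypothetical function name.

```python
import os
import warnings
from typing import Mapping


def select_auth_mode(env: Mapping[str, str], roles_path: str, tokens_path: str) -> str:
    """Return 'config' for per-user tokens or 'legacy' for the shared VAULT_TOKENS."""
    have_config = os.path.exists(roles_path) or os.path.exists(tokens_path)
    if have_config:
        return "config"  # config files take precedence over the env var
    if env.get("VAULT_TOKENS"):
        warnings.warn(
            "VAULT_TOKENS is deprecated; issue per-user tokens with 'runevault token issue'",
            DeprecationWarning,
            stacklevel=2,
        )
        return "legacy"
    return "config"  # assumption: start with empty stores when nothing is set
```

Passing the environment as a mapping (for example `os.environ`) keeps the decision testable without touching the real process environment.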

Docker Compose integration

  • Mount vault-tokens.yml as read-write volume
  • Mount vault-roles.yml as read-write volume
  • Remove VAULT_TOKENS env var from .env.example and docker-compose.yml

install.sh integration

  • Create default vault-roles.yml with admin/agent roles
  • Create empty vault-tokens.yml (tokens: [])
  • usermod -aG docker $SUDO_USER for passwordless docker access
  • Add runevault alias to user's shell profile (auto-detect bash/zsh)
  • Print post-install instructions for issuing first token

Per-user token value

  • Individual user identification for audit logging (see #19, Structured audit logging for Vault operations)
  • Individual token revocation without affecting other users
  • Per-user rate limiting (keyed by username)
  • Per-role top_k limits (agent: 5, admin: 10)
  • Per-role rate limits (agent: 30/60s, admin: 60/60s)

gRPC error codes

Condition        gRPC Status         Detail
Token not found  UNAUTHENTICATED     Invalid authentication token
Token expired    UNAUTHENTICATED     Token expired for user '<name>'
Rate limited     RESOURCE_EXHAUSTED  Rate limit exceeded. Retry after <n>s
top_k exceeded   INVALID_ARGUMENT    top_k <n> exceeds limit <max> for role '<role>'

Affected Files

  • New: vault/vault_admin.py — in-container admin CLI utility
  • New: vault/admin_server.py — Admin HTTP server (internal unix socket, token + role endpoints)
  • Modify: vault/vault_core.py - validate_token(), in-memory token/role stores, agent_id derivation
  • Modify: vault/vault_grpc_server.py — user identity in context, startup integration with admin server
  • Modify: vault/Dockerfile — include vault_admin.py and admin_server.py
  • Modify: vault/monitoring.py — per-user metrics labels
  • Modify: vault/docker-compose.yml — volume mounts (read-write for both configs), remove VAULT_TOKENS
  • Modify: vault/.env.example — remove VAULT_TOKENS
  • Modify: install.sh — docker group setup, runevault alias, generate config files
  • Modify: tests/unit/test_auth.py — per-user token, expiry, rate limit, role CRUD tests

Priority

High — Limits blast radius when a single user's agent is compromised via prompt injection.

Dependencies


Labels: security (Security improvements), vault (Rune-Vault related)
