
feat(support): add support service with WebSockets and Yamux#47

Open
edospadoni wants to merge 13 commits into main from feature/support-service

Conversation


@edospadoni edospadoni commented Mar 10, 2026

📋 Description

🏗 Support Service — Architecture

How it works

A tunnel client on the customer's system opens a persistent WebSocket to our support service. The connection is multiplexed with yamux — one WebSocket carries many parallel streams. When an operator clicks "Open" in the UI, traffic flows through the tunnel to reach the remote service (web UI, terminal, API) as if it were local.

```mermaid
graph LR
    subgraph Customer System
        TC[tunnel-client<br/>yamux mux] --> WU[Web UI]
        TC --> SA[SSH/API]
        TC --> ETC[...]
    end

    TC ---|WebSocket<br/>single connection| SS

    BR[Browser<br/>operator] --> NG[nginx<br/>proxy]
    NG --> BE[Backend :8080<br/>sessions, auth]
    BE --> SS[Support :8082<br/>tunnels, yamux]
```

Session Lifecycle

```mermaid
stateDiagram-v2
    [*] --> pending
    pending --> active : WebSocket established
    active --> closed : operator closes
    active --> grace_period : disconnect
    grace_period --> active : reconnect (same session)
    grace_period --> expired : timeout (30-60s)
```

WebSocket + yamux Multiplexing

The tunnel client opens one WebSocket to the support service. On top of it, yamux creates a multiplexed session — like having many TCP connections inside a single one.

```
WebSocket connection (single, persistent)
|
+-- yamux session
    |
    +-- stream #0  [control]     client sends service manifest (JSON)
    +-- stream #1  [HTTP proxy]  operator browses NethVoice UI
    +-- stream #2  [HTTP proxy]  operator browses another service
    +-- stream #3  [terminal]    operator opens xterm.js shell
    +-- ...        (up to 64 concurrent streams per tunnel)
```

How it connects:

  1. Tunnel client sends GET /support/api/tunnel with HTTP Basic Auth
  2. Support service upgrades to WebSocket, wraps it as net.Conn
  3. yamux.Server is created over the wrapped connection (keepalive 15s)
  4. Client sends a control stream with a JSON service manifest — the list of reachable services (name, host, port, protocol)
  5. Each proxied request from an operator opens a new yamux stream, forwarded to the target service on the customer system

On disconnect, the tunnel enters a grace period (30-60s). If the client reconnects with its reconnect_token, the same session is reused and no new session is created. If the grace period expires, the session is closed.


How the UI Proxy works (subdomain)

When an operator clicks a service link (e.g. NethVoice UI), the browser opens a new tab on a dedicated subdomain. Each service gets its own origin, so all the app's absolute paths (/_next/, /api/, /static/) work natively.

```
1. Frontend: POST /api/support-sessions/:id/proxy-token  {service: "nethvoice-ui"}
   Backend:  generates scoped JWT (session_id + service_name + org_role, 8h TTL)
   Response: {url: "https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.example.com/", token: "ey..."}

2. Browser navigates to: https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.example.com/?token=ey...
   nginx:   matches *.support.* --> rewrites to /support-proxy/* --> backend
   Backend: validates JWT, sets HttpOnly SameSite=Strict cookie, redirects to same URL without ?token=

3. All subsequent requests carry the cookie automatically:
   Browser --> nginx --> Backend (SubdomainProxy) --> Support service --> yamux stream --> Customer system
```

The ?token= is removed from the URL after the first request (redirect), so it never leaks in logs, referrer headers, or browser history.
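Step 2's token-to-cookie exchange can be sketched as a `net/http` handler. This is a hedged illustration: the cookie name, the `validate` callback, and the `demo` helper are hypothetical, and real validation would verify the scoped JWT's claims:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// bootstrapCookie: on the first request the token arrives as ?token=...,
// is validated, stored as a hardened cookie, and the browser is
// redirected to the same URL with the parameter stripped.
func bootstrapCookie(validate func(string) bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		token := r.URL.Query().Get("token")
		if token == "" {
			// No token in the URL: rely on the cookie set on a previous visit.
			if _, err := r.Cookie("support_proxy"); err != nil {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			fmt.Fprint(w, "proxied")
			return
		}
		if !validate(token) {
			http.Error(w, "invalid token", http.StatusForbidden)
			return
		}
		http.SetCookie(w, &http.Cookie{
			Name:     "support_proxy",
			Value:    token,
			Path:     "/",
			HttpOnly: true,
			Secure:   true,
			SameSite: http.SameSiteStrictMode,
		})
		// Redirect to the same URL without ?token= so it never reaches
		// logs, Referer headers, or browser history.
		clean := *r.URL
		q := clean.Query()
		q.Del("token")
		clean.RawQuery = q.Encode()
		http.Redirect(w, r, clean.String(), http.StatusFound)
	}
}

// demo drives the handler once and reports what the browser would see.
func demo() (code int, location string, cookieSet bool) {
	h := bootstrapCookie(func(tok string) bool { return tok == "good" })
	req := httptest.NewRequest("GET", "https://svc--abc.support.example.com/app?page=2&token=good", nil)
	rec := httptest.NewRecorder()
	h(rec, req)
	for _, c := range rec.Result().Cookies() {
		if c.Name == "support_proxy" && c.HttpOnly {
			cookieSet = true
		}
	}
	return rec.Code, rec.Header().Get("Location"), cookieSet
}

func main() {
	code, loc, ok := demo()
	fmt.Println(code, loc, ok)
}
```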


How the Web Terminal works (xterm.js)

The terminal needs a WebSocket from the browser, but the browser WebSocket API cannot set custom headers such as Authorization. Solution: a one-time ticket exchanged beforehand.

```
1. Frontend: POST /api/support-sessions/:id/terminal-ticket  (JWT in Authorization header)
   Backend:  generates random ticket, stores in Redis with 30s TTL
   Response: {ticket: "a1b2c3..."}

2. Frontend opens WebSocket: GET /api/support-sessions/:id/terminal?ticket=a1b2c3...
   Backend:  Redis GETDEL (atomic read + delete, single-use)
             validates ticket matches session
             opens raw TCP to support service, sends WebSocket upgrade with X-Session-Token
             hijacks browser connection (http.Hijacker)
             bridges both sides bidirectionally:

   Browser (xterm.js) <--WebSocket--> Backend (TCP bridge) <--WebSocket--> Support <--yamux stream--> PTY on customer system
```

The tunnel client spawns a PTY (pseudo-terminal) directly on the customer system — no SSH daemon involved. The PTY output is forwarded as raw bytes through the yamux stream back to the browser's xterm.js.

Why TCP hijacking instead of httputil.ReverseProxy?

When the browser opens a WebSocket, it sends an HTTP request with Upgrade: websocket. The server responds with 101 Switching Protocols and from that point the connection is no longer HTTP — it becomes a raw bidirectional byte channel.

httputil.ReverseProxy is built for the classic HTTP request/response cycle: read the response from the backend, copy it to the client, close. (Newer Go versions can pass an Upgrade: websocket handshake through, but give little control over the handshake itself, which here must inject X-Session-Token.) With a WebSocket there is no single "response" to copy: there is a continuous stream of frames in both directions.

Gin (which uses net/http underneath) has the same problem: its ResponseWriter buffers, manages headers, Content-Length... none of which makes sense after the 101.

The solution is http.Hijacker: a Go interface that lets you take control of the raw TCP connection from the HTTP server. You're telling Go "I'll handle it from here".

The flow:

  1. Backend receives the WebSocket request from the browser
  2. Opens a direct TCP connection to the support service and sends the same upgrade request
  3. Reads the 101 Switching Protocols from the support service
  4. Calls Hijack() on the browser connection — now it has the raw TCP socket
  5. Sends the 101 to the browser
  6. Two goroutines copy bytes in both directions (io.Copy): browser ↔ support service

No HTTP, no buffering, no overhead. Just bytes flowing through.
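Steps 4-6 condense into a short runnable sketch. Dialing the support service and its upgrade handshake (steps 2-3) are elided here; an in-memory `net.Pipe` stands in for that already-upgraded upstream connection, and `bridge`/`demo` are illustrative names:

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
)

// bridge takes over the client's raw TCP connection and splices it to an
// already-upgraded upstream connection.
func bridge(w http.ResponseWriter, r *http.Request, upstream net.Conn) {
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
		return
	}
	clientConn, buf, err := hj.Hijack() // raw TCP socket to the browser
	if err != nil {
		return
	}
	defer clientConn.Close()
	defer upstream.Close()

	// After Hijack the ResponseWriter must not be used: write the 101 by hand.
	fmt.Fprint(buf, "HTTP/1.1 101 Switching Protocols\r\nUpgrade: websocket\r\nConnection: Upgrade\r\n\r\n")
	buf.Flush()

	// Two goroutines, io.Copy each way: no HTTP framing, just bytes.
	done := make(chan struct{}, 2)
	go func() { io.Copy(upstream, clientConn); done <- struct{}{} }()
	go func() { io.Copy(clientConn, upstream); done <- struct{}{} }()
	<-done // either side closing tears the bridge down
}

// demo plays the browser against a fake upstream and returns what it read.
func demo() string {
	up, upPeer := net.Pipe() // upPeer plays the support-service side
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		bridge(w, r, up)
	}))
	defer srv.Close()

	go func() {
		upPeer.Write([]byte("hello from customer"))
		upPeer.Close()
	}()

	conn, err := net.Dial("tcp", srv.Listener.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Fprint(conn, "GET /term HTTP/1.1\r\nHost: x\r\nUpgrade: websocket\r\nConnection: Upgrade\r\n\r\n")
	data, _ := io.ReadAll(conn)
	return string(data)
}

func main() {
	fmt.Printf("%q\n", demo())
}
```

One detail worth noting: `Hijack()` also returns a buffered reader that may already hold bytes the server read ahead; a production bridge drains it before copying from the raw socket.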


Access Patterns & Auth

| Who does what | Auth mechanism | How it works |
|---|---|---|
| System → tunnel | HTTP Basic Auth | `system_key:system_secret` (SHA256), 3-tier cache (memory → Redis → DB), rate-limited |
| Operator → session CRUD | JWT + RBAC | `connect:systems` permission, standard middleware chain |
| Operator → web terminal | One-time ticket | JWT exchanged for 30s Redis ticket → GETDEL on use → WebSocket via TCP hijack |
| Operator → UI proxy | Scoped proxy JWT | 8h token with `{session_id, service_name, org_role}` → `SameSite=Strict` cookie on subdomain → auto-redirect strips token from URL |
| Backend → support service | Per-session token + INTERNAL_SECRET | `X-Session-Token` (64-char hex, per-session) + shared `SUPPORT_INTERNAL_SECRET` for service-level auth, constant-time validation |
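The constant-time validation in the last row can be done with `crypto/subtle`. A minimal sketch (`validToken` is a hypothetical helper; hashing both sides first also hides length differences):

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// validToken compares a presented token against the expected value in
// constant time, so response latency leaks nothing about how many
// leading bytes matched.
func validToken(presented, expected string) bool {
	p := sha256.Sum256([]byte(presented))
	e := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(p[:], e[:]) == 1
}

func main() {
	fmt.Println(validToken("deadbeef", "deadbeef")) // true
	fmt.Println(validToken("deadbeef", "deadbeee")) // false
}
```

A plain `==` on strings can return early at the first mismatching byte, which is exactly the timing side channel the audit flags.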

Security Highlights

- 🔑 **No shared secrets**: each session gets its own token; compromising one doesn't affect others
- 🎫 **Terminal ticket**: 30s TTL, single-use (GETDEL), JWT never touches the URL
- 🍪 **Proxy cookie**: token arrives as `?token=`, is stored as an HttpOnly `SameSite=Strict` cookie, and the URL is cleaned via redirect + `Referrer-Policy: no-referrer`
- **Constant-time comparisons**: `crypto/subtle` for all token validations; no timing attacks
- 🛡 **SSRF protection**: tunnel blocks cloud metadata (169.254.x.x), link-local, multicast, loopback
- 🖼 **Frame protection**: CSP `frame-ancestors 'self'` on proxied responses prevents clickjacking
- 🔄 **Cache invalidation**: secret regeneration → Redis pub/sub → memory and Redis caches flushed instantly
- 📝 **Audit trail**: every operator action logged: who, when, what service, access type

Subdomain Proxy

Each service gets its own browser origin — no URL rewriting needed:

```
https://nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.my.nethesis.it/
        ────────────  ──────────────────────────────── ───────────────────────
        service name  session UUID (no dashes, 32 hex)   configured domain
```

Requires: DNS wildcard *.support.{domain} + matching wildcard SSL certificate + SUPPORT_PROXY_DOMAIN env var.
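Parsing the Host header back into its parts might look like this (`hostPattern` and `parseProxyHost` are illustrative names; the real routing is split between the nginx rewrite and the backend's SubdomainProxy):

```go
package main

import (
	"fmt"
	"regexp"
)

// hostPattern matches the scheme above:
// <service>--<32-hex session id>.support.<domain>
var hostPattern = regexp.MustCompile(`^([a-z0-9-]+)--([0-9a-f]{32})\.support\.`)

// parseProxyHost extracts the service name and session ID from the Host
// header, returning ok=false for anything that doesn't match the scheme.
func parseProxyHost(host string) (service, sessionID string, ok bool) {
	m := hostPattern.FindStringSubmatch(host)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	svc, sid, ok := parseProxyHost("nethvoice-ui--c37f4ce78b024a1fb123456789abcdef.support.my.nethesis.it")
	fmt.Println(svc, sid, ok)
}
```

The `--` separator is what lets service names themselves contain single hyphens, and requiring exactly 32 hex characters for the session ID keeps the match unambiguous.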


Inter-service Communication

```
Backend ──INTERNAL_SECRET────▶ Support Service    (service-level auth, shared secret)
Backend ──X-Session-Token────▶ Support Service    (per-request, per-session scope)
Backend ──Redis pub/sub──────▶ Support Service    (close commands, cache invalidation)
Support ──yamux stream───────▶ Tunnel Client      (proxied HTTP, terminal)
Support ──WebSocket 4000─────▶ Tunnel Client      (graceful close, no reconnect)
```

Components & Files

| Component | Path | Purpose |
|---|---|---|
| Support service | `services/support/` | WebSocket tunnels, yamux, session DB, service proxy |
| Backend APIs | `backend/methods/support_proxy.go` | Terminal ticket, proxy token, subdomain proxy, session CRUD |
| Frontend | `frontend/src/components/support/` | Session dashboard, service list, terminal (xterm.js) |
| Proxy | `proxy/nginx.conf` | Subdomain routing, tunnel endpoint exposure |
| DB schema | `backend/database/migrations/009_*` | `support_sessions`, `support_access_logs` tables |
| CI/CD | `.github/workflows/`, `render.yaml`, `deploy.sh` | Build, test, deploy support service |

Related Issue: #[ISSUE_NUMBER]

🚀 Testing Environment

To trigger a fresh deployment of all services in the PR preview environment, comment:

update deploy

To download tunnel-client binary, reference here: #47 (comment)

Automatic PR environments:

✅ Merge Checklist

Code Quality:

  • Backend Tests
  • Collect Tests
  • Sync Tests
  • Frontend Tests

Builds:

  • Backend Build
  • Collect Build
  • Sync Build
  • Frontend Build

… management

Support service (port 8082) for remote support sessions via WebSocket tunnels.
Includes tunnel-client for NS8 and NethSecurity, yamux multiplexer, web terminal,
HTTP proxy, session lifecycle management, rate limiting, and graceful reconnection.
Support session CRUD, WebSocket terminal with one-time tickets, subdomain proxy
with body rewriting, access logging, RBAC with connect:systems permission,
database migrations, and security hardening from penetration test findings.
Support sessions table with pagination and sorting, xterm.js web terminal
with multi-tab support, service dropdown with multi-node grouping,
connect:systems permission guard, and i18n translations.
Add support service routing in nginx proxy, Render.com deployment config,
CI pipeline with tunnel-client Docker image and rolling dev release,
release workflow with tunnel-client binary and SBOM, connect:systems RBAC permission.
@edospadoni edospadoni deployed to feature/support-service - my-frontend-qa PR #47 March 10, 2026 08:00 — with Render Active
@edospadoni edospadoni deployed to feature/support-service - my-backend-qa PR #47 March 10, 2026 08:00 — with Render Active
@github-actions
Contributor

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

  • https://my-proxy-qa-pr-47.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-proxy-qa-pr-47.onrender.com/login

These will be automatically removed when the PR is closed or merged.

@github-actions
Contributor

github-actions bot commented Mar 10, 2026

🤖 My API structural change detected

Preview documentation

Structural change details

Added (10)

  • DELETE /support-sessions/{id}
  • GET /support-sessions
  • GET /support-sessions/{id}
  • GET /support-sessions/{id}/logs
  • GET /support-sessions/{id}/proxy/{service}/{path}
  • GET /support-sessions/{id}/services
  • GET /support-sessions/{id}/terminal
  • PATCH /support-sessions/{id}/extend
  • POST /support-sessions/{id}/proxy-token
  • POST /support-sessions/{id}/terminal-ticket

Modified (5)

  • GET /systems
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property modified: systems
  • GET /systems/{id}
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • POST /systems
    • Response modified: 201
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • POST /systems/{id}/regenerate-secret
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
  • PUT /systems/{id}
    • Response modified: 200
      • Content type modified: application/json
        • Property modified: data
          • Property added: support_session_id
Powered by Bump.sh

Allows manually created services (not from Blueprint) to be reached
from PR preview environments by setting their env var to a full FQDN.
@edospadoni edospadoni force-pushed the feature/support-service branch from c62b877 to 007bd6d Compare March 10, 2026 10:53
…kend

Address 27 findings from security audit: prevent double-close panic with
sync.Once, fix TOCTOU race in session creation with DB transaction, add
gzip bomb protection, limit manifest size/rate, validate service names,
use full session UUID in subdomain proxy, add org_role to proxy tokens,
harden WebSocket origin checks, add session rate limiting, fix concurrent
read/write safety, and multiple other hardening improvements.
…kend

Address 23 findings from penetration testing report on the support service:
- SSRF/DNS rebinding prevention with IP validation and DNS resolution checks
- Open redirect fix via protocol-relative URL sanitization
- CORS restriction from AllowAllOrigins to localhost-only in debug mode
- HSTS, CSP, X-Content-Type-Options security headers in nginx proxy
- InternalSecret middleware for defense-in-depth inter-service auth
- PTY environment variable sanitization to prevent credential leakage
- Cookie rewriting to prevent cross-session domain leakage
- Global memory budget (50MB) for gzip decompression (bomb mitigation)
- CONNECT protocol newline injection prevention with service name validation
- Container hardening with nginx-unprivileged and non-root users
- Input validation for node_id and service names
- Nginx server_name regex anchoring for multi-environment support
- Rate limiter single-instance design documentation
- Non-functional default secrets in .env.example files
Add pid directive to /tmp/nginx.pid and create writable cache directories
so nginx can run as non-root user without permission errors.
Add https://*.nethesis.it to connect-src so the frontend can reach
the Logto identity provider for OIDC flows.
@edospadoni
Member Author

edospadoni commented Mar 11, 2026

tunnel-client binary (linux/amd64)

Download:

Binary: tunnel-client.zip

Quick start

```bash
# Make it executable
chmod +x tunnel-client-linux-amd64

# Run it
./tunnel-client-linux-amd64 \
  --url wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel \
  --key <SYSTEM_KEY> \
  --secret <SYSTEM_SECRET>
```

Parameters

| Flag | Env var | Description |
|---|---|---|
| `-u, --url` | `SUPPORT_URL` | WebSocket tunnel URL (required) |
| `-k, --key` | `SYSTEM_KEY` | System key from registration (required) |
| `-s, --secret` | `SYSTEM_SECRET` | System secret from registration (required) |
| `-n, --node-id` | `NODE_ID` | Cluster node ID, auto-detected on NS8 |
| `-r, --redis-addr` | `REDIS_ADDR` | Redis address, auto-detected on NS8 |
| `--static-services` | `STATIC_SERVICES` | Manual service definition: `name=host:port[:tls],...` |
| `--tls-insecure` | `TLS_INSECURE` | Skip TLS certificate verification |
| `-c, --config` | `TUNNEL_CONFIG` | YAML config file for service exclusions |
| `--discovery-interval` | `DISCOVERY_INTERVAL` | Service re-discovery interval (default 5m) |
| `--reconnect-delay` | `RECONNECT_DELAY` | Base reconnect delay (default 5s) |
| `--max-reconnect-delay` | `MAX_RECONNECT_DELAY` | Max reconnect delay (default 5m) |

Service discovery modes

The tunnel-client auto-detects the environment:

  • NS8: discovers services from Redis + Traefik routes
  • NethSecurity: discovers services from OpenWrt/nginx config
  • Static: define services manually with --static-services

Environment variables

All flags can also be passed as env vars:

```bash
export SUPPORT_URL=wss://my-proxy-qa-pr-47.onrender.com/support/api/tunnel
export SYSTEM_KEY=<your-key>
export SYSTEM_SECRET=<your-secret>
./tunnel-client-linux-amd64
```

Embed the support session ID directly in system list and detail
endpoints to avoid N+1 API calls when checking session status per system.
@edospadoni edospadoni force-pushed the feature/support-service branch from aa25c3d to 825f86b Compare March 11, 2026 14:18
Show a clickable headset icon next to system name when an active support
session exists. The popover displays session status, dates, and connected
operators with per-node terminal badges. Backend now tracks terminal
disconnect times via access log lifecycle (insert returns ID, disconnect
updates disconnected_at).
