This repository was archived by the owner on May 20, 2026. It is now read-only.

feat: LXMF chat backend, OS-lifecycle identity, nix2container builds #6

Merged
cwilson613 merged 14 commits into main from feature/lxmf-chat-backend
Feb 16, 2026

Conversation

@cwilson613
Contributor

Summary

  • LXMF chat backend (Phase 1): ConversationService, read receipts protocol, cross-node messaging with identity resolution and NodeStore persistence
  • OS-lifecycle identity resolution: styrened checks /etc/styrene/identity (NixOS activation) before ~/.styrene/operator.key, with _resolve_identity_path() as single source of truth
  • nix2container OCI builds: Replace Dockerfile pipeline with Nix flake-based container builds
  • Bare-metal test harness: Fleet-wide SSH-based test infrastructure with convergence, cross-arch, resilience, and deployment tests
  • IPC hardening: Comprehensive input validation, shutdown safety, adversarial review fixes
  • Terminal service: Client/service overhaul with integration tests

Key changes

  • Add ConversationService with message persistence, auto-reply cooldown, and read receipt protocol
  • Add _resolve_identity_path() resolving config override → /etc/styrene/identity → ~/.styrene/operator.key
  • Remove resolve_operator_identity_path() from config.py (was double-resolving)
  • Fix get_operator_identity() fallback to return None instead of leaking raw private key bytes
  • Fix CLI identity commands to display actual active identity path
  • Add nix/oci.nix for nix2container-based image builds
  • Add bare-metal test harness with SSH primitives and device registry
  • Bump version to 0.4.0
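
The identity path precedence listed above (config override, then /etc/styrene/identity, then ~/.styrene/operator.key) can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name, its parameters, and the choice to return an override even when the file does not exist yet are assumptions, and the final "LXMF detection" step mentioned in the commit message is left out.

```python
from pathlib import Path
from typing import Optional

# Paths taken from the PR description; the constant names are hypothetical.
SYSTEM_IDENTITY_PATH = Path("/etc/styrene/identity")
USER_IDENTITY_PATH = Path.home() / ".styrene" / "operator.key"


def resolve_identity_path(
    config_override: Optional[Path] = None,
    system_path: Path = SYSTEM_IDENTITY_PATH,
    user_path: Path = USER_IDENTITY_PATH,
) -> Path:
    """Return the identity file path using a fixed precedence order."""
    # 1. An explicit config override wins (assumed: even if the file is absent).
    if config_override is not None:
        return Path(config_override).expanduser()
    # 2. System-level identity, e.g. one generated at NixOS activation.
    if system_path.is_file():
        return system_path
    # 3. Fall back to the per-user operator key.
    return user_path
```

Keeping the precedence in one function is what lets the CLI and the daemon agree on which identity is active, which is the "single source of truth" point made above.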

Test plan

  • 683 unit + service tests pass (2 skipped)
  • 41 identity detection tests including system path resolution, fallback behavior, and lifecycle wiring
  • k8s integration tests (require cluster)
  • Bare-metal tests (require physical devices)

🤖 Generated with Claude Code

cwilson613 and others added 14 commits February 2, 2026 10:49
Implements core chat functionality for styrene-tui integration:

New Components:
- ConversationService: Manages conversations, message history, unread tracking
- IPC message types: QUERY_CONVERSATIONS, QUERY_MESSAGES, CMD_SEND_CHAT,
  CMD_MARK_READ, CMD_DELETE_CONVERSATION, CMD_DELETE_MESSAGE
- IPC handlers for all conversation operations

Features:
- Conversation listing ordered by recency with unread counts
- Message history with pagination (before_timestamp, limit)
- Unread count tracking per-conversation (in-memory cache + DB)
- Delivery status tracking (pending -> sent -> delivered/failed)
- LXMF delivery callbacks for real-time status updates
- Thread-safe operations with proper locking
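
As a rough illustration of the pagination feature above (before_timestamp, limit), the sketch below pages backwards through one conversation with the standard sqlite3 module. The table layout, column names, and function names are assumptions made for the example and do not reflect the actual styrened schema.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "id INTEGER PRIMARY KEY, peer_hash TEXT, timestamp REAL, content TEXT)"
    )
    return conn


def fetch_messages(
    conn: sqlite3.Connection,
    peer_hash: str,
    before_timestamp: Optional[float] = None,
    limit: int = 50,
) -> List[Tuple[int, float, str]]:
    """Return up to `limit` messages for a peer, newest first."""
    sql = "SELECT id, timestamp, content FROM messages WHERE peer_hash = ?"
    params: list = [peer_hash]
    if before_timestamp is not None:
        # Only rows strictly older than the cursor, so pages do not overlap.
        sql += " AND timestamp < ?"
        params.append(before_timestamp)
    sql += " ORDER BY timestamp DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

A client would request the first page without before_timestamp, then pass the timestamp of the oldest row it received to get the next page.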

Database Changes:
- Added composite indexes for conversation queries
- ix_messages_source_dest for conversation lookup
- ix_messages_dest_status for unread counting

Integration:
- ConversationService wired into daemon lifecycle
- Incoming chat messages automatically persisted
- Service exported from styrened.services module

Tests:
- 40 unit tests covering all ConversationService functionality
- All 511 tests pass

Part of #2 - LXMF Chat Backend feature work
Key fixes for cross-node LXMF messaging:

1. daemon.py: Pass node_store to start_discovery() so discovered devices
   are persisted to NodeStore with their identity_hash mappings. This is
   critical for identity resolution when sending messages.

2. node_store.py: Add prefix matching for truncated destination hashes.
   CLI users often copy partial hashes (16 chars), but NodeStore stored
   full 32-char hashes. Now supports both exact and prefix matching.

3. lxmf_service.py: Fix message.hash access - LXMF computes hash during
   handle_outbound(), not before. Move hash access after send.

4. lxmf_service.py: Add _load_identity_from_storage() fallback for cases
   where NodeStore has identity_hash but RNS cache is empty.

The identity resolution flow now works:
- Strategy 1: Direct RNS.Identity.recall(destination_hash)
- Strategy 2: NodeStore lookup by operator destination (with prefix match)
- Strategy 3: NodeStore lookup by LXMF destination (with prefix match)
- Then: RNS.Identity.recall(identity_hash, from_identity_hash=True)
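
The resolution flow above is a fallback chain. The sketch below shows that shape with plain callables standing in for RNS.Identity.recall and the NodeStore lookups, so it runs without Reticulum installed. All function and parameter names here are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

Identity = TypeVar("Identity")


def resolve_identity(
    destination_hash: str,
    recall_by_destination: Callable[[str], Optional[Identity]],
    node_store_lookups: Sequence[Callable[[str], Optional[str]]],
    recall_by_identity_hash: Callable[[str], Optional[Identity]],
) -> Optional[Identity]:
    """Try each resolution strategy in order and return the first identity found."""
    # Strategy 1: ask the identity cache directly for this destination.
    identity = recall_by_destination(destination_hash)
    if identity is not None:
        return identity
    # Strategies 2 and 3: map the destination to an identity hash via stored
    # node records (operator destination first, then LXMF destination).
    for lookup in node_store_lookups:
        identity_hash = lookup(destination_hash)
        if identity_hash is not None:
            # Final step: recall the identity by its own hash.
            return recall_by_identity_hash(identity_hash)
    return None
```

Passing the lookups in as callables keeps the ordering explicit and makes each strategy easy to replace with a stub in unit tests.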

Tested bidirectional messaging between styrene-node and t100ta.
- Add destination_hash to SendMessageResult in LXMFService
- Add update_destination_hash method to ConversationService
- Update IPC handler to normalize peer_hash after LXMF resolution
- Add 14 unit tests for chat handler validation and null checks
- Add 2 unit tests for update_destination_hash

Fixes issue where truncated (16-char) hashes passed to IPC chat
commands were stored as-is, causing list_conversations to fail
to match messages. Now the full 32-char LXMF destination hash
is always stored after successful identity resolution.
LXMRouter.process_deferred_stamps() throws TypeError when Transport.identity
is None. Triggered by propagated delivery - LXMF spawns a background thread
that calls get_outbound_propagation_cost() without checking if Transport is
ready. This is an upstream bug in LXMF that needs a guard in
get_outbound_propagation_cost(). We cannot catch this exception as it's in
LXMF's internal thread. Daemon continues running but errors pollute logs.
1. Fix critical deadlock in delivery callbacks (handlers.py)
   - Callbacks were acquiring conversation_service._lock then calling
     methods that also acquire the same lock (threading.Lock is not reentrant)
   - Fixed by using a separate tracking_lock for callback state management
   - Service method calls now happen outside the lock

2. Fix MockLXMFService missing destination_hash and callback params
   - Added on_delivery and on_failed callback parameters to match real service
   - Added destination_hash to SendMessageResult return value
   - Mock now simulates full hash resolution for truncated inputs

3. Fix prefix matching ambiguity in node_store
   - Added MIN_PREFIX_LENGTH (8 chars) requirement for prefix lookups
   - Added _is_valid_hash_prefix() validation function
   - Changed prefix queries to detect and reject ambiguous matches
   - Returns None with warning log if multiple nodes match prefix

4. Added 10 new unit tests for hash prefix validation
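
The prefix-matching rules from item 3 (a minimum prefix length and rejection of ambiguous matches) might look roughly like this. MIN_PREFIX_LENGTH and the 32-character full hash length come from the commit messages. The function names, the lowercase normalization, and the in-memory list standing in for a NodeStore query are assumptions.

```python
import logging
import re
from typing import Iterable, Optional

logger = logging.getLogger(__name__)

MIN_PREFIX_LENGTH = 8    # from the commit message
FULL_HASH_LENGTH = 32    # full destination hash length in hex chars
_HEX_RE = re.compile(r"[0-9a-f]+")


def is_valid_hash_prefix(prefix: str) -> bool:
    """Accept lowercase hex strings between the minimum and full hash length."""
    return (
        MIN_PREFIX_LENGTH <= len(prefix) <= FULL_HASH_LENGTH
        and _HEX_RE.fullmatch(prefix) is not None
    )


def find_by_prefix(prefix: str, known_hashes: Iterable[str]) -> Optional[str]:
    """Return the single known hash starting with prefix, else None."""
    prefix = prefix.lower()
    if not is_valid_hash_prefix(prefix):
        return None
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        # Refuse to guess: delivering to the wrong node is worse than failing.
        logger.warning("Ambiguous hash prefix %s matches %d nodes", prefix, len(matches))
    return None
```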
- Register CMD_RETRY_MESSAGE handler in IPC server (was completely broken)
- Parse reply_to_hash from IPC CMD_SEND_CHAT payload (threading was broken)
- Parse persist_messages from chat config section
- Add cleanup_stale_deliveries() to prevent memory leak from orphaned trackers
- Add periodic cleanup call in daemon run loop
- Fix TOCTOU race in prepare_retry() with SELECT FOR UPDATE
- Add tests for stale delivery cleanup and persist_messages config
- Add peer hash format validation (16-32 hex chars) to all handlers
  that accept peer_hash parameter: handle_cmd_send_chat,
  handle_query_messages, handle_cmd_mark_read, handle_cmd_delete_conversation
- Add content size limits: MAX_CHAT_CONTENT_LENGTH (64KB),
  MAX_TITLE_LENGTH (256 chars)
- Add shutdown safety check to delivery callbacks to avoid accessing
  conversation service after it's been shut down
- Add FTS5 query syntax error handling in search_messages to return
  empty results instead of crashing on malformed search queries
- Add limit parameter bounds validation (1 to MAX_MESSAGE_LIMIT=1000)
  for QUERY_MESSAGES and QUERY_SEARCH_MESSAGES handlers
- Add message_id type validation (must be positive integer) for
  CMD_DELETE_MESSAGE and CMD_RETRY_MESSAGE handlers
- Add reply_to_hash format validation (64 hex chars) for CMD_SEND_CHAT
- Add delivery_method validation (must be auto/direct/propagated)
- Update tests to match new error messages
- Expand FTS5 error patterns to catch more syntax error variants
  (parse error, unexpected token, no such column, malformed match)
- Fix stale _message_callback reference in LXMFService shutdown
  (was singular, should be _message_callbacks.clear())
- Add negative uptime guard in auto_reply _format_uptime to handle
  edge cases like NTP sync causing future start_time
- Add before_timestamp validation in conversation_service to reject
  negative, NaN, and Inf values gracefully
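
A few of the input checks listed above (peer hash format, limit bounds, before_timestamp sanity) can be sketched as small validators. The limits come from the commit messages. The function names, the use of ValueError, and the lowercase normalization are assumptions about how such checks could be written, not the handlers' actual behavior.

```python
import math
import re
from typing import Any, Optional

MAX_MESSAGE_LIMIT = 1000  # from the commit message
_PEER_HASH_RE = re.compile(r"[0-9a-fA-F]{16,32}")


def validate_peer_hash(value: Any) -> str:
    """Require 16-32 hex characters and return the hash in lowercase."""
    if not isinstance(value, str) or _PEER_HASH_RE.fullmatch(value) is None:
        raise ValueError("peer_hash must be 16-32 hex characters")
    return value.lower()


def validate_limit(value: Any) -> int:
    """Require an integer between 1 and MAX_MESSAGE_LIMIT inclusive."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("limit must be an integer")
    if not 1 <= value <= MAX_MESSAGE_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_MESSAGE_LIMIT}")
    return value


def validate_before_timestamp(value: Any) -> Optional[float]:
    """Allow None, otherwise require a finite, non-negative number."""
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("before_timestamp must be a number")
    try:
        ts = float(value)
    except OverflowError:
        raise ValueError("before_timestamp is out of range") from None
    if not math.isfinite(ts) or ts < 0:
        raise ValueError("before_timestamp must be finite and non-negative")
    return ts
```

Validating at the IPC boundary means the service layer can assume well-formed arguments, and a malformed request produces an error reply instead of an exception deep in a query.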
Release 0.4.0 - LXMF Chat Backend & Security Hardening

Added:
- ConversationService for full LXMF chat with SQLite persistence
- Sideband/NomadNet/MeshChat interoperability (plain text normalization)
- Terminal service security: async identity verification, session hijacking
  prevention, rate limiting, command/signal validation, idle timeout
- RPC security: authorization, rate limiting, replay protection
- IPC handler input validation and shutdown safety
- 658 unit tests (up from ~400)

Fixed:
- LocalInterface reconnection duplicate destination spam
- Cross-node messaging identity resolution
- Config loading edge cases
- NodeStore hash validation
Adversarial assessment identified 6+ interface bugs, hardcoded device
lists, and zero observability. This expands testing from 2 hardcoded
devices to the full 4-machine fleet via devices.yaml, adds log capture
and metrics collection, and creates automated deployment tooling.

- Fix harness interface bugs (registry property, get_identity return
  type, exec_command alias, optional identity_hash, return_code naming)
- Parametrize all tests over ALL_DEVICES/DEVICES_WITH_IDENTITY from YAML
- Add --device CLI filter for targeted test runs
- Add log_capture.py: autouse fixture capturing journalctl on failure
- Add metrics.py: background SSH sampling with leak detection regression
- Add test_convergence.py, test_resilience.py, test_cross_arch.py
- Add scripts/bare-metal-deploy.sh for automated fleet deployment
- Update justfile bare-metal recipes to read from devices.yaml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the broken Docker-based container build with nix2container for
reproducible, layer-optimized OCI images. Add Argo WorkflowTemplates
for self-hosted CI/CD on brutus k3s cluster.

- Add nix/deps.nix (RNS 1.1.3 + LXMF 0.9.4), nix/package.nix, nix/oci.nix
- Rewrite flake.nix with nix2container input, gate OCI behind isLinux
- Move entrypoint.sh to container/, apply ShellCheck fixes
- Update all GitHub Actions workflows to use nix build
- Update justfile with nix build + copyToPodman recipes
- Add .argo/workflows/ (edge, PR, release, nightly, cron)
- Remove Dockerfile, docker-bake.hcl, .dockerignore
- Update CONTAINERS.md, README, deployment and release docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…port

Move Reticulum operator identity into the OS provisioning lifecycle.
styrened now checks /etc/styrene/identity (system-level, generated at
NixOS activation) before falling back to ~/.styrene/operator.key.

- Add _resolve_identity_path() as single source of truth for identity
  file resolution: config override -> system -> user -> LXMF detection
- Remove resolve_operator_identity_path() from config.py (was double-
  resolving with reticulum.py)
- Wire config.reticulum.operator_identity_path through lifecycle to
  ensure_operator_identity()
- Fix CLI identity commands to show actual active identity path instead
  of hardcoded OPERATOR_IDENTITY_PATH
- Fix get_operator_identity() fallback to return None instead of leaking
  raw private key bytes as a fake identity hash
- Use readlink() in unshare/sharing_status for broken symlink support
- Add TestSystemIdentityPath, TestGetOperatorIdentity, and
  TestLifecycleIdentityWiring test classes (41 identity tests total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CI/CD pipeline moving to Argo Workflows definitions in .argo/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cwilson613 cwilson613 merged commit c98b8cb into main Feb 16, 2026
@cwilson613 cwilson613 deleted the feature/lxmf-chat-backend branch February 16, 2026 02:32
cwilson613 added a commit that referenced this pull request Mar 12, 2026
feat: LXMF chat backend, OS-lifecycle identity, nix2container builds