accept_loop resilience: non-blocking handshakes and tunnel health monitoring #13

@rafabd1

Description

The accept_loop (responsible for receiving incoming peer connections) has two resilience problems:

  1. Blocking on stuck handshakes: handle_incoming runs inline in the accept loop. If the handshake hangs (see #11, "Peer connection hangs indefinitely: missing timeouts on handshake I/O over degraded I2P tunnels"), no new incoming connections can be accepted until it completes or the stream eventually errors out. This can block the entire accept pipeline for minutes.

  2. No tunnel health awareness: The router status is set to "ready" as soon as the SAM session is created, but the underlying I2P tunnels may be completely non-functional. The user sees "connected" while the app cannot actually send or receive any data. As seen in the logs, tunnel tests can fail continuously for minutes while the status remains "ready".

Problem 1: accept_loop blocks on handle_incoming

Current code (session.rs, line ~192)

async fn accept_loop(...) {
    loop {
        match accept_once_raw(&session_id, &sam_addr).await {
            Ok((peer_dest, tunnel)) => {
                // This runs INLINE — blocks the loop until complete
                if let Err(e) = handle_incoming(&app, peer_dest, tunnel).await {
                    // ...
                }
            }
            // ...
        }
    }
}

If handle_incoming hangs in read_framed (waiting for handshake data that never arrives), the loop is stuck. No further STREAM ACCEPT calls are issued, so subsequent connection attempts from peers are queued or dropped at the SAM level.

What needs to change

Spawn handle_incoming as a separate task with a timeout:

Ok((peer_dest, tunnel)) => {
    let app_clone = app.clone();
    tauri::async_runtime::spawn(async move {
        let result = tokio::time::timeout(
            Duration::from_secs(60),
            handle_incoming(&app_clone, peer_dest, tunnel),
        ).await;

        match result {
            Ok(Ok(())) => {} // handshake succeeded
            Ok(Err(e)) => {
                log::warn!("incoming handshake failed: {}", e);
            }
            Err(_) => {
                log::warn!("incoming handshake timed out");
            }
        }
    });
}

This way:

  • The accept loop immediately goes back to waiting for the next connection
  • Each handshake runs independently with its own timeout
  • A stuck handshake doesn't block other peers from connecting
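The effect of moving the handshake off the accept loop can be reproduced outside the app. The sketch below is an analogy, not the project's code: it uses std threads and channels in place of tokio tasks and I2P streams, and Incoming, Outcome, accept_all, and this simplified handle_incoming are hypothetical stand-ins. A peer that never sends handshake bytes only costs its own worker the deadline, while the peer behind it completes normally.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// How one incoming handshake ended.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    TimedOut,
    Failed,
}

/// Hypothetical stand-in for an accepted connection: handshake bytes
/// arrive on a channel, or never arrive at all.
struct Incoming {
    id: u32,
    handshake: Receiver<Vec<u8>>,
}

/// Simplified stand-in for handle_incoming: waits for the handshake
/// bytes, but gives up once `deadline` has elapsed.
fn handle_incoming(conn: Incoming, deadline: Duration) -> (u32, Outcome) {
    let outcome = match conn.handshake.recv_timeout(deadline) {
        Ok(_bytes) => Outcome::Completed,
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => Outcome::Failed,
    };
    (conn.id, outcome)
}

/// Hands every connection to its own worker and returns without waiting
/// for any handshake, so a stuck peer never delays the next accept.
fn accept_all(incoming: Vec<Incoming>, deadline: Duration) -> Vec<JoinHandle<(u32, Outcome)>> {
    incoming
        .into_iter()
        .map(|conn| thread::spawn(move || handle_incoming(conn, deadline)))
        .collect()
}

/// Runs one stuck peer followed by one healthy peer and reports both outcomes.
fn demo() -> Vec<(u32, Outcome)> {
    // Peer 1 keeps its sender alive but never sends: a stuck handshake.
    let (stuck_tx, stuck_rx) = mpsc::channel::<Vec<u8>>();
    // Peer 2 sends its handshake bytes right away.
    let (ok_tx, ok_rx) = mpsc::channel::<Vec<u8>>();
    ok_tx.send(vec![1, 2, 3]).expect("receiver is still alive");

    let incoming = vec![
        Incoming { id: 1, handshake: stuck_rx },
        Incoming { id: 2, handshake: ok_rx },
    ];
    let workers = accept_all(incoming, Duration::from_millis(100));

    let results = workers
        .into_iter()
        .map(|w| w.join().expect("worker panicked"))
        .collect();
    // Kept alive until here so peer 1 times out instead of disconnecting.
    drop(stuck_tx);
    results
}
```

Calling demo() from a main function reports peer 1 as timed out and peer 2 as completed. In the real fix the same roles are played by tauri::async_runtime::spawn and tokio::time::timeout, which also cancel the stuck future instead of merely abandoning it.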

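Limit concurrent handshakes (optional but recommended): use a semaphore to prevent resource exhaustion if many peers connect simultaneously. Because the permit is acquired in the loop, accepting pauses while all permits are in use; the per-handshake timeout bounds that pause:

use std::sync::Arc;
use tokio::sync::Semaphore;

let semaphore = Arc::new(Semaphore::new(3)); // max 3 concurrent handshakes
// ... in the loop:
// acquire_owned() only fails if the semaphore has been closed
let Ok(permit) = semaphore.clone().acquire_owned().await else { break };
tauri::async_runtime::spawn(async move {
    let _permit = permit; // released when this task finishes
    // ... handle_incoming with timeout
});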
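If pausing the accept loop while all permits are taken is undesirable, a non-waiting variant is possible: try to take a permit and turn the connection away when none is free. The sketch below illustrates that idea with a small std-only counter and an RAII guard. It is not tokio's Semaphore, and HandshakeLimiter and Permit are hypothetical names introduced only for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Caps the number of handshakes in flight without ever waiting.
#[derive(Clone)]
struct HandshakeLimiter {
    in_flight: Arc<AtomicUsize>,
    max: usize,
}

/// Holding a Permit counts as one handshake in flight;
/// dropping it frees the slot again.
struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl HandshakeLimiter {
    fn new(max: usize) -> Self {
        Self { in_flight: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Returns a permit if a slot is free, or None when the limit is reached.
    fn try_acquire(&self) -> Option<Permit> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            // Claim the slot only if nobody changed the counter in between.
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { in_flight: Arc::clone(&self.in_flight) }),
                Err(actual) => current = actual,
            }
        }
    }

    /// Number of permits currently held.
    fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

The trade-off is that a burst beyond the limit gets rejected rather than delayed, so peers would need to retry. Waiting for a permit, as in the snippet above, is simpler and is probably fine as long as every handshake carries a timeout.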

Problem 2: No tunnel health feedback

Current behavior

In do_connect_i2p (session.rs, line ~153):

*state.router_status.lock().await = "ready".to_string();
let _ = app.emit("router_status_changed", "ready");

This fires as soon as the SAM session is created and the first inbound tunnel is built. But:

  • The I2P router is still building/rebuilding tunnels
  • Tunnel tests may be failing continuously
  • The tunnels may be too degraded to carry any data

The user sees "ready" in the UI while the network is essentially unusable.

What needs to change

  1. Delay "ready" status until tunnels are verified: After creating the SAM session, perform a basic health check before declaring "ready". The emissary router events include tunnel build success/failure; use these to wait until at least one outbound+inbound tunnel pair is established and tested.

    A pragmatic approach: after the SAM session is created, issue a request that has to go out through the tunnels (e.g., a SAM NAMING LOOKUP for a known remote destination) and wait for a response. If it times out, set status to "degraded" instead of "ready". A name that the router can resolve from a local address book would not exercise the tunnels, so the probe target should be one that requires a network lookup.

  2. Add a "degraded" router status: Extend RouterStatus to include a degraded state:

    In types.ts:

    export type RouterStatus = "idle" | "bootstrapping" | "connecting" | "ready" | "degraded" | "error";

    The frontend should show this as a yellow/warning indicator, e.g.:

    "Connected to I2P (tunnels degraded — connections may be slow)"

  3. Periodic connectivity check (optional): Run a lightweight background task that periodically verifies tunnel health. If tunnels degrade after initial connection, update the status accordingly. This could be as simple as:

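    • Every 60 seconds, attempt a SAM NAMING LOOKUP NAME=ME (this returns the session's own destination, so it mainly confirms that the SAM session is still alive; a lookup of a remote destination is likely a stronger signal)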
    • If it fails 3 times in a row, set status to "degraded"
    • If it succeeds again, set status back to "ready"
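The status rules above are small enough to sketch as a pure state machine that the background task could feed with probe results. This is a minimal sketch under assumptions: the Rust-side RouterStatus enum, HealthTracker, and lookup_succeeded are hypothetical names (the current code stores the status as a String), and the reply check relies on the SAM v3 convention that a successful lookup is answered with a line starting with NAMING REPLY and containing RESULT=OK.

```rust
/// Rust-side mirror of the RouterStatus union proposed for types.ts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RouterStatus {
    Idle,
    Bootstrapping,
    Connecting,
    Ready,
    Degraded,
    Error,
}

impl RouterStatus {
    /// String form matching the values the frontend expects.
    fn as_str(self) -> &'static str {
        match self {
            RouterStatus::Idle => "idle",
            RouterStatus::Bootstrapping => "bootstrapping",
            RouterStatus::Connecting => "connecting",
            RouterStatus::Ready => "ready",
            RouterStatus::Degraded => "degraded",
            RouterStatus::Error => "error",
        }
    }
}

/// True if a SAM reply line reports a successful naming lookup.
fn lookup_succeeded(reply: &str) -> bool {
    let mut tokens = reply.split_whitespace();
    tokens.next() == Some("NAMING")
        && tokens.next() == Some("REPLY")
        && tokens.any(|t| t == "RESULT=OK")
}

/// Tracks consecutive probe failures and derives the router status.
struct HealthTracker {
    consecutive_failures: u32,
    threshold: u32,
    status: RouterStatus,
}

impl HealthTracker {
    /// `initial` is whatever the startup check decided (Ready or Degraded).
    fn new(threshold: u32, initial: RouterStatus) -> Self {
        Self { consecutive_failures: 0, threshold, status: initial }
    }

    /// Records one probe result. Returns Some(new_status) only when the
    /// status changes, so the caller emits an event only on transitions.
    fn record(&mut self, probe_ok: bool) -> Option<RouterStatus> {
        if probe_ok {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        }

        let next = if probe_ok {
            RouterStatus::Ready
        } else if self.consecutive_failures >= self.threshold {
            RouterStatus::Degraded
        } else {
            self.status // not enough failures yet: keep the current status
        };

        if next != self.status {
            self.status = next;
            Some(next)
        } else {
            None
        }
    }
}
```

With a threshold of 3, the first two failed probes change nothing, the third returns Some(RouterStatus::Degraded), and the next successful probe returns Some(RouterStatus::Ready). The periodic task would then pass as_str() of the returned status to the existing router_status_changed event.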

Frontend changes

The SessionSetup and Header components should handle the new "degraded" status:

  • Show a warning banner when degraded
  • The "Connect" button should still work but show a tooltip/note: "Network is degraded — connection attempts may take longer"

Relationship to other issues
