
Opt-in graceful corruption handling in StraxFormatter with deadtime markers#105

Open
cfuselli wants to merge 3 commits into master from feature/graceful-corruption-deadtime


@cfuselli
Member

Graceful corruption handling in StraxFormatter + optional deadtime markers

Why

We are seeing DAQ failures where malformed/incomplete payloads propagate into the formatter path and can crash readout.
This PR adds an opt-in path that keeps readout alive when corruption is detected, while explicitly marking the affected periods with artificial deadtime.

This is intended as a mitigation for commissioning/debugging while CAEN-side root cause analysis continues.

Context from history

GenerateArtificialDeadtime(...) was intentionally disabled on board-fail events in commit a6a5387 ("Disable artificial deadtime until it gets fixed").
This PR re-introduces deadtime insertion in a guarded, configurable way and adds stricter parser guards.

What changes

1) New runtime flags (default-safe)

In StraxFormatter:

  • graceful_corruption_handling (default 0)
    • 0: preserve current fail-fast behavior
    • 1: catch corruption/parsing exceptions per datapacket, log them, and continue
  • inject_deadtime_on_corrupt (default 1)
    • if enabled, insert an artificial deadtime marker whenever a corrupt packet or board fail is handled
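A minimal sketch of the flag semantics, assuming the options arrive as a simple key/integer map. CorruptionOptions, GetFlag and LoadCorruptionOptions are hypothetical names used only for illustration; the real StraxFormatter reads its settings through the DAQ options object, which is not shown here. The point is that an absent key falls back to the documented defaults (0 and 1), so existing configs keep the fail-fast behavior.

```cpp
#include <map>
#include <string>

// Hypothetical container for the two new flags (not the PR's actual types).
struct CorruptionOptions {
  bool graceful_corruption_handling = false;  // 0 (default): fail fast
  bool inject_deadtime_on_corrupt = true;     // 1 (default): insert marker when handled
};

// Look up an integer flag, falling back to the documented default if absent.
inline int GetFlag(const std::map<std::string, int>& cfg,
                   const std::string& key, int fallback) {
  auto it = cfg.find(key);
  return it == cfg.end() ? fallback : it->second;
}

// Any nonzero value enables a flag; missing keys keep the defaults above.
inline CorruptionOptions LoadCorruptionOptions(const std::map<std::string, int>& cfg) {
  CorruptionOptions opts;
  opts.graceful_corruption_handling =
      GetFlag(cfg, "graceful_corruption_handling", 0) != 0;
  opts.inject_deadtime_on_corrupt =
      GetFlag(cfg, "inject_deadtime_on_corrupt", 1) != 0;
  return opts;
}
```

With an empty config this yields graceful handling off and deadtime injection on, matching the defaults listed above.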

2) Corruption guard checks

Added explicit validation before potentially unsafe operations:

  • event word-count must be sane (>= event_header_words)
  • event word-count must not exceed remaining buffer
  • the channel size returned by the parser must be sane before remove_prefix is called
  • channel header size sanity checks
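The guard checks could look roughly like the sketch below, assuming the packet buffer is walked as a std::u32string_view of 32-bit words and that a failed check throws. CorruptPacket, CheckEventWords and ConsumeChannel are hypothetical names, not the identifiers used in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical exception type signalling detected corruption.
struct CorruptPacket : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Event-level checks: the word count read from the header must cover at least
// the header itself and must not run past the end of the remaining buffer.
inline void CheckEventWords(std::u32string_view remaining, std::uint32_t words_in_event,
                            std::uint32_t event_header_words) {
  if (words_in_event < event_header_words)
    throw CorruptPacket("event word count " + std::to_string(words_in_event) +
                        " smaller than header (" + std::to_string(event_header_words) + ")");
  if (words_in_event > remaining.size())
    throw CorruptPacket("event word count " + std::to_string(words_in_event) +
                        " exceeds remaining buffer (" + std::to_string(remaining.size()) + ")");
}

// Channel-level checks before remove_prefix(): a zero or too-small size would
// not advance past the channel header, and an oversized one would step past
// the end of the view (remove_prefix requires n <= size()).
inline void ConsumeChannel(std::u32string_view& event_body, std::size_t channel_words,
                           std::size_t channel_header_words) {
  if (channel_words == 0 || channel_words < channel_header_words)
    throw CorruptPacket("invalid channel size " + std::to_string(channel_words) +
                        " (header is " + std::to_string(channel_header_words) + " words)");
  if (channel_words > event_body.size())
    throw CorruptPacket("channel size " + std::to_string(channel_words) +
                        " exceeds remaining event (" + std::to_string(event_body.size()) + ")");
  event_body.remove_prefix(channel_words);
}
```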

3) Graceful handler

New helper:

  • HandleCorruptPacket(...)
    • increments corruption counter
    • logs board + reason
    • optionally inserts artificial deadtime marker
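A sketch of the handler's three steps, with the formatter's logger and GenerateArtificialDeadtime(...) replaced by std::cerr and a callback. CorruptionHandler and its members are hypothetical; the PR's helper is a StraxFormatter member and its exact signature is not reproduced here.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical stand-in for the formatter state touched by HandleCorruptPacket.
class CorruptionHandler {
 public:
  // Stand-in for GenerateArtificialDeadtime(...): board and header timestamp.
  using DeadtimeFn = std::function<void(int board, std::int64_t timestamp)>;

  CorruptionHandler(bool inject_deadtime, DeadtimeFn deadtime)
      : inject_deadtime_(inject_deadtime), deadtime_(std::move(deadtime)) {}

  void HandleCorruptPacket(int board, std::int64_t header_time, const std::string& reason) {
    ++corrupt_count_;                                   // 1) count the event
    std::cerr << "Corrupt packet from board " << board  // 2) log board + reason
              << ": " << reason << '\n';
    if (inject_deadtime_ && deadtime_)                  // 3) optional deadtime marker
      deadtime_(board, header_time);
  }

  std::uint64_t corrupt_count() const { return corrupt_count_; }

 private:
  bool inject_deadtime_;
  DeadtimeFn deadtime_;
  // Atomic so that a separate monitoring thread could read it safely.
  std::atomic<std::uint64_t> corrupt_count_{0};
};
```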

4) Processing loop behavior

In Process():

  • wraps ProcessDatapacket(...) in try/catch
  • if graceful mode is disabled: keeps fail-fast
  • if graceful mode is enabled: logs and continues
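The per-packet try/catch can be sketched as follows, assuming ProcessDatapacket(...) signals corruption by throwing something derived from std::exception. Datapacket and ProcessAll are simplified, hypothetical stand-ins.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified datapacket; the real one carries a raw buffer,
// board id and header clock information.
struct Datapacket {
  int board;
};

// Each packet is parsed in its own try block, so in graceful mode one bad
// packet does not take down the whole batch. Returns the number skipped.
inline std::size_t ProcessAll(const std::vector<Datapacket>& packets, bool graceful,
                              const std::function<void(const Datapacket&)>& process_datapacket) {
  std::size_t skipped = 0;
  for (const auto& dp : packets) {
    try {
      process_datapacket(dp);
    } catch (const std::exception& e) {
      if (!graceful) throw;  // fail fast: current default behavior
      // Graceful mode: this is where a HandleCorruptPacket(...) call would go.
      std::cerr << "Skipping corrupt packet from board " << dp.board
                << ": " << e.what() << '\n';
      ++skipped;
    }
  }
  return skipped;
}
```

Rethrowing with a bare `throw;` in the non-graceful branch preserves the original exception object, so the default path behaves as before.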

5) Board-fail path

When an event's board-fail bit is set, deadtime insertion is re-enabled, gated by the inject_deadtime_on_corrupt config flag.

Files changed

  • StraxFormatter.cc
  • StraxFormatter.hh

Behavior / compatibility

  • Default behavior remains unchanged (graceful_corruption_handling=0).
  • No straxen code changes are required: handling of channel 799 (artificial deadtime) and veto propagation already exist.

Risks / caveats

  • If corruption is frequent, keeping the process alive can mask how severe the problem is; logs and counters must be monitored.
  • Deadtime timestamps are currently derived from the packet header clock context, so the markers are approximate; they do not reconstruct exactly which data were lost.
  • This does not solve underlying CAEN/driver/transport root cause.

Validation performed

  • Code-level validation + history inspection.
  • Full build/test not possible in this environment due to missing dependencies (libmongocxx, local toolchain mismatch).

Suggested rollout

  1. Merge as disabled by default.
  2. Enable only on selected readers/runs:
    • graceful_corruption_handling = 1
    • inject_deadtime_on_corrupt = 1
  3. Monitor:
    • corruption log rate
    • fail counters
    • resulting straxen_deadtime_veto intervals in downstream data-quality checks
  4. Reassess before enabling globally.

Follow-ups (recommended)

  • Add explicit status metrics for corruption/deadtime counts to DAQ monitoring.
  • Add a cap/rate-limit on deadtime insertion logs to avoid log flooding.
  • Add integration test with intentionally malformed packet fixtures.
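For the log rate-limit follow-up, one simple option is a fixed-window limiter like the sketch below. LogRateLimiter is a hypothetical helper, not existing DAQ code; it takes the current time as a parameter so it can be tested deterministically.

```cpp
#include <chrono>
#include <cstdint>

// Fixed-window limiter: allow at most max_per_window log lines per window and
// count the rest, so a periodic "N messages suppressed" summary can be logged.
class LogRateLimiter {
 public:
  using Clock = std::chrono::steady_clock;

  LogRateLimiter(std::uint32_t max_per_window, Clock::duration window)
      : max_(max_per_window), window_(window) {}

  // Returns true if the caller should emit this log line at time `now`.
  bool Allow(Clock::time_point now) {
    if (!started_ || now - window_start_ >= window_) {  // open a new window
      started_ = true;
      window_start_ = now;
      emitted_in_window_ = 0;
    }
    if (emitted_in_window_ < max_) {
      ++emitted_in_window_;
      return true;
    }
    ++suppressed_;
    return false;
  }

  std::uint64_t suppressed() const { return suppressed_; }

 private:
  std::uint32_t max_;
  Clock::duration window_;
  Clock::time_point window_start_{};
  std::uint32_t emitted_in_window_ = 0;
  std::uint64_t suppressed_ = 0;
  bool started_ = false;
};
```

In this design the limiter would gate only the log line; the deadtime marker would still be inserted for every handled packet, and the suppressed() count could feed the status metrics suggested above.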

@cfuselli force-pushed the feature/graceful-corruption-deadtime branch from 2356d65 to 8022743 on May 13, 2026 18:46