ExportWorker leaves ProjectFile stuck as in_progress with nil path on failure #4454

@stuartc

Describe the bug

When a history export fails mid-execution (OOM, node restart, storage upload failure, or any exception during process_export), the ExportWorker leaves the ProjectFile record permanently stuck with status: :in_progress and path: nil.

The worker sets status: :in_progress at the start of perform/1 (line 59) but only writes status: :completed and path on success (lines 64-67). The error branch (lines 80-82) logs the error and returns {:error, reason} to Oban but never updates the ProjectFile record. With max_attempts: 1, there is no retry, so the record is orphaned.
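The missing failure handling could look roughly like the sketch below. It is a simplified, self-contained model, not the actual worker: ExportSketch, its struct, and the export_fun argument are hypothetical stand-ins for the ProjectFile record and for process_export plus store_project_file, the :enqueued initial status is a placeholder, and :failed is assumed to be a valid status value.

```elixir
defmodule ExportSketch do
  # Hypothetical stand-in for the ProjectFile record (the real one is an Ecto schema).
  defstruct status: :enqueued, path: nil

  # `export_fun` is a hypothetical stand-in for process_export + store_project_file.
  # It is expected to return {:ok, path} or {:error, reason}, and it may raise.
  def perform(file, export_fun) do
    file = %{file | status: :in_progress}

    try do
      case export_fun.() do
        {:ok, path} ->
          {:ok, %{file | status: :completed, path: path}}

        {:error, reason} ->
          # The step the current error branch skips: record the failure.
          {:error, reason, %{file | status: :failed}}
      end
    rescue
      exception ->
        # Raised exceptions also mark the record as failed.
        {:error, exception, %{file | status: :failed}}
    end
  end
end
```

A rescue clause cannot run if the node is killed or the VM runs out of memory, so this only covers failures the process survives. Records orphaned by a hard crash would still need the nil-path guard and the cleanup described under Expected behavior.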

These orphaned records then cause a secondary bug: the data retention cron job (remove_expired_files_for/1 in lib/lightning/projects.ex:1013) queries all expired ProjectFile records, including orphaned ones with nil paths, and passes the nil path to Lightning.Storage.delete(nil), which crashes in URI.encode(nil). The job runs every 2 hours and has generated ~56,000 Sentry events since May 2025 (LIGHTNING-Z7, LIGHTNING-ZX).

Version number

Current main — the bug has been present since the export feature was introduced.

I have reproduced this locally on main:

  • Yes
  • No

(Confirmed via database queries on staging and production — see below.)

To Reproduce

  1. Initiate a history export on any project
  2. Kill the node or cause the ExportWorker to fail during process_export or store_project_file
  3. Check the project_files table — the record will be status: in_progress, path: NULL
  4. Wait for the data retention cron to run on a project with history_retention_period set
  5. Observe URI.encode(nil) crash in Sentry

Evidence from production and staging

Production (1 orphaned record):

id: 447c68... | status: in_progress | path: NULL | type: export
inserted_at: 2025-11-03

Staging (7 orphaned records):

5 records from 2024-08-21
1 record from 2024-08-23
1 record from 2025-11-03
All status: in_progress, path: NULL, type: export

Expected behavior

  1. When ExportWorker.perform/1 fails, the ProjectFile record should be updated to status: :failed
  2. remove_expired_files_for/1 should skip records with nil/empty paths instead of passing them to Storage.delete
  3. Orphaned in_progress records from before the fix should be cleaned up via a migration or manual SQL
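The guard in item 2 could be sketched as follows. This is a minimal model under stated assumptions: RetentionSketch and delete_fun are hypothetical names, with delete_fun standing in for Lightning.Storage.delete/1, and the expired records are represented as plain maps with a :path key rather than ProjectFile structs loaded from the database.

```elixir
defmodule RetentionSketch do
  # Returns only the paths that are safe to hand to storage deletion,
  # dropping nil and empty-string paths left behind by failed exports.
  def deletable_paths(files) do
    files
    |> Enum.map(& &1.path)
    |> Enum.reject(&(&1 in [nil, ""]))
  end

  # `delete_fun` is a hypothetical stand-in for Lightning.Storage.delete/1.
  def remove_expired(files, delete_fun) do
    files
    |> deletable_paths()
    |> Enum.each(delete_fun)
  end
end
```

In the real function the same filter could instead be applied in the database query, for example by excluding rows where path is NULL, so that orphaned records never reach the deletion step at all.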

Additional context

  • ExportWorker uses max_attempts: 1 — no Oban retries
  • ProjectFile.new/1 does not include path in validate_required — this is by design since the path is set after upload
  • The data retention cron runs every 2 hours (17 */2 * * *) and hits these nil-path records on every cycle
  • Sentry issues: LIGHTNING-Z7 (28,975 events), LIGHTNING-ZX (27,407 events)

Metadata

  • Labels: bug (Newly identified bug)
  • Status: Tech Backlog