feat(data-pipeline): add concurrent idempotent processing flow #25

Open
Irahan2 wants to merge 1 commit into main from feat/pipeline-concurrency

Irahan2 (Collaborator) commented Apr 12, 2026

  • Replace sequential page loop with bounded parallel batches in Prefect flow

  • Add hash-based claim/processed/failed tracking in Neo4j with stale claim reclaim

  • Add race-safe schema reflection strategy (before/after batches) and run summary counters

  • Add integration-style tests for 10-page processing, duplicate skip, and failure isolation

  • Document DATA_PIPELINE_MAX_CONCURRENCY and DATA_PIPELINE_CLAIM_STALE_MINUTES
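
For readers without the diff at hand, the batching and claim-tracking bullets above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: it uses plain `asyncio` and an in-memory dictionary in place of Prefect and Neo4j, and `InMemoryClaimStore`, `try_claim`, `mark`, `page_hash`, and `run_pipeline` are hypothetical names. Only the two environment variable names come from the description; their defaults, and the choice to let failed pages be retried on a later run, are guesses.

```python
import asyncio
import hashlib
import os
from datetime import datetime, timedelta, timezone

# Variable names come from the PR description; the default values are assumptions.
MAX_CONCURRENCY = int(os.environ.get("DATA_PIPELINE_MAX_CONCURRENCY", "4"))
CLAIM_STALE_MINUTES = int(os.environ.get("DATA_PIPELINE_CLAIM_STALE_MINUTES", "30"))


def page_hash(content: str) -> str:
    """Content hash used as the idempotency key for a page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class InMemoryClaimStore:
    """Hypothetical stand-in for the Neo4j-backed claim/processed/failed tracking."""

    def __init__(self, stale_after: timedelta):
        self.stale_after = stale_after
        self.records = {}  # hash -> (status, claimed_at)

    def try_claim(self, key, now=None):
        """Return True if the caller now owns the work for this hash."""
        now = now or datetime.now(timezone.utc)
        record = self.records.get(key)
        if record is not None:
            status, claimed_at = record
            if status == "processed":
                return False  # already done: skip the duplicate
            if status == "claimed" and now - claimed_at < self.stale_after:
                return False  # another worker holds a fresh claim
        # New hash, failed hash (assumed retryable), or stale claim: take it.
        self.records[key] = ("claimed", now)
        return True

    def mark(self, key, status):
        """Record the final status ("processed" or "failed") for a claimed hash."""
        _, claimed_at = self.records[key]
        self.records[key] = (status, claimed_at)


async def run_pipeline(pages, process, store, max_concurrency=MAX_CONCURRENCY):
    """Process pages in batches of at most max_concurrency and return summary counters."""
    summary = {"processed": 0, "skipped": 0, "failed": 0}

    async def handle(page):
        key = page_hash(page)
        if not store.try_claim(key):
            summary["skipped"] += 1
            return
        try:
            await process(page)
        except Exception:
            # Failure isolation: one bad page does not abort the batch.
            store.mark(key, "failed")
            summary["failed"] += 1
        else:
            store.mark(key, "processed")
            summary["processed"] += 1

    for start in range(0, len(pages), max_concurrency):
        batch = pages[start:start + max_concurrency]
        await asyncio.gather(*(handle(page) for page in batch))
    return summary
```

In this sketch `try_claim` is safe only because nothing awaits between the check and the write on a single event loop. A claim held in a shared database would need the check-and-set to be atomic on the database side.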