
Multi-threaded / Concurrent Data Pipeline #19


Description

@W1ndrunn3rr

Refactor the Prefect pipeline for parallel document processing.

  • Replace the sequential `for page in pages` loop in pipeline.py with asyncio.gather() or Prefect's task.submit() (thread/process pool)
  • Process each page's extract → generate Cypher → populate chain in parallel, with a configurable concurrency limit
  • Add idempotency: track processed documents by storing a content hash in the DB
  • Integration test: mock 10 pages and assert concurrent writes to Neo4j (no schema-reflection race)
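A minimal sketch of the first two points, using asyncio with a semaphore as the concurrency limit. The stage functions (`extract`, `generate_cypher`, `populate`) are hypothetical stand-ins; the real names and signatures in pipeline.py may differ, and Prefect's `task.submit()` with a task runner is the alternative route:

```python
import asyncio

# Hypothetical per-page stages; the real pipeline.py functions may differ.
async def extract(page: str) -> str:
    return f"extracted:{page}"

async def generate_cypher(extracted: str) -> str:
    return f"CREATE (:Chunk {{text: '{extracted}'}})"

async def populate(cypher: str) -> str:
    return cypher  # stand-in for a Neo4j write

async def process_page(page: str, sem: asyncio.Semaphore) -> str:
    # The semaphore caps how many pages run the full
    # extract -> generate Cypher -> populate chain at once.
    async with sem:
        extracted = await extract(page)
        cypher = await generate_cypher(extracted)
        return await populate(cypher)

async def run_pipeline(pages: list[str], concurrency: int = 4) -> list[str]:
    # Replaces the sequential `for page in pages` loop.
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(process_page(p, sem) for p in pages))

results = asyncio.run(run_pipeline([f"page-{i}" for i in range(10)]))
```

`asyncio.gather` preserves input order, so `results` lines up with `pages` even though pages finish out of order.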
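The idempotency point could look like the following sketch. An in-memory set stands in for the DB table; in the real pipeline the hash would be written to and checked against the database so re-runs skip already-processed documents:

```python
import hashlib

processed: set[str] = set()  # stand-in for a hash column/table in the DB

def doc_hash(content: bytes) -> str:
    # Content-based hash, so renamed or re-uploaded copies still dedupe.
    return hashlib.sha256(content).hexdigest()

def should_process(content: bytes) -> bool:
    # Skip documents whose hash is already recorded, so a re-run
    # does not duplicate Cypher writes into Neo4j.
    h = doc_hash(content)
    if h in processed:
        return False
    processed.add(h)
    return True

assert should_process(b"doc-1") is True   # first run: process
assert should_process(b"doc-1") is False  # re-run: skipped
```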
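The integration-test bullet could be sketched like this: mock writes track how many are in flight so the test can assert the writes actually overlapped. The counters and `mock_write` are illustrative, not the real Neo4j client:

```python
import asyncio

peak = 0    # highest number of simultaneous in-flight writes observed
active = 0

async def mock_write(page: int) -> int:
    # Mocked Neo4j write; counts in-flight calls to prove concurrency.
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulated database round-trip
    active -= 1
    return page

async def test_concurrent_writes() -> None:
    results = await asyncio.gather(*(mock_write(i) for i in range(10)))
    assert sorted(results) == list(range(10))  # all 10 pages written
    assert peak >= 2  # writes overlapped rather than running serially

asyncio.run(test_concurrent_writes())
```

A real test would also assert that schema-reflection state is read once up front (or guarded by a lock) so concurrent writers cannot race on it.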

Labels

Data (Data related task), MLOPS (Mlops related task)
