Problem
There is no native multi-file ingestion command. Client0 wrote a 255-line bash script to
process 20 PDFs sequentially with retry, Milvus restart, and skip/resume logic. This
complexity belongs inside weave-cli.
Proposed Solution
New subcommand `weave docs create-batch`:
```bash
weave docs create-batch AuctionListings "data/pdfs/*-catalogue.pdf" \
  --milvus-local \
  --embedding text-embedding-3-small \
  --skip-existing \
  --timeout 30m \
  --delay 10s \
  --max-retries 3 \
  --log-file logs/ingestion.log \
  --checkpoint-file .ingestion-checkpoint.json
```
Key Features
- Glob expansion — already in v0.9.27
- Configurable delay between files (`--delay 10s`) — prevents VDB memory buildup
- Retry with backoff (`--max-retries 3`, `--retry-delay 30s`) — auto-retry failed files
- Checkpoint/resume (`--checkpoint-file`) — save state after each file, resume on crash
- Structured log file (`--log-file`) — timestamped append-mode log
- Auto-create image collection — if `--image-collection` doesn't exist, create it
- Batch summary at completion:
```
✅ Batch complete: 9/9 succeeded, 0 failed, 2 skipped
Duration: 1h 23m | Log: logs/ingestion.log
```
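To make the delay and retry semantics concrete, here is a minimal Python sketch of the sequential loop. It is an illustration only, not the weave-cli implementation: `run_batch` and the `ingest` callback are hypothetical names, the doubling backoff is an assumed policy (the proposal only says "retry with backoff"), and `max_retries` is assumed to count retries after the first attempt.

```python
import time
from typing import Callable, Iterable


def run_batch(
    files: Iterable[str],
    ingest: Callable[[str], None],  # hypothetical per-file ingestion callback
    max_retries: int = 3,           # mirrors --max-retries (retries after the first try)
    retry_delay: float = 30.0,      # mirrors --retry-delay, in seconds
    delay: float = 10.0,            # mirrors --delay, in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> dict:
    """Process files one at a time, retrying each failed file with backoff."""
    result = {"succeeded": [], "failed": []}
    files = list(files)
    for index, path in enumerate(files):
        for attempt in range(max_retries + 1):
            try:
                ingest(path)
                result["succeeded"].append(path)
                break
            except Exception:
                if attempt == max_retries:
                    # Out of retries: record the failure and move on.
                    result["failed"].append(path)
                else:
                    # Assumed policy: double the wait on each retry.
                    sleep(retry_delay * (2 ** attempt))
        if index < len(files) - 1:
            # Pause between files so the vector database can settle.
            sleep(delay)
    return result
```

A failed file does not abort the batch in this sketch; it is recorded and the loop continues, which matches the "N succeeded, N failed" summary line above.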
Checkpoint File Format
```json
{
  "collection": "AuctionListings",
  "started": "2026-02-17T10:00:00Z",
  "completed": [
    {"file": "2017-catalogue.pdf", "chunks": 28, "at": "2026-02-17T10:12:33Z"}
  ],
  "failed": [],
  "skipped": []
}
```
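The resume behaviour this format enables could look roughly like the Python sketch below. All function names are hypothetical, and two details are assumptions rather than part of the proposal: completed files are matched by base name (as in the example entry above), and the file is written through a temporary file plus rename so a crash mid-write cannot leave a truncated checkpoint. The shape of `failed` and `skipped` entries is not specified in the proposal, so the sketch leaves them untouched.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def now_utc() -> str:
    """Timestamp in the same UTC format as the example checkpoint."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def load_checkpoint(path: str, collection: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return {"collection": collection, "started": now_utc(),
            "completed": [], "failed": [], "skipped": []}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)


def record_completed(state: dict, file: str, chunks: int) -> None:
    """Append a completed entry keyed by base name (an assumption)."""
    state["completed"].append(
        {"file": os.path.basename(file), "chunks": chunks, "at": now_utc()}
    )


def pending_files(files: list, state: dict) -> list:
    """Files from the expanded glob that are not yet marked completed."""
    done = {entry["file"] for entry in state["completed"]}
    return [f for f in files if os.path.basename(f) not in done]
```

Calling `save_checkpoint` after every `record_completed` gives the "save state after each file" behaviour; on restart, `pending_files` filters the glob results down to what is left.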
Client0 Impact
Two `weave docs create-batch` calls replace Client0's entire 255-line bash pipeline script.
Priority
P1 — v0.9.29 target (~10 hours)