Add --format jsonl output mode to history command #6

Open
Chen17-sq wants to merge 1 commit into huohuoer:main from Chen17-sq:feat/jsonl-output-history

Conversation

@Chen17-sq

Summary

Adds a --format jsonl mode to wechat-cli history that emits one structured JSON record per line (NDJSON) instead of the existing array of pre-formatted display strings. Use case: feed raw fields into downstream LLM / agent pipelines without re-parsing display text.

Example:
```bash
wechat-cli history "Alice" --format jsonl
```
```json
{"local_id": 12345, "create_time": 1714048200, "msg_type": "text", "local_type": 1, "is_self": false, "sender_id": "wxid_alice", "sender_display": "Alice", "is_group": false, "chat_username": "wxid_alice", "chat_display": "Alice", "text": "今晚发你 deck"}
```

Approach

  • New _build_history_record(), parallel to the existing _build_history_line(): same row decoding, content decompression, and sender resolution, but it returns a dict instead of a formatted string.
  • collect_chat_history() gets a new as_records=False kwarg. Default value preserves existing behavior; every existing caller is unchanged.
  • New output_jsonl() helper in output/formatter.py.
  • history.py adds jsonl to --format choices and routes accordingly. Failures go to stderr so stdout remains pure NDJSON for piping.
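
The helper's body is not shown in this description. As a rough sketch of what an `output_jsonl()`-style function could look like, assuming it takes an iterable of dicts and an optional writable stream (the signature and the `stream` parameter are assumptions, not the PR's actual code):

```python
import io
import json
import sys


def output_jsonl(records, stream=None):
    """Write one compact JSON object per line (NDJSON).

    Hypothetical signature: the real helper in output/formatter.py may differ.
    """
    stream = stream if stream is not None else sys.stdout
    for record in records:
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) unescaped.
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


# Usage: render two illustrative records into an in-memory buffer.
buf = io.StringIO()
output_jsonl(
    [{"local_id": 1, "text": "今晚发你 deck"}, {"local_id": 2, "text": "ok"}],
    stream=buf,
)
lines = buf.getvalue().splitlines()
```

Writing each record as it is produced, rather than collecting a list and dumping it once, is what lets a consumer start work before the full history has been read.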

Why

The json and text modes return display strings like `[2026-04-25 21:30] Alice: 今晚发你 deck`. That is great for humans, but downstream tooling has to regex-parse the rendered timestamp, sender, and text apart again. NDJSON exposes those fields directly. Useful for:

  • LLM commitment / fact extractors that need structured `is_self` / `sender_id`
  • Streaming pipelines (one JSON object per line, no need to buffer the whole array)
  • `| jq` / `| awk` style processing
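
On the consuming side, a minimal sketch of a line-by-line reader might look like the following. The `iter_records` name is hypothetical, and the second sample record (including `wxid_me`) is made-up illustration data, not output from the tool:

```python
import json


def iter_records(lines):
    """Yield one dict per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Sample input modeled on the example record (trimmed to a few keys).
sample = [
    '{"create_time": 1714048200, "is_self": false, '
    '"sender_id": "wxid_alice", "text": "今晚发你 deck"}\n',
    '{"create_time": 1714048260, "is_self": true, '
    '"sender_id": "wxid_me", "text": "ok"}\n',
]

# Keep only messages sent by the other party; no regex over display text.
incoming = [r["text"] for r in iter_records(sample) if not r["is_self"]]
```

In a real pipeline, `sys.stdin` would take the place of `sample`, with the script fed from `wechat-cli history "Alice" --format jsonl` through a pipe.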

Test plan

  • Unit tests cover record shape (1:1 self / 1:1 contact / group with sender / group self) — `tests/test_history_jsonl.py`
  • `output_jsonl()` emits one JSON per line with Unicode preserved
  • All required fields present (11 keys)
  • `python3 -m unittest tests.test_history_jsonl` — 7/7 passing
  • `wechat-cli history "" --format jsonl` against real local data (reviewer side)
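
The test file itself is not reproduced here. A sketch of the kind of checks described above, using only stdlib `unittest`, could look like this; `emit_jsonl` is a stand-in for the PR's `output_jsonl()` and is an assumption for illustration, while the record is the one from the Summary example:

```python
import io
import json
import unittest

# The 11 keys that appear in the example record in the Summary.
EXPECTED_KEYS = {
    "local_id", "create_time", "msg_type", "local_type", "is_self",
    "sender_id", "sender_display", "is_group", "chat_username",
    "chat_display", "text",
}

RECORD = {
    "local_id": 12345, "create_time": 1714048200, "msg_type": "text",
    "local_type": 1, "is_self": False, "sender_id": "wxid_alice",
    "sender_display": "Alice", "is_group": False,
    "chat_username": "wxid_alice", "chat_display": "Alice",
    "text": "今晚发你 deck",
}


def emit_jsonl(records, stream):
    # Stand-in for the PR's output_jsonl(); hypothetical, for illustration.
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


class JsonlShapeTest(unittest.TestCase):
    def test_one_object_per_line_with_all_keys(self):
        buf = io.StringIO()
        emit_jsonl([RECORD, RECORD], buf)
        lines = buf.getvalue().splitlines()
        self.assertEqual(len(lines), 2)
        for line in lines:
            self.assertEqual(set(json.loads(line)), EXPECTED_KEYS)

    def test_unicode_is_not_escaped(self):
        buf = io.StringIO()
        emit_jsonl([RECORD], buf)
        self.assertIn("今晚发你", buf.getvalue())
```

Saved as a module under `tests/`, a class like this would be picked up by `python3 -m unittest` in the same way as the PR's own test file.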

Backward compat

  • `--format json` and `--format text` unchanged
  • `collect_chat_history()` default behavior unchanged (new param defaults to `False`)
  • No new dependencies
  • Tests use stdlib `unittest`, no pytest
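
The default-preserving kwarg pattern can be sketched as follows. All three functions are simplified stand-ins with hypothetical names and row shapes; the real `collect_chat_history()` takes different arguments and does considerably more work:

```python
def _build_line(row):
    # Stand-in for _build_history_line(): returns a display string.
    return f"{row['sender']}: {row['text']}"


def _build_record(row):
    # Stand-in for _build_history_record(): returns structured fields.
    return {"sender_display": row["sender"], "text": row["text"]}


def collect(rows, as_records=False):
    """Pick the builder by flag; the default keeps the legacy string output."""
    build = _build_record if as_records else _build_line
    return [build(row) for row in rows]


rows = [{"sender": "Alice", "text": "hi"}]
legacy = collect(rows)                       # existing callers: unchanged
structured = collect(rows, as_records=True)  # new jsonl path opts in
```

Because the new parameter is keyword-only in practice and defaults to `False`, call sites that predate the change keep receiving display strings without modification.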

Emits one structured JSON object per line (NDJSON), enabling streamed
consumption by downstream pipelines instead of buffering a single large
JSON array. Each record exposes parsed fields (local_id, create_time,
msg_type, local_type, is_self, sender_id, sender_display, is_group,
chat_username, chat_display, text) rather than the pre-formatted
display string used by the existing json/text modes.

- core/messages.py: add _build_history_record, parallel to
  _build_history_line; add as_records param to collect_chat_history so
  the existing builder remains the default and behavior is unchanged.
- output/formatter.py: add output_jsonl helper.
- commands/history.py: add jsonl to --format choices and route the
  output accordingly. Failures are routed to stderr to keep stdout pure
  NDJSON.
- tests/test_history_jsonl.py: cover record shape (1:1 self / 1:1
  other / group with sender / group self) and jsonl emission
  (one-object-per-line, valid JSON, unicode preserved).