Releases: matthane/libpgs
v0.6.0
Highlights
Manifest header for .sup streams (opt-in)
libpgs stream now accepts a --with-header flag that, for .sup inputs, prepends a single NDJSON record carrying pre-scanned display-set counts:
{"type":"header","total_display_sets":1823,"total_content_display_sets":1456,"total_clear_display_sets":367}
This lets downstream consumers show a progress denominator (e.g. 347/1823) from the first record instead of waiting until the stream finishes. The pre-scan walks only 13-byte segment headers (plus tiny PCS payloads) and seeks over other payloads, typically taking under a second even on multi-GB files. The flag is opt-in so default stream invocations pay no upfront latency.
Containers (MKV, M2TS) ignore the flag: counting there would require a full demux, and MKV already surfaces per-track display_set_count via the existing tracks line when Tags are present.
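The header-only pre-scan described above can be sketched roughly as follows. This is not the libpgs implementation: the function name count_display_sets is hypothetical, and the sketch assumes the commonly documented .sup layout (a "PG" magic, 4-byte PTS, 4-byte DTS, 1-byte segment type, 2-byte big-endian payload size) with an END segment (type 0x80) closing each display set. It only produces the total count and does not inspect PCS payloads to separate content from clear display sets.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Counts display sets by reading each 13-byte segment header and
/// seeking over the payload instead of reading it.
fn count_display_sets<R: Read + Seek>(mut input: R) -> io::Result<u64> {
    const END_SEGMENT: u8 = 0x80; // assumed: END closes a display set
    let mut header = [0u8; 13];
    let mut count = 0u64;
    loop {
        match input.read_exact(&mut header) {
            Ok(()) => {}
            // End of file (or a truncated trailing header): stop scanning.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        if &header[0..2] != b"PG" {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "missing PG magic"));
        }
        let segment_type = header[10];
        let payload_len = u16::from_be_bytes([header[11], header[12]]);
        if segment_type == END_SEGMENT {
            count += 1;
        }
        // Skip the payload without reading it.
        input.seek(SeekFrom::Current(i64::from(payload_len)))?;
    }
    Ok(count)
}
```

Because only headers are read, the work scales with the number of segments rather than with file size, which is consistent with the sub-second figure quoted above.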
Faster NDJSON streaming for dense subtitles
Per-line emission in libpgs stream was reworked to assemble each NDJSON line into a reused scratch buffer and issue a single chunked write_all + flush per line. This keeps pipe consumers fast on Windows (where a single large pipe write can block the producer) and collapses many small write! calls into one allocation per line.
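A minimal sketch of the reused-scratch-buffer pattern, using only the Rust standard library. The type NdjsonLineWriter, the 64 KiB chunk size, and the JSON field names are all hypothetical; the actual record layout is defined in docs/NDJSON.md, and "chunked write_all" is interpreted here as writing the assembled line in bounded pieces before a single flush.

```rust
use std::io::{self, Write};

/// Assumed upper bound for a single write to the pipe.
const CHUNK: usize = 64 * 1024;

struct NdjsonLineWriter<W: Write> {
    out: W,
    scratch: Vec<u8>, // reused across lines, so steady state does not reallocate
}

impl<W: Write> NdjsonLineWriter<W> {
    fn new(out: W) -> Self {
        Self { out, scratch: Vec::with_capacity(4096) }
    }

    /// Assembles one line in the scratch buffer, then writes and flushes it.
    fn emit(&mut self, index: u64, pts_ms: u64) -> io::Result<()> {
        self.scratch.clear(); // keeps the existing capacity
        writeln!(self.scratch, "{{\"index\":{},\"pts_ms\":{}}}", index, pts_ms)?;
        // Bounded writes avoid handing the pipe one very large buffer.
        for chunk in self.scratch.chunks(CHUNK) {
            self.out.write_all(chunk)?;
        }
        self.out.flush()
    }
}
```

Formatting into the in-memory buffer first means the underlying writer sees a few large writes per line rather than one per formatted fragment.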
New Extractor::with_history(bool) opt-out
Callers that exhaust the iterator in a single pass can now disable the internal history catalog to avoid cloning every yielded display set:
let extractor = Extractor::open(path)?.with_history(false);
The libpgs stream CLI uses this internally since it never re-reads history.
v0.5.0 — PGS encoding and round-trip
Highlights
PGS encoding and round-trip support
libpgs is no longer extract-only. This release adds a complete encoding path so you can read, modify, and write PGS data:
- DisplaySetBuilder: chainable builder for constructing display sets from structured payloads. Handles RLE encoding and ODS fragmentation automatically.
- Payload serialization: PcsData, WdsData, PdsData, and OdsData all gain to_bytes() methods. PgsSegment gains from_pcs/wds/pds/ods factories and in-place set_*_payload mutators.
- RLE encoder: encode_rle(pixels, width, height) complements the existing decoder, enabling full bitmap round-trips.
- ObjectBitmap type: represents a complete object bitmap (id, dimensions, pixel buffer) for encoding into ODS segments.
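To illustrate what an RLE encoder for PGS bitmaps has to produce, here is a standalone sketch. It is not the library's encode_rle: the name encode_rle_sketch, the parameter types, and the return type are assumptions, and the byte layout follows the commonly documented PGS run-length scheme (raw byte for a single non-zero pixel, 0x00-prefixed run codes, 0x00 0x00 at the end of each line).

```rust
/// Encodes an 8-bit palette-indexed bitmap, one row at a time.
fn encode_rle_sketch(pixels: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height, "pixel buffer size mismatch");
    let mut out = Vec::new();
    if width == 0 {
        return out; // nothing to encode
    }
    for row in pixels.chunks_exact(width) {
        let mut x = 0;
        while x < row.len() {
            let color = row[x];
            // Measure the run, capped at the 14-bit maximum length.
            let mut run = 1;
            while x + run < row.len() && row[x + run] == color && run < 0x3FFF {
                run += 1;
            }
            match (color, run) {
                // One or two non-zero pixels: emit the color byte(s) directly.
                (c, 1..=2) if c != 0 => out.resize(out.len() + run, c),
                // Color 0, short run (1 to 63 pixels).
                (0, 1..=63) => out.extend_from_slice(&[0x00, run as u8]),
                // Color 0, long run (64 to 16383 pixels).
                (0, _) => out.extend_from_slice(&[0x00, 0x40 | (run >> 8) as u8, run as u8]),
                // Non-zero color, short run (3 to 63 pixels).
                (c, 3..=63) => out.extend_from_slice(&[0x00, 0x80 | run as u8, c]),
                // Non-zero color, long run (64 to 16383 pixels).
                (c, _) => out.extend_from_slice(&[0x00, 0xC0 | (run >> 8) as u8, run as u8, c]),
            }
            x += run;
        }
        out.extend_from_slice(&[0x00, 0x00]); // end-of-line marker
    }
    out
}
```

Runs never cross row boundaries in this sketch, since every line ends with its own marker.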
encode CLI command
The new libpgs encode -o <output.sup> subcommand reads NDJSON from stdin (in the same format that libpgs stream produces) and writes a .sup file. This closes the round-trip loop for external scripts in any language:
libpgs stream input.mkv | your-script.py | libpgs encode -o output.sup
Multi-track input is automatically split into <stem>_track<id>.sup files. See docs/NDJSON.md for the full protocol reference.
Bug fixes
- Cue-path block timestamps were off by the block's cluster-relative offset. When extracting PGS from MKV files via the cue fast path (the default for any MKV with a Cues index), read_block_at_position was computing cue_time + block.relative_timestamp, but per the Matroska spec CueTime is already the absolute timestamp of the block referenced by CueRelativePosition. This caused every extracted display set to have an incorrect PTS, shifted by an amount equal to the block's cluster-relative offset (ranging from 0 to several seconds per block). .sup files extracted with v0.4.0 and earlier from cue-indexed MKVs are affected and should be re-extracted with v0.5.0. Output is now byte-for-byte identical to mkvextract.
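The arithmetic behind this fix can be shown with a small worked sketch. The function names and the sample values are hypothetical and do not come from the libpgs source; timestamps are in Matroska ticks (1 ms under the default timestamp scale).

```rust
/// Absolute timestamp of a block found by walking its cluster:
/// cluster timestamp plus the block's signed relative offset.
fn block_timestamp(cluster_timestamp: u64, block_relative: i16) -> i64 {
    cluster_timestamp as i64 + i64::from(block_relative)
}

/// Pre-fix cue path as described above: the relative offset was added
/// to CueTime, which already includes it.
fn cue_path_timestamp_buggy(cue_time: u64, block_relative: i16) -> i64 {
    cue_time as i64 + i64::from(block_relative)
}

/// Fixed cue path: CueTime is used unchanged.
fn cue_path_timestamp_fixed(cue_time: u64) -> i64 {
    cue_time as i64
}
```

For a cluster at 60,000 with a block 1,250 ticks into it, CueTime is 61,250. The fixed path returns 61,250, matching a sequential read, while the pre-fix path returns 62,500, which is late by exactly the block's relative offset.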
Documentation
- STREAMING.md renamed to docs/NDJSON.md and expanded with encode protocol details.
- README updated with encoding and round-trip examples.
v0.4.0 — Time-range filtering
New features
- Time-range filtering with --start and --end timestamps for extract and stream commands. Accepts HH:MM:SS.ms, MM:SS.ms, SS.ms, or plain seconds.
- Per-format seeking skips directly to the target position, with no scanning of data before the start point:
  - MKV with Cues: exact seeking via cue point filtering
  - MKV sequential / M2TS: binary search refinement with timestamp probing (up to 20 iterations) to converge on the correct byte offset despite variable bitrate
  - SUP: bitrate estimation from first/last PTS in segment headers
- Library API: Extractor::with_time_range(start_ms, end_ms) for programmatic time-range filtering (chainable builder pattern)
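As an illustration of the SUP strategy in the list above, a bitrate-based seek can be approximated by linear interpolation between the first and last PTS. This is a sketch under stated assumptions, not the libpgs code: the function name and signature are hypothetical, and a real reader would presumably still need to resynchronize on the next segment boundary after jumping to the estimated offset.

```rust
/// Estimates the byte offset for `target_ms`, assuming data is spread
/// roughly evenly between the first and last PTS in the file.
fn estimate_seek_offset(file_len: u64, first_pts_ms: f64, last_pts_ms: f64, target_ms: f64) -> u64 {
    let duration = last_pts_ms - first_pts_ms;
    // Degenerate or invalid span: fall back to the start of the file.
    if !(duration > 0.0) {
        return 0;
    }
    // Fraction of the timeline before the target, clamped to the file.
    let fraction = ((target_ms - first_pts_ms) / duration).clamp(0.0, 1.0);
    (file_len as f64 * fraction) as u64
}
```

With a 1,000,000-byte file spanning 0 to 100,000 ms, a target of 25,000 ms maps to byte 250,000.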
If no display sets fall within the requested range, zero results are reported with no error.
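The accepted timestamp forms (HH:MM:SS.ms, MM:SS.ms, SS.ms, or plain seconds) can be converted to the millisecond values that with_time_range takes along these lines. The function parse_timestamp_ms is a hypothetical helper written for illustration; the CLI's actual parser may differ in what it accepts or rejects.

```rust
/// Parses a colon-separated timestamp into milliseconds.
/// Only the last field may carry a fractional part.
fn parse_timestamp_ms(text: &str) -> Option<f64> {
    let parts: Vec<&str> = text.trim().split(':').collect();
    if parts.len() > 3 {
        return None; // more fields than HH:MM:SS
    }
    let mut seconds = 0.0_f64;
    for (i, part) in parts.iter().enumerate() {
        let value = if i + 1 == parts.len() {
            part.parse::<f64>().ok()? // seconds, possibly fractional
        } else {
            f64::from(part.parse::<u32>().ok()?) // whole hours or minutes
        };
        if !value.is_finite() || value < 0.0 {
            return None;
        }
        // Each field to the left is worth 60 times the next one.
        seconds = seconds * 60.0 + value;
    }
    Some(seconds * 1000.0)
}
```

Under this sketch, "0:05:00" yields 300000.0 ms and "90" yields 90000.0 ms.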
Usage
libpgs extract movie.mkv -o out.sup --start 0:05:00 # From 5 minutes to end
libpgs stream movie.mkv --start 0:05:00 --end 0:10:00 # 5-minute window only
libpgs stream movie.m2ts --start 1:30:00 --end 1:35:00 -t 4768 # M2TS with track filter

let extractor = Extractor::open("movie.mkv")?
    .with_time_range(Some(300_000.0), Some(600_000.0)); // 5:00 to 10:00
v0.3.0
Features
- PGS segment payload parsing (PCS, WDS, PDS, ODS)
- Redesigned NDJSON streaming output with semantic grouping
- PGS RLE bitmap decoder with decoded bitmap output and automatic fragment reassembly
- Language code normalization to BCP 47 / ISO 639-1
- CLPI SequenceInfo parser — M2TS timestamps now corrected by presentation_start_time
- Streaming reference documentation
v0.2.0
What's Changed
Performance
- Replace ClusterScan with Sequential fallback — When no Cues index is available, the extractor now does a single-pass linear scan instead of the two-pass ClusterScan approach. Benchmarks showed Sequential is 1.3–2.2x faster on every storage type tested, as seek latency dominated ClusterScan despite it reading ~400x less data.
Features
- .sup file input support: Read raw PGS streams directly via Extractor::open().
- MKV track metadata: PgsTrackInfo now includes name, flag_default, flag_forced, display_set_count, and has_cues fields.
- NDJSON track header: stream command now emits track metadata (name, flags, display set count) in the tracks header line.
- Comma-separated track IDs: -t 3,5,8 syntax for filtering multiple tracks.
Fixes
- Fix wrong EBML ID for TagName element.
- Track per-cue track number and filter cues by active tracks.
Full Changelog: v0.1.0...v0.2.0
v0.1.0
libpgs v0.1.0
Initial release of libpgs — a Rust library + CLI for extracting PGS subtitles from MKV and M2TS/TS containers with minimal I/O.
Features
- Streaming extraction: Extractor iterator yields display sets one at a time, enabling early termination and incremental processing
- MKV support: three-tier extraction (Cues fast path, cluster probe, sequential scan) with parallel optimization for batch collection
- M2TS/TS support: bulk PID scanning with 2MB I/O buffer and PES reassembly
- BDMV language fallback: reads CLPI files for PID → language mappings when M2TS is inside a BDMV structure
- NDJSON streaming: libpgs stream command outputs structured JSON over stdout for cross-language subprocess integration
- CLI commands: tracks, extract, stream, bench
Downloads
Pre-built binaries for Windows (x86_64), Linux (x86_64), and macOS (x86_64 + aarch64) will be attached shortly after the release workflow completes.