Releases: matthane/libpgs

v0.6.0

12 Apr 21:56
v0.6.0
2f28c90

Highlights

Manifest header for .sup streams (opt-in)

libpgs stream now accepts a --with-header flag that, for .sup inputs, prepends a single NDJSON record carrying pre-scanned display-set counts:

{"type":"header","total_display_sets":1823,"total_content_display_sets":1456,"total_clear_display_sets":367}

This lets downstream consumers show a progress denominator (e.g. 347/1823) from the first record instead of waiting until the stream finishes. The pre-scan walks only 13-byte segment headers (plus tiny PCS payloads) and seeks over other payloads — typically under a second even on multi-GB files. The flag is opt-in so default stream invocations pay no upfront latency.
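A minimal sketch of such a pre-scan, not libpgs's actual code. It relies on the standard .sup segment layout (a 13-byte header of "PG" magic, 4-byte PTS, 4-byte DTS, 1-byte type, 2-byte payload size) and on the PCS object count sitting at payload byte 10. Treating a display set as "clear" when its PCS lists zero composition objects is an assumption about how the counts are classified; `Counts` and `prescan` are hypothetical names.

```rust
use std::io::{self, Read, Seek, SeekFrom};

const SEG_PCS: u8 = 0x16; // Presentation Composition Segment
const SEG_END: u8 = 0x80; // End of display set

#[derive(Debug, Default, PartialEq)]
struct Counts {
    total: u64,
    content: u64,
    clear: u64,
}

/// Counts display sets by reading each 13-byte segment header and skipping payloads.
fn prescan<R: Read + Seek>(mut r: R) -> io::Result<Counts> {
    let mut counts = Counts::default();
    let mut header = [0u8; 13];
    let mut has_objects = false;
    loop {
        // Header layout: "PG" magic, PTS (4), DTS (4), type (1), payload size (2).
        match r.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        if &header[0..2] != b"PG" {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "bad segment magic"));
        }
        let seg_type = header[10];
        let size = u16::from_be_bytes([header[11], header[12]]) as usize;
        if seg_type == SEG_PCS {
            // PCS payloads are tiny, so read them to get the object count (byte 10).
            let mut pcs = vec![0u8; size];
            r.read_exact(&mut pcs)?;
            has_objects = pcs.get(10).copied().unwrap_or(0) > 0;
        } else {
            // Every other payload is skipped without being read.
            r.seek(SeekFrom::Current(size as i64))?;
            if seg_type == SEG_END {
                counts.total += 1;
                if has_objects {
                    counts.content += 1;
                } else {
                    counts.clear += 1;
                }
                has_objects = false;
            }
        }
    }
    Ok(counts)
}
```

Because object and palette payloads are never read, the work scales with the number of segments rather than with file size.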

Containers (MKV, M2TS) ignore the flag: counting there would require a full demux, and MKV already surfaces per-track display_set_count via the existing tracks line when Tags are present.
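Since container inputs never produce the header, a consumer would likely treat it as optional. The sketch below shows one way to do that. The helper names are hypothetical, and the hand-rolled field lookup only stands in for a real JSON parser to keep the example dependency-free.

```rust
/// Extracts an unsigned integer field from a flat JSON object line.
/// A real consumer would use a JSON parser; this only handles `"key":123`.
fn json_u64(line: &str, key: &str) -> Option<u64> {
    let needle = format!("\"{key}\":");
    let start = line.find(&needle)? + needle.len();
    let rest = line[start..].trim_start();
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

/// Returns the progress denominator if `first_line` is a header record.
fn progress_total(first_line: &str) -> Option<u64> {
    if !first_line.contains("\"type\":\"header\"") {
        return None; // no header: flag omitted, or a container input
    }
    json_u64(first_line, "total_display_sets")
}

/// Formats progress as "done/total" when a total is known, else just "done".
fn progress_label(done: u64, total: Option<u64>) -> String {
    match total {
        Some(t) => format!("{done}/{t}"),
        None => format!("{done}"),
    }
}
```

With the header shown earlier, `progress_label(347, progress_total(line))` yields `347/1823`; without a header it falls back to the bare count.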

Faster NDJSON streaming for dense subtitles

Per-line emission in libpgs stream was reworked to assemble each NDJSON line into a reused scratch buffer and issue a single chunked write_all + flush per line. This keeps pipe consumers fast on Windows (where a single large pipe write can block the producer) and collapses many small write! calls into one allocation per line.
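The pattern can be sketched roughly as follows. This is an illustration of the technique, not the code libpgs ships: `LineWriter` and the 64 KiB `CHUNK` size are hypothetical, since the release notes do not state the chunk size.

```rust
use std::io::{self, Write};

/// Hypothetical chunk size; the value libpgs uses is not stated in the notes.
const CHUNK: usize = 64 * 1024;

struct LineWriter<W: Write> {
    out: W,
    scratch: Vec<u8>, // reused across lines, so its capacity is retained
}

impl<W: Write> LineWriter<W> {
    fn new(out: W) -> Self {
        Self { out, scratch: Vec::new() }
    }

    /// Assembles one line from `parts`, writes it in bounded chunks, then flushes once.
    fn emit(&mut self, parts: &[&str]) -> io::Result<()> {
        self.scratch.clear(); // drops the contents but keeps the allocation
        for p in parts {
            self.scratch.extend_from_slice(p.as_bytes());
        }
        self.scratch.push(b'\n');
        // Bounded writes avoid handing the pipe one very large buffer.
        for chunk in self.scratch.chunks(CHUNK) {
            self.out.write_all(chunk)?;
        }
        self.out.flush()
    }
}
```

Once the scratch buffer has grown to fit the longest line seen so far, later lines are assembled without further allocation.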

New Extractor::with_history(bool) opt-out

Callers that exhaust the iterator in a single pass can now disable the internal history catalog to avoid cloning every yielded display set:

let extractor = Extractor::open(path)?.with_history(false);

The libpgs stream CLI uses this internally since it never re-reads history.
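To illustrate what the opt-out saves, here is a generic stand-in rather than the real Extractor: a hypothetical `Recorder` wraps any iterator and clones each yielded item into a history list unless told not to.

```rust
/// Hypothetical stand-in for an extractor: wraps any iterator of cloneable items.
struct Recorder<I: Iterator>
where
    I::Item: Clone,
{
    inner: I,
    keep_history: bool,
    history: Vec<I::Item>,
}

impl<I: Iterator> Recorder<I>
where
    I::Item: Clone,
{
    fn new(inner: I) -> Self {
        Self { inner, keep_history: true, history: Vec::new() }
    }

    /// Chainable toggle, mirroring the shape of `with_history(bool)`.
    fn with_history(mut self, keep: bool) -> Self {
        self.keep_history = keep;
        self
    }
}

impl<I: Iterator> Iterator for Recorder<I>
where
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let item = self.inner.next()?;
        if self.keep_history {
            // This per-item clone is the cost a single-pass caller can skip.
            self.history.push(item.clone());
        }
        Some(item)
    }
}
```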

v0.5.0 — PGS encoding and round-trip

05 Apr 01:23

Choose a tag to compare

Highlights

PGS encoding and round-trip support

libpgs is no longer extract-only. This release adds a complete encoding path so you can read, modify, and write PGS data:

  • DisplaySetBuilder: chainable builder for constructing display sets from structured payloads. Handles RLE encoding and ODS fragmentation automatically.
  • Payload serialization: PcsData, WdsData, PdsData, and OdsData all gain to_bytes() methods. PgsSegment gains from_pcs/wds/pds/ods factories and in-place set_*_payload mutators.
  • RLE encoder: encode_rle(pixels, width, height) complements the existing decoder, enabling full bitmap round-trips.
  • ObjectBitmap type: represents a complete object bitmap (id, dimensions, pixel buffer) for encoding into ODS segments.
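For readers unfamiliar with the PGS run-length format, the following is an independent sketch of an encoder with the same call shape as encode_rle(pixels, width, height). It is not the libpgs implementation: the parameter types are assumptions, and `push_run` is a hypothetical helper. Pixels are palette indices, runs never cross a row, and each row ends with the two-byte marker 0x00 0x00.

```rust
/// Pushes one run of `len` pixels (1..=16383) of palette index `color`.
fn push_run(out: &mut Vec<u8>, color: u8, len: usize) {
    if color != 0 && len < 3 {
        // Short non-zero runs are cheaper as literal bytes.
        out.extend(std::iter::repeat(color).take(len));
        return;
    }
    out.push(0x00); // escape byte introducing a run
    let color_flag = if color != 0 { 0x80 } else { 0x00 };
    if len < 64 {
        out.push(color_flag | len as u8); // 6-bit length
    } else {
        out.push(color_flag | 0x40 | (len >> 8) as u8); // 14-bit length, high bits
        out.push((len & 0xFF) as u8); // 14-bit length, low byte
    }
    if color != 0 {
        out.push(color); // color 0 is implied when the flag is clear
    }
}

/// Encodes a `width * height` buffer of palette indices, row by row.
fn encode_rle(pixels: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height, "pixel buffer must be width * height");
    let mut out = Vec::new();
    if width == 0 {
        return out;
    }
    for row in pixels.chunks_exact(width) {
        let mut i = 0;
        while i < row.len() {
            let color = row[i];
            let mut len = 1;
            while i + len < row.len() && row[i + len] == color && len < 16_383 {
                len += 1;
            }
            push_run(&mut out, color, len);
            i += len;
        }
        out.extend_from_slice(&[0x00, 0x00]); // end-of-line marker
    }
    out
}
```

For example, a single row of four pixels with index 1 encodes as `00 84 01 00 00`: escape byte, run flag plus length 4, the color, then the end-of-line marker.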

encode CLI command

The new libpgs encode -o <output.sup> subcommand reads NDJSON from stdin (in the same format that libpgs stream produces) and writes a .sup file. This closes the round-trip loop for external scripts in any language:

libpgs stream input.mkv | your-script.py | libpgs encode -o output.sup

Multi-track input is automatically split into <stem>_track<id>.sup files. See docs/NDJSON.md for the full protocol reference.

Bug fixes

  • Cue-path block timestamps were off by the block's cluster-relative offset. When extracting PGS from MKV files via the cue fast path (the default for any MKV with a Cues index), read_block_at_position was computing cue_time + block.relative_timestamp, but per the Matroska spec CueTime is already the absolute timestamp of the block referenced by CueRelativePosition. This caused every extracted display set to have an incorrect PTS, shifted by an amount equal to the block's cluster-relative offset (ranging from 0 to several seconds per block). .sup files extracted with v0.4.0 and earlier from cue-indexed MKVs are affected and should be re-extracted with v0.5.0 — output is now byte-for-byte identical to mkvextract.
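A worked example of the arithmetic, using made-up numbers and assuming one Matroska tick equals one millisecond. The `CueHit` struct and both functions are hypothetical; they only restate the old and new formulas described above.

```rust
/// Values read for one cue-referenced block (ticks; this sketch assumes 1 tick = 1 ms).
struct CueHit {
    cue_time: i64,       // CueTime: already the absolute timestamp of the block
    block_relative: i64, // the block's timestamp relative to its cluster
}

/// Pre-v0.5.0 behaviour: the relative offset was added a second time.
fn pts_before_fix(hit: &CueHit) -> i64 {
    hit.cue_time + hit.block_relative
}

/// v0.5.0 behaviour: CueTime is used as-is.
fn pts_after_fix(hit: &CueHit) -> i64 {
    hit.cue_time
}
```

With a cluster starting at 10,000 ms and a block 2,500 ms into it, CueTime is 12,500 ms. The old formula yields 15,000 ms, late by exactly the 2,500 ms cluster-relative offset, while the fixed formula yields 12,500 ms.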

Documentation

  • STREAMING.md renamed to docs/NDJSON.md and expanded with encode protocol details.
  • README updated with encoding and round-trip examples.

v0.4.0 — Time-range filtering

28 Mar 22:48

New features

  • Time-range filtering with --start and --end timestamps for extract and stream commands. Accepts HH:MM:SS.ms, MM:SS.ms, SS.ms, or plain seconds.
  • Per-format seeking skips directly to the target position — no scanning of data before the start point:
    • MKV with Cues — exact seeking via cue point filtering
    • MKV sequential / M2TS — binary search refinement with timestamp probing (up to 20 iterations) to converge on the correct byte offset despite variable bitrate
    • SUP — bitrate estimation from first/last PTS in segment headers
  • Library API: Extractor::with_time_range(start_ms, end_ms) for programmatic time-range filtering (chainable builder pattern)

If no display sets fall within the requested range, zero results are reported with no error.
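The two seeking strategies above can be sketched in miniature. Both functions are hypothetical illustrations, not libpgs internals: `estimate_offset` is the linear guess from first and last timestamps, and `refine_offset` is a bisection capped at 20 probes, where `probe` stands in for "read the first timestamp found at or after this byte offset".

```rust
/// Linear first guess: assumes bytes are spread evenly over the time span.
fn estimate_offset(file_len: u64, first_ms: f64, last_ms: f64, target_ms: f64) -> u64 {
    if last_ms <= first_ms {
        return 0;
    }
    let frac = ((target_ms - first_ms) / (last_ms - first_ms)).clamp(0.0, 1.0);
    (file_len as f64 * frac) as u64
}

/// Bisects for an offset just before `target_ms`, using at most 20 probes.
/// `probe(offset)` returns the first timestamp at or after `offset`, if any.
fn refine_offset<F>(file_len: u64, target_ms: f64, mut probe: F) -> u64
where
    F: FnMut(u64) -> Option<f64>,
{
    let (mut lo, mut hi) = (0u64, file_len);
    for _ in 0..20 {
        if hi - lo <= 1 {
            break;
        }
        let mid = lo + (hi - lo) / 2;
        match probe(mid) {
            // Still before the target: the answer lies in the upper half.
            Some(ts) if ts < target_ms => lo = mid,
            // At or past the target (or nothing found): look in the lower half.
            _ => hi = mid,
        }
    }
    lo // scanning forward from here reaches the first in-range display set
}
```

Twenty halvings narrow a search window by a factor of about one million, which is why a fixed iteration cap is enough even when variable bitrate makes the linear guess poor.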

Usage

libpgs extract movie.mkv -o out.sup --start 0:05:00            # From 5 minutes to end
libpgs stream movie.mkv --start 0:05:00 --end 0:10:00          # 5-minute window only
libpgs stream movie.m2ts --start 1:30:00 --end 1:35:00 -t 4768 # M2TS with track filter

The same 5:00 to 10:00 window through the library API (times in milliseconds):

let extractor = Extractor::open("movie.mkv")?
    .with_time_range(Some(300_000.0), Some(600_000.0)); // 5:00 to 10:00
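To relate the CLI timestamp forms to the millisecond values the library takes, here is a rough parser sketch. `parse_time_ms` is a hypothetical name, and reading the ".ms" part as fractional seconds is an assumption about how the CLI interprets it.

```rust
/// Parses "HH:MM:SS.ms", "MM:SS.ms", "SS.ms", or plain seconds into milliseconds.
fn parse_time_ms(s: &str) -> Option<f64> {
    let parts: Vec<&str> = s.trim().split(':').collect();
    if parts.len() > 3 {
        return None;
    }
    let mut seconds = 0.0;
    for (i, part) in parts.iter().enumerate() {
        let is_last = i + 1 == parts.len();
        let value: f64 = if is_last {
            part.parse().ok()? // the seconds field may carry a fraction
        } else {
            part.parse::<u32>().ok()? as f64 // hours and minutes are whole numbers
        };
        if !value.is_finite() || value < 0.0 {
            return None;
        }
        // Each field to the left is worth 60 of the field to its right.
        seconds = seconds * 60.0 + value;
    }
    Some(seconds * 1000.0)
}
```

Under these assumptions, `parse_time_ms("0:05:00")` gives `Some(300000.0)`, matching the `300_000.0` passed to with_time_range for the 5-minute mark.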

v0.3.0

28 Mar 02:33

Features

  • PGS segment payload parsing (PCS, WDS, PDS, ODS)
  • Redesigned NDJSON streaming output with semantic grouping
  • PGS RLE bitmap decoder with decoded bitmap output and automatic fragment reassembly
  • Language code normalization to BCP 47 / ISO 639-1
  • CLPI SequenceInfo parser — M2TS timestamps now corrected by presentation_start_time
  • Streaming reference documentation

v0.2.0

21 Mar 14:50

What's Changed

Performance

  • Replace ClusterScan with Sequential fallback — When no Cues index is available, the extractor now does a single-pass linear scan instead of the two-pass ClusterScan approach. Benchmarks showed Sequential is 1.3–2.2x faster on every storage type tested, as seek latency dominated ClusterScan despite it reading ~400x less data.

Features

  • .sup file input support: read raw PGS streams directly via Extractor::open().
  • MKV track metadata: PgsTrackInfo now includes name, flag_default, flag_forced, display_set_count, and has_cues fields.
  • NDJSON track header: the stream command now emits track metadata (name, flags, display set count) in the tracks header line.
  • Comma-separated track IDs: -t 3,5,8 syntax for filtering multiple tracks.
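Parsing a comma-separated ID list of this kind is a one-liner in Rust. The function name and the choice of u32 are assumptions made for illustration, not the CLI's actual code.

```rust
use std::num::ParseIntError;

/// Parses a list such as "3,5,8" into track IDs, failing on any bad entry.
fn parse_track_ids(arg: &str) -> Result<Vec<u32>, ParseIntError> {
    // Collecting into Result stops at the first entry that fails to parse.
    arg.split(',').map(|s| s.trim().parse::<u32>()).collect()
}
```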

Fixes

  • Fix wrong EBML ID for TagName element.
  • Track per-cue track number and filter cues by active tracks.

Full Changelog: v0.1.0...v0.2.0

v0.1.0

20 Mar 03:42

libpgs v0.1.0

Initial release of libpgs — a Rust library + CLI for extracting PGS subtitles from MKV and M2TS/TS containers with minimal I/O.

Features

  • Streaming extraction: the Extractor iterator yields display sets one at a time, enabling early termination and incremental processing
  • MKV support: three-tier extraction (Cues fast path, cluster probe, sequential scan) with parallel optimization for batch collection
  • M2TS/TS support: bulk PID scanning with 2MB I/O buffer and PES reassembly
  • BDMV language fallback: reads CLPI files for PID → language mappings when M2TS is inside a BDMV structure
  • NDJSON streaming: the libpgs stream command outputs structured JSON over stdout for cross-language subprocess integration
  • CLI commands: tracks, extract, stream, bench
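As background for the PID scanning mentioned above, this sketch shows how a PID is read from an M2TS source packet: 192 bytes, made of a 4-byte extra header followed by a standard 188-byte TS packet that starts with sync byte 0x47. The function names are hypothetical, and the sketch covers neither the 2MB buffering nor PES reassembly.

```rust
const TS_PACKET: usize = 188;
const M2TS_PACKET: usize = 192; // 4-byte TP_extra_header + 188-byte TS packet

/// Returns the 13-bit PID of one M2TS source packet, or None if it is malformed.
fn m2ts_pid(packet: &[u8]) -> Option<u16> {
    if packet.len() != M2TS_PACKET {
        return None;
    }
    let ts = &packet[M2TS_PACKET - TS_PACKET..];
    if ts[0] != 0x47 {
        return None; // missing TS sync byte
    }
    // PID = low 5 bits of byte 1, followed by all 8 bits of byte 2.
    Some((((ts[1] & 0x1F) as u16) << 8) | ts[2] as u16)
}

/// Counts the packets carrying `wanted` in a buffer of back-to-back M2TS packets.
fn count_pid(buf: &[u8], wanted: u16) -> usize {
    buf.chunks_exact(M2TS_PACKET)
        .filter(|p| m2ts_pid(p) == Some(wanted))
        .count()
}
```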

Downloads

Pre-built binaries for Windows (x86_64), Linux (x86_64), and macOS (x86_64 + aarch64) will be attached shortly after the release workflow completes.