zctw is a Python implementation of the Context Tree Weighting (CTW) lossless compression algorithm. It provides both a Python library for programmatic compression/decompression and a CLI tool for encoding and decoding files.
In scope:
- Full CTW algorithm implementation (encode/decode)
- Pure Python implementation using only stdlib
- CLI tool with encode (e) and decode (d) modes
- Configurable parameters (tree depth, max nodes, file buffer size, etc.)
- Support for Zero-Redundancy and Krichevski-Trofimov estimators
- Arithmetic encoder/decoder implementation
- File header read/write for compressed files
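The Krichevski-Trofimov estimator mentioned above assigns a sequential probability to the next bit from the zero/one counts seen so far in a context. A minimal sketch of the rule (illustrative only; `kt_probability` is not part of the zctw API, and zctw's internals use integer arithmetic rather than `Fraction`):

```python
from fractions import Fraction

def kt_probability(zeros: int, ones: int, next_bit: int) -> Fraction:
    """Krichevski-Trofimov estimate:
    P(next bit = 1) = (ones + 1/2) / (zeros + ones + 1).
    Fractions keep the arithmetic exact for illustration."""
    total = zeros + ones
    if next_bit == 1:
        return Fraction(2 * ones + 1, 2 * total + 2)
    return Fraction(2 * zeros + 1, 2 * total + 2)
```

With no observations the estimate is 1/2; as counts accumulate, the estimate is biased toward the bit seen more often.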
Not in scope:
- GUI interface
- Streaming compression (files must fit in memory)
- Thread safety / parallel processing
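As background on the in-scope algorithm: CTW mixes, at each node of the context tree, the node's local estimator probability with the product of its children's weighted probabilities. A minimal sketch of that recursion, using a hypothetical `Node` type rather than zctw's hash-table representation:

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass
class Node:
    pe: Fraction                    # local estimator (e.g. KT) block probability
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def weighted(n: Node) -> Fraction:
    """Leaf: P_w = pe.  Internal: P_w = (pe + P_w(left) * P_w(right)) / 2."""
    if n.left is None or n.right is None:
        return n.pe
    return (n.pe + weighted(n.left) * weighted(n.right)) / 2
```

The 1/2 mixing weight is what makes the scheme a Bayesian mixture over all prunings of the tree; the -r CLI flag toggles whether this weighting is also applied at root nodes.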
```python
from zctw import CTWCompressor

# Initialize compressor with settings
compressor = CTWCompressor(
    treedepth=6,
    maxnrnodes=4194304,
    maxnrtries=32,
    maxfilebufsize=4194304,
    maxlogbeta=1024,
    strictpruning=True,
    rootweighting=False,
    use_zeroredundancy=True,
)

data = b"example input"

# Encode bytes to compressed bytes: encode(data: bytes) -> bytes
compressed = compressor.encode(data)

# Decode compressed bytes to the original bytes: decode(compressed: bytes) -> bytes
decompressed = compressor.decode(compressed)
```

CLI usage:

```
zctw e <input_file> [<output_file>] [options]
zctw d <input_file> [<output_file>] [options]
```

Options:
- -dX: Set maximum tree depth (1-12)
- -tX: Set maximum number of tries in tree array (1-32)
- -nX: Set maximum number of nodes (size of tree array); supports K/M suffix (e.g., 4M)
- -fX: Set maximum file buffer size; supports K/M suffix
- -bX: Set maximum value of log beta
- -s: Disable strict tree pruning
- -r: Enable weighting at root nodes
- -k: Use Krichevski-Trofimov estimator instead of Zero-Redundancy
- -y: Force overwriting of existing files
- -lX: Enable logging to file X
Output file defaults:
- Encode: <input_file>.zctw
- Decode: <input_file>.d (removes .zctw extension)
```python
from dataclasses import dataclass

@dataclass
class CTWSettings:
    treedepth: int            # max depth of trees, excluding root (1-12)
    maxnrnodes: int           # max nodes in tree array (power of 2)
    maxnrtries: int           # max tries in hash table (1-32)
    filebufsize: int          # actual file buffer size (power of 2)
    maxfilebufsize: int       # max file buffer size (power of 2)
    strictpruning: bool       # use strict pruning method
    maxlogbeta: int           # max value of log beta (power of 2)
    rootweighting: bool       # perform weighting at root node
    use_zeroredundancy: bool  # use zero-redundancy estimator
```

Edge cases:
- Empty input: Handle gracefully (return empty output)
- Single byte input: Encode with initial context
- Input smaller than tree depth: Encode all bytes with probability 0.5
- Tree array full: Freeze tree structure, no new nodes created
- Hash collision: Handle failed node searches up to maxnrtries
- File already exists: Prompt for overwrite or use -y flag
- Invalid settings: Validate and report errors before processing
- Corrupted compressed file: Detect and raise informative error
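The "invalid settings" case above can be illustrated with a small validator; the ranges come from the settings documented earlier (`validate_settings` is a hypothetical helper, not part of the zctw API):

```python
def validate_settings(treedepth: int, maxnrtries: int, maxnrnodes: int) -> list:
    """Collect human-readable errors instead of failing on the first one,
    so all problems can be reported before processing starts."""
    errors = []
    if not 1 <= treedepth <= 12:
        errors.append("treedepth must be in 1..12")
    if not 1 <= maxnrtries <= 32:
        errors.append("maxnrtries must be in 1..32")
    if maxnrnodes <= 0 or maxnrnodes & (maxnrnodes - 1):
        errors.append("maxnrnodes must be a power of 2")
    return errors
```

Collecting every error in one pass matches the "validate and report errors before processing" behavior above.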
Performance:
- Memory: Tree array requires 8 bytes per node (default 4M nodes = 32 MB)
- File buffer requires up to maxfilebufsize bytes
- Time complexity: O(n * treedepth) for encoding/decoding n bytes
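The memory figure can be sanity-checked with a quick back-of-the-envelope helper (illustrative only; the 8-bytes-per-node figure is taken from the note above):

```python
def tree_array_bytes(max_nodes: int, bytes_per_node: int = 8) -> int:
    """Tree-array footprint: node budget times per-node size."""
    return max_nodes * bytes_per_node
```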
Constraints:
- Pure Python implementation (no C extensions)
- All arithmetic uses integer operations for reproducibility
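Integer-only probability arithmetic typically means fixed-point scaling; a sketch of the idea (the 16-bit scale here is an assumption for illustration, not zctw's actual representation):

```python
SCALE_BITS = 16
SCALE = 1 << SCALE_BITS  # store a probability p as floor(p * 2**16)

def to_fixed(num: int, den: int) -> int:
    """Convert the rational num/den to a scaled integer; floor division
    gives bit-identical results across platforms, which is the point of
    keeping all arithmetic in integers."""
    return (num * SCALE) // den
```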