Open Source / Lossless / Zero Dependencies
Structure-aware log compression that beats zstd-19 by up to 66% through CLP-style deterministic parsing and typed columnar encoding.
Apache Logs: 46.9x (vs zstd-19: 28.9x, +62%)
HDFS Logs: 25.5x (vs zstd-19: 15.3x, +66%)
Linux Syslog: 22.1x (vs zstd-19: 19.5x, +13%)
Override Rate: 0%. Lossless by design, not by patching.
Benchmarked on LogHub datasets. All ratios verified with byte-perfect lossless roundtrip.
Apache: 30.72x -> 46.88x (+53%)
HDFS: 16.16x -> 25.49x (+58%)
Linux: 14.00x -> 22.09x (+58%)
01
CLP-style single-pass tokenization classifies every character as static text, integer variable, or dictionary variable. No ML clustering. No statistical thresholds. Zero reconstruction failures by design.
Jun 9 06:06:20 combo kernel -> log_type + [9, 06, 06, 20, "combo", "kernel"]
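As a rough illustration of the single-pass idea, the sketch below splits a line into a log type plus an ordered variable list and then rebuilds the original line from them. The placeholder bytes, the `tokenize` and `reconstruct` names, and the classification rule (pure digit runs become integer variables, tokens mixing letters and digits become dictionary variables) are assumptions made for this example. LogCrush's actual rules are not spelled out here and evidently differ, since the example above treats `combo` and `kernel` as dictionary variables.

```python
import re

# Hypothetical placeholder characters marking where a variable was removed.
# This sketch assumes the input never contains them; a real format would
# need an escaping scheme.
INT_SLOT = "\x11"
DICT_SLOT = "\x12"

TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def tokenize(line):
    """Split a line into (log_type, variables) in one left-to-right pass."""
    parts = []
    variables = []
    pos = 0
    for match in TOKEN_RE.finditer(line):
        parts.append(line[pos:match.start()])  # delimiters stay static text
        token = match.group()
        if token.isdigit():
            # Keep the digit string as-is so leading zeros ("06") survive.
            parts.append(INT_SLOT)
            variables.append(("int", token))
        elif any(ch.isdigit() for ch in token):
            parts.append(DICT_SLOT)
            variables.append(("dict", token))
        else:
            parts.append(token)  # plain words are static text in this sketch
        pos = match.end()
    parts.append(line[pos:])
    return "".join(parts), variables


def reconstruct(log_type, variables):
    """Inverse of tokenize: put each variable back into its slot."""
    values = iter(variables)
    out = []
    for ch in log_type:
        out.append(next(values)[1] if ch in (INT_SLOT, DICT_SLOT) else ch)
    return "".join(out)


line = "Jun 9 06:06:20 combo kernel"
log_type, variables = tokenize(line)
assert reconstruct(log_type, variables) == line
```

Because every character ends up either in the log type or in exactly one variable, reconstruction is a pure substitution and cannot fail for inputs that respect the placeholder assumption.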
02
Variables are transposed into homogeneous typed columns grouped by schema. All integers from the same template position compress together. Dictionary variables deduplicate into a shared global dictionary.
2,716 Drain3 templates -> 483 deterministic log types on Linux
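The transposition step might look roughly like the following. This is a minimal sketch, not LogCrush's layout: the record shape `(log_type, [(kind, text), ...])`, the `to_columns` name, and the choice to key columns by `(type_id, position)` are all assumptions for illustration.

```python
from collections import defaultdict


def to_columns(records):
    """Group variables into per-(log type, position) columns.

    records: list of (log_type, [(kind, text), ...]) tuples, where kind is
    "int" or "dict" (a hypothetical shape chosen for this sketch).
    """
    type_ids = {}                # log_type -> small integer id
    line_types = []              # per-line type id, keeps original line order
    columns = defaultdict(list)  # (type_id, position) -> list of values
    dictionary = {}              # shared global dictionary: string -> index

    for log_type, variables in records:
        type_id = type_ids.setdefault(log_type, len(type_ids))
        line_types.append(type_id)
        for position, (kind, text) in enumerate(variables):
            if kind == "dict":
                # Repeated strings collapse to one dictionary entry.
                value = dictionary.setdefault(text, len(dictionary))
            else:
                value = text  # digit string, kept verbatim
            columns[(type_id, position)].append(value)

    return type_ids, line_types, dict(columns), dictionary


records = [
    ("session \x11 opened for \x12", [("int", "101"), ("dict", "user7")]),
    ("disk \x11 full", [("int", "3")]),
    ("session \x11 opened for \x12", [("int", "102"), ("dict", "user7")]),
]
type_ids, line_types, columns, dictionary = to_columns(records)
```

In this toy input the two `session` lines share one schema, so their integers land next to each other in column `(0, 0)` and the repeated `user7` is stored once in the dictionary.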
03
Each column gets a type-specific codec: delta + zigzag varint for integers, digit-preserving encoding for floats, dictionary indices for string variables, and frame-of-reference bitpacking for dense integer sequences.
No override system - every value encodes exactly, or escapes to dictionary
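To make the integer codec concrete, here is a small sketch of delta + zigzag varint encoding for one column. It covers only that one codec (not the float, dictionary, or bitpacking paths), uses the common LEB128-style 7-bit varint, and the function names are hypothetical; the byte layout LogCrush actually writes is not specified here.

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    """Inverse of zigzag."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def write_varint(value, out):
    """Append a non-negative integer as 7-bit groups, low group first."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)


def encode_column(values):
    """Delta-encode against the previous value, then zigzag + varint."""
    out = bytearray()
    previous = 0
    for value in values:
        write_varint(zigzag(value - previous), out)
        previous = value
    return bytes(out)


def decode_column(data):
    """Exact inverse of encode_column."""
    values = []
    previous = 0
    accumulator = 0
    shift = 0
    for byte in data:
        accumulator |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            previous += unzigzag(accumulator)
            values.append(previous)
            accumulator = 0
            shift = 0
    return values


column = [1000, 1001, 1003, 1002, 1002]
encoded = encode_column(column)
assert decode_column(encoded) == column
```

Slowly changing values such as timestamps or process IDs produce small deltas, so most entries fit in a single byte; here the five values take six bytes in total.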
04
All encoded columns are concatenated and compressed with zstd. The pre-processing creates highly regular byte streams that compress dramatically better than raw log text under LZ-family algorithms.
Structure-aware pre-processing + general-purpose compression = best of both worlds
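The final stage can be sketched as a length-prefixed container followed by one general-purpose compression pass. To keep the example runnable with only the standard library, it swaps in `zlib` (DEFLATE, also an LZ-family codec) where LogCrush uses zstd; the container layout and the `pack`/`unpack` names are likewise assumptions for illustration, not the real file format.

```python
import struct
import zlib


def pack(columns, level=9):
    """Concatenate encoded columns with length prefixes, then compress.

    columns: list of bytes objects, one per encoded column.
    zlib stands in for zstd here so the sketch has no dependencies.
    """
    body = bytearray(struct.pack("<I", len(columns)))
    for column in columns:
        body += struct.pack("<I", len(column))  # 4-byte little-endian size
        body += column
    return zlib.compress(bytes(body), level)


def unpack(blob):
    """Inverse of pack: decompress and split back into columns."""
    body = zlib.decompress(blob)
    (count,) = struct.unpack_from("<I", body, 0)
    offset = 4
    columns = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", body, offset)
        offset += 4
        columns.append(body[offset:offset + size])
        offset += size
    return columns


encoded_columns = [b"\xd0\x0f\x02\x04\x01\x00", b"", b"combo\x00kernel"]
blob = pack(encoded_columns)
assert unpack(blob) == encoded_columns
```

Keeping each column contiguous inside the container is what lets the back-end compressor see long runs of similar bytes instead of interleaved text and numbers.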
| Dataset | Lines | LogCrush | zstd-3 | zstd-9 | zstd-19 | vs zstd-19 |
|---|---|---|---|---|---|---|
| Apache | 56K | 46.88x | 19.03x | 24.65x | 28.90x | +62% |
| Thunderbird | 3.2M | 57.69x | - | - | - | - |
| HDFS | 5M | 25.49x | 9.98x | 11.93x | 15.32x | +66% |
| Linux | 25K | 22.09x | 12.27x | 16.68x | 19.51x | +13% |
All results verified=True (byte-perfect lossless roundtrip). Datasets from LogHub. Thunderbird zstd baselines pending. Benchmarked on Debian Trixie amd64.
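The "vs zstd-19" column is consistent with the relative gain in compression ratio, rounded to the nearest percent. The helper below is a hypothetical check written for this page, not part of LogCrush, and simply reproduces that arithmetic from the table values.

```python
def improvement_pct(ratio, baseline_ratio):
    """Relative gain of one compression ratio over another, in percent."""
    return round((ratio / baseline_ratio - 1) * 100)


# (LogCrush ratio, zstd-19 ratio) taken from the table above.
table = {
    "Apache": (46.88, 28.90),
    "HDFS": (25.49, 15.32),
    "Linux": (22.09, 19.51),
}

gains = {name: improvement_pct(ours, zstd19) for name, (ours, zstd19) in table.items()}
assert gains == {"Apache": 62, "HDFS": 66, "Linux": 13}
```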