Nacrith-GPU compression system running on Modal serverless.
# 1. Authenticate with modal
uvx modal setup
# 2. Compress file
scripts/compress path/to/file.txt
# 3. Decompress file
scripts/decompress file.txt.nc

With a single worker on the GTX 1050 Ti, Nacrith achieves ∼50–70 tokens/second at the start of a file, settling to ∼20–30 tok/s as the KV cache fills to its 2,048-token steady state (attention cost scales linearly with cached positions). With 3 parallel workers (the maximum for 4 GB VRAM), aggregate throughput scales to ∼60–90 tok/s [1]
| Method | Per-Item Size | Total Size (20 items) | Notes |
|---|---|---|---|
| Original | 250 KB | 5 MB | Uncompressed baseline |
| Nacrith | 8 KB | 160 KB | |
| Zstd | - | 260 KB | Compressed from 5 MB |
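To make the table easier to compare at a glance, the sizes can be turned into compression ratios. This is a minimal sketch using only the figures in the table; it assumes 5 MB means 20 × 250 KB = 5000 KB, and the function and variable names are illustrative, not part of the app.

```python
# Sizes from the table above, in KB (assuming 5 MB = 20 x 250 KB = 5000 KB).
ORIGINAL_KB = 20 * 250
SIZES_KB = {"Nacrith": 20 * 8, "Zstd": 260}


def compression_ratio(original_kb: float, compressed_kb: float) -> float:
    """Return how many times smaller the compressed output is than the input."""
    return original_kb / compressed_kb


ratios = {name: compression_ratio(ORIGINAL_KB, size) for name, size in SIZES_KB.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x smaller than the original")
```

On these numbers Nacrith's output is roughly 31x smaller than the original and Zstd's roughly 19x smaller.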
This Modal app runs inference on an H100, which gets ~500-1000 tok/s. As a benchmark, compressing a ~5 MB log file (~3M tokens) in a single pass takes about an hour. [2]
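The one-hour figure is consistent with the quoted throughput. As a rough back-of-the-envelope check (assuming a single sequential stream and ignoring startup and I/O overhead), compression time is just token count divided by tokens per second. The helper name below is illustrative.

```python
def estimated_minutes(token_count: int, tokens_per_second: float) -> float:
    """Estimate sequential compression time in minutes for a given throughput."""
    return token_count / tokens_per_second / 60


TOKENS = 3_000_000  # approximate token count of the ~5 MB benchmark log

fast = estimated_minutes(TOKENS, 1000)  # upper end of the quoted H100 throughput
slow = estimated_minutes(TOKENS, 500)   # lower end of the quoted H100 throughput

print(f"Estimated time: {fast:.0f} to {slow:.0f} minutes")
```

This gives a range of 50 to 100 minutes, so "about an hour" sits near the fast end of the quoted throughput.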
There is also a batch compression function that splits the contents of a text-based file into n chunks and spawns a compress instance for each chunk, so the chunks are compressed concurrently. With it, the 5 MB benchmark log file goes from about an hour to ~5 minutes to compress.
scripts/batch path/to/file.txt
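The chunk-and-fan-out idea can be sketched locally as follows. This is not the app's actual code: it assumes chunks are contiguous and near-equal in length, uses `zlib` as a stand-in for the Nacrith compress call, and uses a thread pool where the real app spawns Modal instances. All function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text: str, n: int) -> list[str]:
    """Split text into at most n contiguous chunks of near-equal length."""
    if n < 1:
        raise ValueError("n must be at least 1")
    size = max(1, -(-len(text) // n))  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]


def compress_chunk(chunk: str) -> bytes:
    # Stand-in compressor (zlib); the real app would run Nacrith inference here.
    return zlib.compress(chunk.encode("utf-8"))


def batch_compress(text: str, n: int) -> list[bytes]:
    """Compress each chunk concurrently, keeping results in chunk order."""
    chunks = split_into_chunks(text, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(compress_chunk, chunks))


def batch_decompress(blobs: list[bytes]) -> str:
    """Decompress the blobs in order and join them back into the original text."""
    return "".join(zlib.decompress(blob).decode("utf-8") for blob in blobs)


sample = "2024-01-01 INFO request handled\n" * 1000
blobs = batch_compress(sample, 8)
restored = batch_decompress(blobs)
```

Because each chunk is compressed independently, a model-based compressor starts every chunk without context from the previous one, so the parallel speedup likely costs a little compression ratio compared with a single pass.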