Overfitted a 900KB Transformer to Compress a 100MB CSV into 7MB
I built an experiment that compresses individual files using an overfitted transformer and arithmetic coding. Instead of training the model to generalize, I train a 900KB transformer to memorize a single file and predict its next byte. Those predictions are fed into an arithmetic coder to produce the compressed output. On a 100MB NYC taxi CSV, it compresses to about 7MB (~0.5 bits/byte). On a 100MB slice of enwik9, it compresses to about 21MB (~1.68 bits/byte). It's pretty slow right now: roughly 20–30 minutes of training, plus about 45 minutes each for compression and decompression on my AMD 7800XT. Check out the repo: https://ift.tt/ObWQp3j
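To make the "predictions fed into an arithmetic coder" step concrete, here is a minimal sketch in Python. It is not the repo's code. The transformer is replaced by a hypothetical `AdaptiveByteModel` that only counts bytes seen so far (an order-0 model), so it compresses far worse than a model that has memorized the file. The coder is a textbook 32-bit integer arithmetic coder in the Witten-Neal-Cleary style. The constraint it illustrates is that the decoder must reproduce exactly the same probabilities, in the same order, as the encoder, so whatever model drives the coder has to be available at decompression time.

```python
from typing import List, Tuple

PRECISION = 32
FULL = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)


class AdaptiveByteModel:
    """Hypothetical stand-in for the transformer: adaptive order-0 byte counts."""

    def __init__(self) -> None:
        # Every byte starts at count 1 so no symbol ever has zero probability.
        # The total must stay well below QUARTER for the coder to stay exact.
        self.counts = [1] * 256

    def cumulative(self, symbol: int) -> Tuple[int, int, int]:
        low = sum(self.counts[:symbol])
        return low, low + self.counts[symbol], sum(self.counts)

    def update(self, symbol: int) -> None:
        self.counts[symbol] += 1


def encode(data: bytes) -> List[int]:
    model = AdaptiveByteModel()
    low, high, pending = 0, FULL, 0
    bits: List[int] = []

    def emit(bit: int) -> None:
        # Output a bit, then any deferred opposite bits from middle-half scaling.
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)
        pending = 0

    for symbol in data:
        sym_low, sym_high, total = model.cumulative(symbol)
        span = high - low + 1
        # Narrow [low, high] to the symbol's share of the current interval.
        high = low + span * sym_high // total - 1
        low = low + span * sym_low // total
        while True:  # renormalize once the leading bits are settled
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                pending += 1
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
        model.update(symbol)
    # Flush: two more bits pin down a point inside the final interval.
    pending += 1
    emit(0 if low < QUARTER else 1)
    return bits


def decode(bits: List[int], length: int) -> bytes:
    model = AdaptiveByteModel()
    low, high, value, pos = 0, FULL, 0, 0

    def next_bit() -> int:
        nonlocal pos
        bit = bits[pos] if pos < len(bits) else 0  # pad past the end with zeros
        pos += 1
        return bit

    for _ in range(PRECISION):
        value = (value << 1) | next_bit()

    out = bytearray()
    for _ in range(length):
        total = sum(model.counts)
        span = high - low + 1
        target = ((value - low + 1) * total - 1) // span
        cum = 0
        for symbol in range(256):  # find the symbol whose interval holds target
            if cum + model.counts[symbol] > target:
                break
            cum += model.counts[symbol]
        sym_low, sym_high, _ = model.cumulative(symbol)
        high = low + span * sym_high // total - 1
        low = low + span * sym_low // total
        while True:  # mirror the encoder's renormalization exactly
            if high < HALF:
                pass
            elif low >= HALF:
                value, low, high = value - HALF, low - HALF, high - HALF
            elif low >= QUARTER and high < HALF + QUARTER:
                value, low, high = value - QUARTER, low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next_bit()
        out.append(symbol)
        model.update(symbol)
    return bytes(out)


if __name__ == "__main__":
    sample = b"pickup,dropoff,fare\n2019-01-01,2019-01-02,7.5\n" * 100  # made-up rows
    encoded = encode(sample)
    assert decode(encoded, len(sample)) == sample
    print(f"{len(encoded) / len(sample):.3f} bits/byte")
```

With a trained model, `cumulative` would instead quantize the model's predicted next-byte distribution into integer frequencies (each at least 1, so no byte gets zero probability). Each byte then costs roughly -log2 p bits, where p is the probability the model assigned to the byte that actually occurred, which is why a model that has memorized the file can push the rate well below what a simple byte-count model achieves.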
Hacker News story: Overfitted a 900KB Transformer to Compress a 100MB CSV into 7MB
Reviewed by Tha Kur on June 23, 2026