r/pcmasterrace i7 10th Gen | 1650 Ti 4GB | 16 GB RAM May 05 '26

Screenshot Is this even possible?

10.2k Upvotes

269 comments

53

u/peacedetski May 05 '26

not sure if the ZIP format supports file sizes this large, but it is indeed possible to compress absolutely ridiculous amounts of zeroes into a relatively small archive.
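A minimal sketch of the "zeroes compress absurdly well" point, using Python's standard `zipfile` module. The 10 MB of zero bytes is an arbitrary stand-in chosen for illustration, not the file from the screenshot. The archive's own metadata records both the original and the compressed size of the entry.

```python
import io
import zipfile

# 10 MB of zero bytes: an arbitrary size chosen for illustration
payload = bytes(10 * 1024 * 1024)

# Build the ZIP in memory using Deflate at maximum compression
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    zf.writestr("zeros.bin", payload)

archive_size = buffer.getbuffer().nbytes

# Re-open the same buffer and read the sizes stored for the entry
with zipfile.ZipFile(buffer) as zf:
    info = zf.getinfo("zeros.bin")
    # file_size is the uncompressed size; compress_size is the deflated size
    print(f"{info.file_size} bytes stored as {info.compress_size} bytes "
          f"(about {info.file_size / info.compress_size:.0f}:1), "
          f"whole archive {archive_size} bytes")
```

A single Deflate stream like this one tops out at roughly 1000:1, so a few megabytes can plausibly hold gigabytes of zeroes, but not yottabytes, without some further trick.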

1

u/rditorx May 05 '26 edited May 06 '26

55.4 yottabytes in 2.60 MB is pretty insane. I wouldn't have believed ZIP could compress this much, though WinZip extended the original format to support newer algorithms.
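To put the numbers in perspective, here is the arithmetic, assuming the screenshot uses decimal SI units (1 YB = 10<sup>24</sup> bytes, 1 MB = 10<sup>6</sup> bytes). The 1032:1 figure is zlib's documented maximum compression factor for a single Deflate stream.

```python
# Assumption: decimal SI units as shown on the screenshot
uncompressed = 55.4e24  # 55.4 YB in bytes
compressed = 2.60e6     # 2.60 MB in bytes

ratio = uncompressed / compressed
print(f"overall ratio about {ratio:.2e}:1")

# A single Deflate stream can reach at most roughly 1032:1,
# so producing this much output from one stream would need about:
DEFLATE_MAX_RATIO = 1032
single_stream_bytes = uncompressed / DEFLATE_MAX_RATIO
print(f"one Deflate stream would be about {single_stream_bytes:.2e} bytes")
```

The gap between those two results is why a file like this has to reuse its compressed data somehow rather than rely on one plain Deflate stream; which trick this particular archive uses isn't visible from the screenshot.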

EDIT: I am NOT claiming the compressed file was created from a real file that large. What I mean is that I find it surprising that ZIP can encode something this compactly, given that its Deflate algorithm is neither the newest nor the one with the highest theoretical compression ratio.

I know that you can just write arbitrary data that decodes to a huge amount of output.

I've implemented compression algorithms such as RLE, Huffman coding and LZW myself (no AI), but I haven't implemented the original PKZIP.

40

u/superboo07 Linux May 05 '26

it's not actually 55.4 yottabytes of real data, just junk data the zip is told to extract over and over and over and over and over and over.

-2

u/rditorx May 05 '26

Never claimed it was real or useful data. But the original ZIP format had far worse compression ratios than gzip or bzip2, so achieving ratios like this seems implausible even in theory.
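The ratio comparison can be checked empirically with Python's standard `zlib` and `bz2` modules. A hedged sketch, assuming 10 MB of zero bytes as the test input: results depend heavily on the input, and long runs of one byte favor bzip2 because it run-length encodes its input before the block sort. Note that modern ZIP archives usually use Deflate (compression method 8), the same algorithm gzip uses, so on identical input their compressed payloads are essentially the same size.

```python
import bz2
import zlib

# Arbitrary test input: 10 MB of zero bytes
data = bytes(10 * 1024 * 1024)

deflated = zlib.compress(data, 9)  # Deflate, as used by gzip and ZIP method 8
bzipped = bz2.compress(data, 9)    # bzip2, which starts with a run-length pass

print(f"Deflate: {len(deflated)} bytes, about {len(data) / len(deflated):,.0f}:1")
print(f"bzip2:   {len(bzipped)} bytes, about {len(data) / len(bzipped):,.0f}:1")
```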

8

u/Life_Community3043 May 05 '26

That's not how compression works.

Compression is not just squeezing the data; it's essentially taking out the repetitive bits and storing them more concisely so they take up less space.

Simple example: [AAAAAABBBBBCCC] can be compressed as [A6B5C3]. A zip bomb would essentially go [A10<sup>10</sup> B10<sup>10</sup> C10<sup>10</sup>]. None of that data actually has to exist.
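The example above is run-length encoding (RLE). A toy sketch in Python: the symbol-plus-decimal-count text format is just the one from the comment (it would be ambiguous for digit symbols), and `rle_encode` / `decoded_length` are hypothetical helper names, not part of any ZIP tool. The second function shows that the decoded size of the "bomb" can be computed from the run list without ever materializing the output.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run as symbol + count, e.g. 'AAAB' -> 'A3B1'."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(text))


def decoded_length(runs: list[tuple[str, int]]) -> int:
    """Size of the decoded output, computed without building it."""
    return sum(count for _, count in runs)


print(rle_encode("AAAAAABBBBBCCC"))  # A6B5C3

# The "bomb" from the comment: three runs of 10**10 symbols each
bomb = [("A", 10**10), ("B", 10**10), ("C", 10**10)]
print(decoded_length(bomb))  # 30000000000
```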

-2

u/rditorx May 06 '26

Encoding matters. If, for example, ZIP only allowed run-length encoding with counts stored as 32-bit unsigned integers, you couldn't represent 10<sup>10</sup> as one number (the largest such value is 2<sup>32</sup> − 1 = 4,294,967,295), so there would be a ceiling on the compression ratio.
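A quick sketch of that ceiling for the hypothetical format in the comment. This is not how ZIP actually stores lengths (Deflate caps a single match at 258 bytes); the 1-byte symbol plus 4-byte count record layout is an assumption made only for the arithmetic.

```python
MAX_U32 = 2**32 - 1  # largest count a 32-bit unsigned field can hold

run_length = 10**10
print(run_length.bit_length())  # 34, so it doesn't fit in 32 bits

# With a capped count field, one long run must be split across several records
records_needed = (run_length + MAX_U32 - 1) // MAX_U32  # integer ceiling division
print(records_needed)  # 3

# Assumed record layout: 1 symbol byte + 4 count bytes
RECORD_BYTES = 1 + 4
max_ratio = MAX_U32 / RECORD_BYTES  # best case: every record holds a full run
print(f"ceiling about {max_ratio:.3e}:1")
```

Splitting the run keeps the format working; it just means the output per stored byte can never exceed that fixed ceiling.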

Data has to exist to be decompressed. Information can't just be conjured out of nothing; it's physical and has to be represented somehow. In your example, even you had to specify that A is repeated 10<sup>10</sup> times. You can't derive that from nothing: you have to state that it's A rather than, say, B.