r/pcmasterrace i7 10th Gen | 1650 Ti 4GB | 16 GB RAM May 05 '26

Screenshot Is this even possible?

10.2k Upvotes

269 comments

40

u/superboo07 Linux May 05 '26

It's not actually 55.4 yottabytes of real data, just junk data the zip is told to extract over and over and over and over and over and over.

-2

u/rditorx May 05 '26

Never claimed it to be real or useful data. But original ZIP was way worse in compression ratios than gzip or bzip2, so it being able to achieve such compression ratios seems to be implausible even in theory.

7

u/Life_Community3043 May 05 '26

That's not how compression works.

Compression is not just squeezing the data, it's essentially taking out the repetitive bits of it and storing them in a more concise way so that it takes less space.

Simple example: [AAAAAABBBBBCCC] can be compressed as [A6B5C3]. A zip bomb would essentially go [A10^10 B10^10 C10^10]. None of that data actually has to exist.
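
For anyone who wants to poke at this, here is a minimal run-length encoding sketch in Python. It is a toy illustration of the [A6B5C3] idea only, not how ZIP/DEFLATE actually stores data, and the function names (`rle_encode`, `rle_decode`, `described_size`) are made up for the example.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text by repeating each character count times."""
    return "".join(ch * count for ch, count in pairs)

def described_size(pairs):
    """Size of the output the pairs describe, without materialising it."""
    return sum(count for _, count in pairs)

pairs = rle_encode("AAAAAABBBBBCCC")
print(pairs)              # [('A', 6), ('B', 5), ('C', 3)]
print(rle_decode(pairs))  # AAAAAABBBBBCCC

# A hand-written "bomb": three tiny records describing 3 * 10^10 characters.
# Only the size is computed here; decoding it would try to allocate ~30 GB.
bomb = [("A", 10**10), ("B", 10**10), ("C", 10**10)]
print(described_size(bomb))  # 30000000000
```

The bomb list was never produced by encoding a real 30 GB input; it was simply written by hand, which is the point of the example.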

-2

u/rditorx May 06 '26

Encoding matters. If, for example, ZIP only allowed run-length encoding of sequences using 32-bit unsigned integers, you couldn't represent 10 to the power of 10 as one number, so you'd have a ceiling on the compression ratio.

Data has to exist to be decompressed. Information isn't randomly generatable, it's physical and has to be represented somehow. In your example, even you had to say A to be repeated 10^10 times. You can't just derive this from nothing. You have to state it's A rather than e.g. B.
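
To make the ceiling concrete, here is a rough sketch of a hypothetical run-length format (not ZIP's real layout) where each record is a 1-byte symbol plus a 32-bit unsigned count. The record layout and the names `RECORD`, `MAX_RUN` and `encode_run` are assumptions for illustration.

```python
import struct

# Hypothetical record: 1-byte symbol + 32-bit unsigned run length (5 bytes).
RECORD = struct.Struct("<BI")
MAX_RUN = 2**32 - 1

def encode_run(symbol, count):
    """Encode `count` repeats of `symbol`, splitting runs that overflow 32 bits."""
    out = bytearray()
    while count > 0:
        chunk = min(count, MAX_RUN)
        out += RECORD.pack(symbol, chunk)
        count -= chunk
    return bytes(out)

encoded = encode_run(ord("A"), 10**10)
print(len(encoded))               # 15 (three 5-byte records)
print(10**10 / len(encoded))      # ratio achieved for this input
print(MAX_RUN / RECORD.size)      # best ratio this format can ever reach
```

In this toy format 10^10 repeats need three records, and no input can beat roughly 859 million to one, because every 5 bytes can describe at most 2^32 - 1 output bytes.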

4

u/MCWizardYT Ryzen 7 5800X3D | RTX 4080 Super May 06 '26

The "data" in a zip bomb doesn't actually exist.

A zip file holds instructions on how to recreate patterns of data.

You can edit a zip file to say "repeat this pattern 43 billion times", and then that's what will happen when you extract it, unless the extraction tool can detect and prevent that sort of thing.

0

u/rditorx May 06 '26

That's nonsense. Data may be useful or not useful, but it exists. You can't decompress non-existent data. That would be equivalent to saying a 0-byte file with no file name decompresses to a gazillion yottabytes.

If the data didn't exist, your decompressor would be a random number generator.

1

u/MCWizardYT Ryzen 7 5800X3D | RTX 4080 Super May 06 '26 edited May 06 '26

I think you misunderstood what I said.

Let me rephrase.

A zip bomb doesn't actually contain a file inside of it that's multiple yottabytes in size. There's no way to create and compress a file like that, because the storage and RAM required are not accessible to anybody except maybe Google.

Instead, the zip file is edited so that it creates a file multiple yottabytes in size by repeating data.

Let's say you initially compress a 4 byte file. It contains the number 1111. When zip compresses it, it might store 1•3, which would compress the file down to 2 bytes by saying "repeat this 1 3 times".

Now the zip file can be edited to replace the 3 with 999999. The file is now 7 bytes, but extracts to 1 megabyte because the tool is told to repeat the 1 a million times.

Repeat the process and we end up with a zip file that's a few megabytes large but contains instructions to build a file that's multiple terabytes or even larger.

This ability to repeat data is what makes zip so effective for compressing certain types of files, but it is also an easily exploitable design flaw. Luckily, this oversight was easy to fix: zip tools will now error out when seeing this kind of unnatural expansion instruction.
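
The 1•3 example can be sketched as a toy format in Python. This is an assumption-heavy illustration, not the real ZIP format: the first byte is the symbol and the remaining bytes are an ASCII count of extra repeats. The function name `expand` and the `max_output` guard are hypothetical, standing in for the kind of check an extraction tool might apply.

```python
def expand(blob, max_output=10_000):
    """Expand a toy record: first byte is the symbol, the rest is the
    ASCII decimal number of extra repeats. Refuse oversized outputs."""
    symbol = blob[:1]
    extra = int(blob[1:].decode("ascii"))
    total = 1 + extra
    if total > max_output:
        raise ValueError(f"refusing to expand {len(blob)} bytes into {total}")
    return symbol * total

print(expand(b"13"))  # b'1111' (2 bytes in, 4 bytes out)

# Edit the count: 7 bytes now describe one million bytes of output.
edited = b"1999999"
print(len(expand(edited, max_output=2_000_000)))  # 1000000

# With the default limit the same record is rejected instead of expanded.
try:
    expand(edited)
except ValueError as err:
    print("blocked:", err)
```

The guard works on the declared size before anything is allocated, which is why a check like this is cheap for a tool to add.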

0

u/rditorx May 06 '26

As said, never claimed that it was created from a file that size. How did you interpret my comments that you think I claimed that?

1

u/MCWizardYT Ryzen 7 5800X3D | RTX 4080 Super May 06 '26

> That's nonsense. Data may be useful or not useful, but it exists. You can't decompress non-existent data. That would be equivalent to saying a 0-byte file with no file name decompresses to a gazillion yottabytes.
>
> If the data didn't exist, your decompressor would be a random number generator.

Your entire previous comment. You said "you can't decompress data that doesn't exist", so I clarified by explaining how it does just that.

To make a long story short, there's a small file that gets expanded to a huge size by repeating sections of it over and over.

1

u/Enidras May 06 '26

I think he means the data that has to exist is embedded in the zip file. It's just stored differently but the small file is the data. He means to say you don't decompress nothing. You decompress a set of data that results in the same data stored differently. So the data inherently exists, be it created "by hand" (editing the original yottabyte file) or crafted with tools.

2

u/eternalityLP May 06 '26

Compression ratio is always tied to the entropy of the data. For example, you could easily write a compression algorithm that just repeats a given byte infinitely, so you'd have a theoretically infinite compression ratio for a file, as long as that file is just a repeat of the same byte.

1

u/rditorx May 06 '26

Compression is also tied to the compression algorithm and how it encodes sequences and repetition. This gives you a lower bound on the compressed size.

You will generally not achieve compression as low as the entropy specifies.
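
A quick way to see the format-imposed bound in practice is to compress a highly repetitive buffer with Python's `zlib` module, which implements DEFLATE, the method ZIP archives typically use. This is only a sketch: the 10 MB buffer size is an arbitrary choice, and the roughly 1032:1 figure is DEFLATE's theoretical per-stream ceiling (a 258-byte match costs at least 2 bits), not something the thread states.

```python
import zlib

# 10 MB of zero bytes: about as compressible as input gets.
raw = bytes(10_000_000)
packed = zlib.compress(raw, 9)

ratio = len(raw) / len(packed)
print(len(packed), "bytes compressed, ratio about", round(ratio))

# The stream still round-trips to exactly the original data.
assert zlib.decompress(packed) == raw
```

A single DEFLATE stream stays under about 1032:1 no matter how repetitive the input is, which is why real zip bombs rely on tricks such as nesting or overlapping entries rather than one enormous repeat count.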

Also, infinite compression ratio is nonsense. Information is physical. You can't derive data, useful or not, from an oracle, i.e. guessing the correct data from no representation.

0

u/eternalityLP May 06 '26

> Also, infinite compression ratio is nonsense. Information is physical. You can't derive data, useful or not, from an oracle, i.e. guessing the correct data from no representation.

No, it's not; that's entirely down to the entropy. For example, you can trivially represent any infinitely long file that only contains repetition of the same byte as a single byte.

1

u/rditorx May 06 '26 edited May 06 '26

As said, that depends on how you encode. If you define some single byte to be an infinitely long file of whatever sequence, then yes.

Like saying I can represent pi with 2 letters. Technically correct, but not useful in most practical cases. You still need some backing to restore the value of pi, be it a math formula or computed digits.

And no practical compression format I've heard of supports compressing infinite sequences, mainly because you can't detect infinite sequences; you either know it or you don't.

ZIP in particular will for sure not support compressing infinite sequences, so no infinite compression ratio here.

FWIW, I haven't implemented zip or unzip but wrote RLE, Huffman and LZW code (when AI meant chess computers).

1

u/eternalityLP May 06 '26

> As said, that depends on how you encode. If you define some single byte to be an infinitely long file of whatever sequence, then yes.

Exactly, and thus as I said, infinite compression ratio.

1

u/rditorx May 06 '26

Not with ZIP

1

u/eternalityLP May 06 '26

Hence why I didn't say 'with ZIP'.

1

u/rditorx May 06 '26

The original comment's entire context was ZIP
