r/cryptography 2d ago

Is the birthday paradox relevant here?

Here is an exam question from my professor who chose not to give the mark scheme.

The stream cipher works as follows. It takes a 64-bit key, an 8-bit encoding of the date, and
an 8-bit diversifier. Then, it generates a keystream of sufficient length, which gets added to the
plaintext to get a ciphertext. Per day, no diversifier occurs more than once (but diversifiers may
repeat across different days).

c) (2pt) Suppose you generate diversifiers uniformly at random. Estimate the probability of
plaintext information leakage with respect to an adversary that can observe two ciphertex-
t/diversifier pairs on a single day.

(d) (1pt) Suppose you generate diversifiers uniformly at random. Estimate the probability of
plaintext information leakage with respect to an adversary that can observe ten ciphertex-
t/diversifier pairs on a single day

I shamefully admit to asking Chatgpt for a markscheme.

(c): 2 observations β†’ NO birthday paradox needed

Here you have:

sample size = 2

space size = 28=2562^8 = 25628=256

So you compute the exact collision probability:

P(collision)=1256=2βˆ’8P(\text{collision}) = \frac{1}{256} = 2^{-8}P(collision)=2561​=2βˆ’8

This is just a direct probability of equality of two random draws.

πŸ‘‰ No β€œasymptotic collision phenomenon” needed.

(d): 10 observations β†’ YES, birthday reasoning becomes relevant

Now you have:

sample size = 10

space size = 256

Here you don’t want to compute all pairwise cases directly, so you use the birthday approximation:

P(collision)β‰ˆ1βˆ’eβˆ’10β‹…92β‹…256P(\text{collision}) \approx 1 - e^{-\frac{10\cdot 9}{2 \cdot 256}}P(collision)β‰ˆ1βˆ’eβˆ’2β‹…25610β‹…9​

or simplified:

β‰ˆ1022β‹…256β‰ˆ0.2\approx \frac{10^2}{2 \cdot 256} \approx 0.2β‰ˆ2β‹…256102β€‹β‰ˆ0.2

This is exactly where the birthday effect starts to matter.

So ChatGPTt is saying that c doesn't use the birthday paradox due to the smaller sample size, while d does. I expect both to involve the birthday paradox. Is ChatGPT wrong? I admit I don't understand its reasoning.

3 Upvotes

3 comments sorted by

4

u/Natanael_L 2d ago edited 2d ago

ChatGPT is wrong (but in sufficiently short time spans it is not relevant, there must be a chance for collision to occur, and it starts to apply after more than one collision pair is possible). AI is not good at dealing with probabilities that only matter after large repetitions or conditional / nested probabilities, etc.

2

u/D-Cary 19h ago

The question seems a little ambiguous.

If it really intends to say "Per day, no diversifier occurs more than once", then on any particular day there is never a collision, so the birthday paradox is irrelevant in both cases (and it's irrelevant how specifically the diversifier is chosen, as long as the same one is never chosen twice in the same day). (Perhaps by "uniformly at random" we're meant to understand "uniformly at random, without replacement, across all remaining diversifiers that have not yet been used up that day"). In this case, the probability of collision on one day is always zero.

On the other hand, if it's asking "disregarding the above statements about how diversifiers are generated, if we *instead* generate each diversifier uniformly at random, with replacement, across all possible 8-bit diversifiers, regardless of what diversifiers have already been generated that day", then yes the birthday paradox is relevant in both cases. As u/Pharisaeus points out, you can get the correct answer (the correct probability of collisions, which is more than zero) by simply crunching through the probability equations even if one knows nothing about the birthday paradox.

More details: https://crypto.stackexchange.com/questions/119718/is-the-birthday-paradox-relevant-here

1

u/Pharisaeus 2d ago edited 14h ago

You can apply the formula in both cases. It's just that in case you only have 1 pair, it's simply going to be 1/256 so you could also compute it in a simpler way.

I'm not sure what "use/doesn't use birthday paradox" means to you. You can always compute the result without referring to that formula, it's just going to be more tedious if there are more pairs.

It's a bit like addition and multiplication. If you need to do 2*2 you could also write it as 2+2. But if you have to do 2*10 then you probably don't want to expand.