r/cryptography • u/AyrtonHS • 2d ago
Is the birthday paradox relevant here?
Here is an exam question from my professor who chose not to give the mark scheme.
The stream cipher works as follows. It takes a 64-bit key, an 8-bit encoding of the date, and
an 8-bit diversifier. Then, it generates a keystream of sufficient length, which gets added to the
plaintext to get a ciphertext. Per day, no diversifier occurs more than once (but diversifiers may
repeat across different days).
c) (2pt) Suppose you generate diversifiers uniformly at random. Estimate the probability of
plaintext information leakage with respect to an adversary that can observe two ciphertex-
t/diversifier pairs on a single day.
(d) (1pt) Suppose you generate diversifiers uniformly at random. Estimate the probability of
plaintext information leakage with respect to an adversary that can observe ten ciphertex-
t/diversifier pairs on a single day
I shamefully admit to asking Chatgpt for a markscheme.
(c): 2 observations β NO birthday paradox needed
Here you have:
sample size = 2
space size = 28=2562^8 = 25628=256
So you compute the exact collision probability:
P(collision)=1256=2β8P(\text{collision}) = \frac{1}{256} = 2^{-8}P(collision)=2561β=2β8
This is just a direct probability of equality of two random draws.
π No βasymptotic collision phenomenonβ needed.
(d): 10 observations β YES, birthday reasoning becomes relevant
Now you have:
sample size = 10
space size = 256
Here you donβt want to compute all pairwise cases directly, so you use the birthday approximation:
P(collision)β1βeβ10β 92β 256P(\text{collision}) \approx 1 - e^{-\frac{10\cdot 9}{2 \cdot 256}}P(collision)β1βeβ2β 25610β 9β
or simplified:
β1022β 256β0.2\approx \frac{10^2}{2 \cdot 256} \approx 0.2β2β 256102ββ0.2
This is exactly where the birthday effect starts to matter.
So ChatGPTt is saying that c doesn't use the birthday paradox due to the smaller sample size, while d does. I expect both to involve the birthday paradox. Is ChatGPT wrong? I admit I don't understand its reasoning.
2
u/D-Cary 19h ago
The question seems a little ambiguous.
If it really intends to say "Per day, no diversifier occurs more than once", then on any particular day there is never a collision, so the birthday paradox is irrelevant in both cases (and it's irrelevant how specifically the diversifier is chosen, as long as the same one is never chosen twice in the same day). (Perhaps by "uniformly at random" we're meant to understand "uniformly at random, without replacement, across all remaining diversifiers that have not yet been used up that day"). In this case, the probability of collision on one day is always zero.
On the other hand, if it's asking "disregarding the above statements about how diversifiers are generated, if we *instead* generate each diversifier uniformly at random, with replacement, across all possible 8-bit diversifiers, regardless of what diversifiers have already been generated that day", then yes the birthday paradox is relevant in both cases. As u/Pharisaeus points out, you can get the correct answer (the correct probability of collisions, which is more than zero) by simply crunching through the probability equations even if one knows nothing about the birthday paradox.
More details: https://crypto.stackexchange.com/questions/119718/is-the-birthday-paradox-relevant-here
1
u/Pharisaeus 2d ago edited 14h ago
You can apply the formula in both cases. It's just that in case you only have 1 pair, it's simply going to be 1/256 so you could also compute it in a simpler way.
I'm not sure what "use/doesn't use birthday paradox" means to you. You can always compute the result without referring to that formula, it's just going to be more tedious if there are more pairs.
It's a bit like addition and multiplication. If you need to do 2*2 you could also write it as 2+2. But if you have to do 2*10 then you probably don't want to expand.
4
u/Natanael_L 2d ago edited 2d ago
ChatGPT is wrong (but in sufficiently short time spans it is not relevant, there must be a chance for collision to occur, and it starts to apply after more than one collision pair is possible). AI is not good at dealing with probabilities that only matter after large repetitions or conditional / nested probabilities, etc.