Benchmarking RSA Key Generation (filippo.io)
49 points by mfrw on Jan 3, 2025 | 32 comments


> The prime-counting function approximation tells us there are Li(x) primes less than x, which works out[5] to one prime every 354 odd integers of 1024 bits.

Rule of thumb: Want a 1024-bit prime? Try 1024 1024-bit candidates and you'll probably find one. Want a 4096-bit prime? Try 4096 4096-bit candidates and you'll probably find one.

The approximate spacing of primes around p is ln(p), so the spacing around 2^1024 is ln(2^1024) = 1024*ln(2), and since ln(2) = 0.693, you can absorb the 0.693 into the rule of thumb as a safety margin, which gives the delightfully simple rule above. Of course, you'll still want to use a sieve to quickly reject numbers divisible by 2, 3, 5, 7, etc. (this easily rejects 90% of numbers), then run a Fermat primality test on the survivors (which, if you squint, is sort of like "try RSA, see if it works"), and then run Miller-Rabin tests to really smash down the probability that your candidate isn't prime. The probabilities can be made absurdly small, but it still feels a bit scandalous that the whole thing is probabilistic.
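A minimal Python sketch of this pipeline (small-prime sieve, then Miller-Rabin, wrapped in rejection sampling). This is a toy illustration, not constant-time and not production code; the Fermat step is folded into Miller-Rabin, which subsumes it:

```python
import random

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

def is_probable_prime(n, rounds=40):
    # Cheap sieve: trial division by small primes rejects most candidates.
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Miller-Rabin with random bases: write n-1 = 2^r * d with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True  # probably prime; worst-case error < 4**-rounds

def random_prime(bits):
    # Rejection sampling: a fresh random odd candidate each iteration,
    # with the top bit forced so the result really has `bits` bits.
    while True:
        n = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(n):
            return n
```

Per the rule of thumb, `random_prime(1024)` burns through on the order of a thousand candidates, nearly all of them dispatched by the cheap trial-division step.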

EDIT: updated rule of thumb to reflect random candidate choice rather than sequential candidate choice.


It's been a while since I've looked at the literature on RSA prime generation, but I seem to remember that picking a random starting point and iterating until you find a prime is discouraged: primes aren't evenly distributed, so key generation timing could reveal some information about your starting point and your eventual prime.

I'm not sure how realistic an issue this is given the size of the primes involved. Even if an attacker can extract sensitive enough timing information to figure out exactly how many iterations were required to find a 1024-bit prime from a 1024-bit random starting point, I'm not aware of a good way to actually find either value. You do also introduce a bias, since you're more likely to select primes without a close neighbor in the direction you are iterating from, but again I'm not sure how practical an attack on this bias would be.

Still, to avoid any potential risk there I seem to remember best practice being to just randomly generate numbers of the right size until you find a prime one. With the speed of modern RNGs, generating a fresh number each time vs iterating doesn't seem like a significant penalty.


Yes, excellent point! I originally omitted this detail for simplicity, but on reflection I don't think it actually achieved much in the way of simplifying the rule so I changed it to reflect reality. Thanks for pointing that out.

EDIT: the rush of people offering up sieve optimizations is pushing me back towards formulating the rule of thumb on a consecutive block of numbers, since it makes it very clear that these are not included, rather than implicitly or explicitly including some subset of them (implicit is bad because opacity, explicit is bad because complexity).


I've seen many implementations that relied on iterating (or rather, they used a prime sieve, but it amounts to the same thing in terms of the skewed distribution). While maybe not a problem in practice, I always hated it, even for embedded systems etc. I always used pure rejection sampling, with a fresh random candidate in each iteration.


It might be this: https://facthacks.cr.yp.to/fermat.html

If N = p*q and p - q < sqrt(p), then it's easy to factor.


Encountering this by chance is exceedingly unlikely of course, if p and q are randomly generated. In probability terms it amounts to the first half of p (or q) being all zeros (apart from a leading 1), so roughly 2^(-n/4) where n is the bit size of N. So for RSA-2048 the probability of this happening is on the order of 2^-512, or in base-10 terms 0.0000000...0000001, with roughly 150 zeros before the one!
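The linked method can be sketched as follows (a toy illustration, not the facthacks code): search for a, b with N = a^2 - b^2 = (a-b)(a+b), which converges almost instantly when the two factors are close.

```python
import math

def fermat_factor(n):
    # Fermat's method: start at ceil(sqrt(n)) and walk a upward until
    # a^2 - n is a perfect square b^2; then n = (a - b)(a + b).
    a = math.isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1

# Deliberately close primes: the very first iteration succeeds.
p, q = 10007, 10009
assert fermat_factor(p * q) == (p, q)
```

With randomly generated primes of equal size the gap is astronomically larger, which is exactly why the attack doesn't apply in practice.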


> Rule of thumb: Want a 1024-bit prime? Try 1024 1024-bit candidates and you'll probably find one.

Where "probably" is 76% [1], which is not that high depending on what you are doing. For example, you wouldn't be ok with GenerateKey failing 24% of the time.

To get a better than even chance, 491 [2] 1024-bit candidates are enough.

[1]: https://www.wolframalpha.com/input?i=1+-+%281+-+li%282%5E102... (using li(x) as a slightly better approximation of π(x) than x/ln(x), see [3])

[2]: https://www.wolframalpha.com/input?i=1+-+%281+-+li%282%5E102...

[3]: https://en.wikipedia.org/wiki/Prime-counting_function
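For the curious, these numbers can be reproduced with the cruder x/ln(x) density; li(x), as used in [1] and [2], nudges the 491-candidate case just over even odds, while the crude version lands a hair under:

```python
import math

def p_success(bits, tries):
    # Prime density among random `bits`-bit integers, using the
    # x/ln(x) approximation of the prime-counting function.
    density = 1 / (bits * math.log(2))
    return 1 - (1 - density) ** tries

print(p_success(1024, 1024))  # ~0.764
print(p_success(1024, 491))   # ~0.4995, just shy of 50% with this approximation
```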


Iterating over some huge search space in an essentially sequential manner is generally not going to be nearly as performant as simply selecting an odd number at random. You could try using a generating polynomial instead, such as Euler's f(x) = x^2 + x + 41, but even that isn't going to help much in the long run. (There are Diophantine equations which may one day prove useful for generating random primes, however AFAICT finding efficient solutions is still currently considered a hard problem.)
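For what it's worth, that polynomial (Euler's) produces primes only for x = 0..39 and then fails immediately, which illustrates why it's no help for cryptographic sizes:

```python
import math

def f(x):
    # Euler's prime-generating polynomial.
    return x * x + x + 41

def is_prime(n):
    # Simple trial division; fine for these small values.
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

assert all(is_prime(f(x)) for x in range(40))
assert f(40) == 41 * 41  # the streak breaks at x = 40
```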


Yes, but the more we mix sieve rejection into candidate selection the more we complicate the rule of thumb. "Reject even numbers as prime candidates" is probably OK to leave as an exercise for the reader, as is the equivalent "round every candidate to odd" optimization. The point about random vs sequential is well taken, though, and it doesn't complicate the rule of thumb, so I changed it.


Incrementing a bignum is faster than another PRNG cycle.


Neither is a significant amount of the time required to reject a candidate. The cheapest rejection test is bignum division by 3, and something like 2/3 of candidates will need more expensive further tests.

https://news.ycombinator.com/item?id=40093136
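A quick empirical check of that ratio, as a rough sketch: among random odd candidates, about 2/3 survive division by 3 alone, and even sieving by the first ten odd primes only brings the survivors down to roughly 31% (the theoretical value is the product of (1 - 1/p) over those primes):

```python
import random

SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]

random.seed(0)  # deterministic sketch
trials = 10_000
survive_3 = 0
survive_all = 0
for _ in range(trials):
    n = random.getrandbits(1024) | 1  # random odd candidate
    if n % 3:
        survive_3 += 1
    if all(n % p for p in SMALL_PRIMES):
        survive_all += 1

print(survive_3 / trials)    # ~2/3 survive division by 3 alone
print(survive_all / trials)  # ~0.31 survive the first ten odd primes
```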


It is fascinating (to me at least) that almost all RSA implementations rely on a probabilistic primality test, even though the probability of picking a pseudoprime is extremely small.


There's extremely small (like 2⁻²⁰, the chance of winning $50 000 with a $2 Powerball ticket [1]), and then there's cryptographically negligible (like 2⁻¹²⁰, the false negative chance of our primality test). The chance of something cryptographically negligible happening is about the same as the chance of the attacker guessing your key on the first try, or of a cosmic ray flipping a CPU flag inverting the result of "if !signature.verified { return err }".

[1]: https://www.powerball.com/powerball-prize-chart


I happened to look at this recently, and while I understand the argument (but not the math) of having to do fewer Miller-Rabin rounds, why would you do so in PRACTICAL settings? Unlike ECC you're likely only generating long term keys, so shorter key generation time seems like a bad tradeoff. Composite candidates are going to be rejected early, so you're (with high probability) not doing expensive calculations for most candidates. My reading of [BSI B.5.2](https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publicat...) confirms this.

Of course random bit flips could interfere, but other measures should thwart this in high-stakes environments (at least to some degree).


The number of Miller-Rabin rounds has to be bounded, so if you're not going to base your bound on reaching a cryptographically negligible chance of false positives, what are you going to base it on? Should we do 10? 15?

The problem with "x should be enough, but why not do more?" arguments is that they can be applied recursively, and never answer the question "ok so when should we stop?"


The BSI recommendation I linked says 60 times here (considering the worst case and the security level they're targeting). Just wondering why we'd want to do less for a (presumably rare) one-time operation.


The performance of this can matter in some scenarios. In embedded systems, smart cards etc., generating the primes can take a significant amount of time (15-20 seconds typical) even with the 'low' number of iterations. Higher iterations means the user will have to wait even longer. Actually, in such systems it is not unusual to see occasional time-out errors and such when the smart card is unlucky with finding the primes. The timeout value may be a fixed, general value in a part of the stack difficult to change for the specific call.

Another case is short-term keys/certificates, where a fresh key pair is generated, and a cert issued, for each and every signature. This setup makes revocation easier to handle (the certificate typically has a lifetime of a few minutes, so it is 'revoked' almost immediately).

There are also scenarios where the keys are generated on a central system (HSMs etc.) to be injected into a production line or similar. There, performance can matter as well. I worked on a system where HSMs were used for this. They could typically generate an RSA key in 1-2 seconds, so 12 HSMs were needed to keep up with demand - and this was again with the reduced number of rounds.


Thanks for the reply (and it definitely answers my original question). For both short-lived keys, and where production time matters, it makes perfect sense. I was mostly thinking about the scenario where you're generating a key for e.g. a long-lived certificate (maybe even a CA) or high-stakes PGP stuff. Just seems like you'd want to spend a few more seconds in that case.


Why not do 120 then? We can show that the chance of false negative of 5 rounds is cryptographically negligible, so 5, 60, and 120 are all the same. If the only argument for 60 is that it's more and this is a really important rare operation, doesn't it also apply to 120?

I'm not trying to be glib, there is no rational stopping point if we reject the objective threshold.


I don't pretend to understand all the involved math, but what I'm trying to say is that, as far as I understand, T rounds gives a 4^-T probability that we've chosen a "bad" prime (really a composite) per normal Miller-Rabin in the worst case. Doing ~5 rounds has been shown to be enough to choose a good prime candidate under most (very probable!) conditions when we pick it at random, and thus the argument is that ~5 rounds is fine. We agree so far?

I'm just asking, why not run the conservative 60 round test, rather than ~5 when you're doing a very rare, one time, key generation? I understand that it's very unlikely to reject any numbers, but at least BSI thinks it's worth it for important keys.

If I understand the recommendation right, you wouldn't do 60 for a 2048 bit key and then 120 for 4096, rather 61 rounds would be enough for 4096 if 120 is alright for 2048.


You're asking why not apply the formula for adversarially selected candidates even if we are randomly selecting candidates. There is simply no reason to, except "maybe we made a mistake" but then why would we not think we made a mistake also in calculating the 1/4 value, or in any other part of the code?

Phrased another way, do you have an argument for why run the conservative 60 round test, instead of asking for an argument for why not run it?

Again, you are "very unlikely" to win the Powerball jackpot. Rounds 6-60 have a cryptographically negligible chance of rejecting a composite. It's different, otherwise we'd have to worry about the "very unlikely" chance of the attacker guessing an AES-128 key on the first try.

(I don't follow you on the key sizes, if you apply the 1/4 probability, the candidate size is irrelevant.)


Thanks, I understand what you mean. I probably botched the 1/4 probability thing: I was thinking 4^-60 gives 2^-120 bits of assurance (roughly, security margin in my mind), and an extra round would quadruple it, but it doesn't work that way, I realize.


As I understand it, the number of rounds needed (for a given acceptable failure probability) goes down the larger the number is. Note (very importantly) that this is assuming you are testing a RANDOM integer for primality. If you are given an integer from a potential malicious source you need to do the full number of rounds for the given level.


For new deployments/protocols, is there any reason to use RSA? Is RSA (mostly?) about legacy support nowadays?


For new protocols, go with X25519MLKEM768, then it's quantum proof as well.


To my understanding, even with Shor's algorithm, RSA can be made quantum-resistant by using sufficiently large numbers, such that the quantum computer needed to find the primes must be "sufficiently larger", i.e. a cat and mouse game of sorts until some major breakthrough comes along. I agree with simply moving to Ed25519 and X25519/ML-KEM etc.


Well, it depends on the size of the quantum computer. Of course you can make RSA keys large enough (depending on your security margin/assumptions), but the problem is that the size/computational increase is exponential whereas the solving speed scales polynomially.

https://eprint.iacr.org/2017/351


The size/computational complexity of usage also only grows as a polynomial with RSA. But even with this in mind, a quantum computer that can crack in polynomial time is still problematic. While you can increase the key size, the operator of the quantum computer could enlarge his computer, a true cat-and-mouse game. Unlike the current situation where usage complexity grows as a polynomial, while cracking grows exponentially. This difference is the reason why we can pick key sizes where signing/decryption takes a millisecond while cracking it takes (presumably) billions of years.


RSA is faster than elliptic curves for signature verification and encryption. It is one of the oldest methods (half a century) which suggests that it could be a good choice when you need the lowest possible chance of some discovered weakness. It can be used for both signing and encryption (but not with the same key pair).


I still see a lot of RSA especially for encryption. While ECC can be used for encryption it is more convoluted and much less supported in hardware, software etc. than RSA decryption/encryption.


Ed25519 has been recommended over RSA for many years, post-quantum stuff is recent. RSA should only be used to support old protocols like webpki certificates.


FIPS 140-2 compliance. FIPS 140-3 adds support for ECC, but it is relatively new, and there aren't a lot of modules that have been certified for it yet, so depending on your environment and requirements you might still need to use RSA.

Or you are doing a new deployment that needs to be compatible with something that only supports RSA.



