I thought it was generally accepted that inference was faster on TPUs. This was ...

nl · 2025-12-18T05:47:19 1766036839

Sorry I meant Groq custom hardware, not Grok!

I don't see any latency comparisons in the link

danpalmer · 2025-12-18T07:03:01 1766041381

The link is just to the book, the details are scattered throughout. That said the page on GPUs specifically speaks to some of the hardware differences and how TPUs are more efficient for inference, and some of the differences that would lead to lower latency.

https://jax-ml.github.io/scaling-book/gpus/#gpus-vs-tpus-at-...

Re: Groq, that's a good point, I had forgotten about them. You're right they too are doing a TPU-style systolic array processor for lower latency.