Yes sadly very expensive :( Maybe a select few quants could happen - we're still figuring out what is the most economical and most efficient way to benchmark!
Oh it's more time that's the issue - each benchmark takes 1-3 hours ish to run on 8 GPUs, so running on all quants per model release can be quite painful.
Assume AWS spot say $20/hr B200 for 8 GPUs, then $20 ish per quant, so assuming benchmark is on BF16, 8bit, 6, 5, 4, 3, 2 bits then 7 ish tests so $140 per model ish to $420 ish/hr. Time wise 7 hours to 1 day ish.
We could run them after a model release which might work as well.
Very hard. $$$
The benchmarks are not cheap to run. It'll cost a lot to run them for each quant of each model.