> Generally speaking they outperform GPUs by an order of magnitude in FLOPS/W*s [2]

The paper you link to is measuring MSPS/W (mega-samples per second per watt), and the algorithm they study relies on fixed point. It uses the FPGA's built-in DSP blocks, which are integer-only. There is no floating point involved, so it is incorrect to say this shows FPGAs give better FLOPS/W. It isn't all that surprising that the FPGAs do better here: GPUs are built around floating point, which isn't being used.

Their GPU implementations use floating point as well as int and short. The efficiency barely differs between them, showing that this particular GPU wasn't designed with integer power efficiency in mind (which an FPGA implementation relying on DSP48s very much is).



Altera is claiming they will have >10 TFLOPS next year. They designed floating point DSP blocks in the Arria 10 and Stratix 10 (due out 2016Q1).

https://www.altera.com/content/dam/altera-www/global/en_US/p...


It would be interesting to see the same experiment repeated with an Nvidia Tesla and an Intel Xeon Phi. They used AMD GPUs not targeted at HPC, so it's unsurprising the integer path is not power-efficient (desktop/mobile graphics is all floating point).


Repeating the experiment with a Tesla or Xeon Phi will show you the same thing: that GPUs are less efficient than FPGAs on this load. Their inferior efficiency has nothing to do with whether the polyphase channelization load is integer or floating point. A GPU consists of hundreds or thousands of microprocessors with a traditional architecture: instruction decoding block, execution engines, registers, etc. Decoding and executing instructions is inherently less power-efficient than having this logic hard-wired, as it can be in an FPGA.


> A GPU consists of hundreds or thousands of microprocessors that have a traditional architecture: instruction decoding block, execution engines, registers, etc.

Any example of a GPU with "hundreds or thousands of microprocessors"? Nvidia Titan X has 12 [1] microprocessors by your definition.

[1]: SM, Streaming Multiprocessor in Nvidia's terminology. Smallest unit that can branch, decode instructions, etc.


I am well aware of the technical details and that I used a liberal definition of "microprocessor". My wording was vague on purpose (I didn't want to delve into the details). I didn't mean to imply that each "microprocessor" has its own instruction decoding block (they don't).

An AMD Radeon R9 290X has 2816 stream processors (44 compute units of 64 stream processors) per their terminology. There is only 1 instruction decoder per compute unit, so a stream processor cannot completely branch off independently, but it can still follow a unique code path via branch predication. This is kind of comparable to an Nvidia GPU having "44 streaming multiprocessors".

But whether you call this 44 or 2816 processors is irrelevant to my main point: a processor that has to decode/execute 44 or 2816 instructions per cycle while supporting complex features like caching, branching, etc., is going to be less efficient than an FPGA with hard-wired logic (edit: "hard-wired" from the viewpoint of "once the logic has been configured").

gchadwick also said integer workloads were "not power efficient" on GPUs, but that's also false. Most single-precision floating-point and integer instructions on GPUs execute in one clock cycle, so they are equally optimized. And of course integer logic needs fewer transistors than floating-point logic, so an integer operation is going to consume less power than the corresponding floating-point operation.


FPGAs don't actually have "hard-wired logic" though: they have a configurable routing fabric that takes up a substantial proportion of the die area and has much worse propagation delays than actual hard-wired logic, leading to lower clocks than chips like GPUs. Being able to connect logic together into arbitrary designs at runtime is pretty expensive.


Thanks for pointing it out. I'm so used to FLOPS being used for benchmarks that I don't even question it anymore; "mega-samples" didn't tip me off that the workload was integer-only.


I think AMD GPUs are much better at integer operations. NVIDIA ones are good at floating point.



