
I also use it for this purpose; however, I have come to hate the way other people use it. I have a colleague who really has no idea what he's talking about with respect to machine performance, and who never had the requisite knowledge to peep at the assembly of a given function with standard tools like objdump, but who now loves to send everyone godbolt links in Slack, along with his suppositions about which function will be faster, based entirely on vibes (mostly, instruction count). This drives me up the wall. I wish there were some minimum height needed to ride godbolt.


I go to some pain in my talks to say "instruction count is not a good proxy for performance", but unfortunately folks do still use it. It's handy to say "hey, there's no loop in this output" or "this loop does 3 multiplies; the alternative does two and an add" or similar. It's a mixed blessing to have brought the compiler output to the masses, I can only hope it starts a useful learning process!
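For what it's worth, that kind of comparison is easy to make concrete. Paste a pair of functions like the following (a made-up example, not from any talk) into Compiler Explorer with -O2 and compare the actual generated instructions instead of guessing:

```cpp
// Two ways to scale by 6; paste each into Compiler Explorer and compare
// the instruction mix the compiler actually emits for your target.
#include <cstdint>

// May compile to a single imul, or to lea tricks, depending on target.
int64_t scale_mul(int64_t x) { return x * 6; }

// Same value via shifts and an add: (x << 1) + (x << 2) = 2x + 4x.
int64_t scale_shift(int64_t x) { return (x << 1) + (x << 2); }
```

Of course, even when the instruction mixes differ, that alone still doesn't tell you which is faster on a modern out-of-order core.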


It's deceptively hard to interpret assembly nowadays, with more and more niche instructions that may take hundreds of clock cycles to execute.


So much of performance is memory access now. The instructions can be fine and the access pattern wrecks you.


Yes, indeed.

Now any load from the L3 cache or from main memory takes much more time than almost any other instruction (not counting instructions that generate exceptions, whose handling involves many memory accesses that slow them down, or deprecated instructions that are kept for backwards compatibility and are executed as long microcode sequences).
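To see the access-pattern effect in isolation, here's a sketch (my own toy example, not from the thread): two functions doing identical arithmetic over an n x n matrix stored in a flat vector, differing only in traversal order. On matrices much larger than cache, the strided version is typically several times slower despite executing essentially the same instructions.

```cpp
// Same arithmetic, very different cache behavior: summing a matrix
// row-by-row walks memory contiguously; column-by-column strides by the
// row length and misses cache far more often on large matrices.
#include <vector>
#include <cstddef>

double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)      // index i*n + j advances by 1
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}

double sum_col_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)      // index i*n + j advances by n
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];
    return s;
}
```

Both return the same sum, so nothing in the Compiler Explorer output would make one look obviously worse than the other.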


Assuming that "caring about performance" = "you are in a tight loop": Here's a tool that simulates/visualizes instruction flow and data dependencies over multiple loop iterations.

https://uica.uops.info/

Paste in assembly code, check "Trace Table" and run, then "Open Trace". Not sure if it will help with your annoying colleague, but it gives a much more concrete idea about how a processor will execute any given code.

Or, if you want to channel their energy into something slightly more direct, there's also https://quick-bench.com/ which allows easy micro-benchmarking. Still not guaranteed to be relevant to any real-world scenario, but more data-driven than "vibes".
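If you'd rather reproduce the idea locally without quick-bench (which wraps Google Benchmark), a rough stand-alone sketch with <chrono> looks like this; the same caveats about real-world relevance apply:

```cpp
// Minimal local micro-benchmark in the spirit of quick-bench: time two
// variants of the same computation and compare. This is a rough sketch,
// not a substitute for a proper benchmarking framework.
#include <chrono>
#include <cstdint>
#include <cstdio>

static int64_t variant_a(int64_t n) {   // repeated addition: sum 0..n-1
    int64_t s = 0;
    for (int64_t i = 0; i < n; ++i) s += i;
    return s;
}

static int64_t variant_b(int64_t n) {   // closed form of the same sum
    return n * (n - 1) / 2;
}

template <typename F>
static double time_ns(F f, int64_t n, int reps) {
    auto t0 = std::chrono::steady_clock::now();
    int64_t sink = 0;
    for (int r = 0; r < reps; ++r) sink += f(n);
    auto t1 = std::chrono::steady_clock::now();
    // Print the sink so the compiler can't discard the work entirely.
    std::printf("sink=%lld\n", (long long)sink);
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / reps;
}
```

Keeping the "sink" observable is the crude stand-in for Google Benchmark's DoNotOptimize; without it, the optimizer may delete the whole loop and you end up benchmarking nothing.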


CE itself has support for llvm-mca.


Using Compiler Explorer to see how different compilers interpret the same code, or to understand the generated ABI, or to check whether various pragmas are or are not working, etc., is a very good use of it. I suspect most compiler developers more or less have a tab of Compiler Explorer permanently open at this point.

> I have a colleague who has really no idea what he's talking about with respect to machine performance, and who did not have the requisite knowledge of how to peep at the assembly code of a given function with the standard tools like objdump, who now loves to send everyone godbolt links in slack, along with his suppositions about which function will be faster, based entirely on vibes (mostly, instruction count).

This, however, just no.


It's hard to measure performance in realistic situations; it's easy to measure code size. I recently found myself wasting time doing micro-optimizations, encouraged by the feedback loop of measuring and reducing code size (in my case, not with Compiler Explorer, but with "cargo bloat", since I was working on a Rust project.)


I know you know this, but instructions are basically free in the face of memory access, and random memory access is the worst. Linear scans over contiguous memory (per thread) generally optimize performance.

Instruction counts are only useful if everything is guaranteed to be in registers.
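A sketch of that contrast (my own toy example): both functions below issue the same loads and adds, but one walks memory sequentially while the other chases a shuffled index order, which defeats the hardware prefetcher and pays cache/TLB misses once the array outgrows cache.

```cpp
// Same number of loads and adds, very different latency profile on
// large arrays: a linear scan prefetches well; a shuffled traversal
// turns every load into a potential cache miss.
#include <vector>
#include <numeric>
#include <algorithm>
#include <random>
#include <cstddef>
#include <cstdint>

uint64_t sum_linear(const std::vector<uint64_t>& v) {
    uint64_t s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];  // sequential
    return s;
}

uint64_t sum_random(const std::vector<uint64_t>& v,
                    const std::vector<std::size_t>& order) {
    uint64_t s = 0;
    for (std::size_t i : order) s += v[i];                 // scattered
    return s;
}

std::vector<std::size_t> shuffled_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937_64 rng(42);          // fixed seed for reproducibility
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

The two sums are identical, which is exactly why counting instructions in the disassembly tells you nothing about the difference.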


It can be tough even without the impact of the memory hierarchy. I've seen code where adding an extra instruction to the calculation made it faster. The extra instruction implicitly eliminated denormals, thus resulting in faster execution with some workloads on systems where operations on denormal values were slower than operations on other values.

It was a completely unnecessary instruction from a correctness perspective, because it had no effect on the answer. However, it was important for performance; removing the instruction made the calculation slower.
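A hypothetical reconstruction of that kind of fix (not the actual code from the story): an exponentially decaying value drifts into the subnormal range, where many CPUs fall back to slow microcode paths; adding a tiny offset that is negligible at the scale of interest keeps every intermediate value normal. Note that unlike the story above, this particular offset does perturb the result very slightly.

```cpp
// Sketch of the denormal effect: repeated halving drives a float into
// the subnormal range (below FLT_MIN), which is slow on many CPUs.
#include <cmath>
#include <cfloat>

// Plain decay: intermediates go subnormal once |x| < FLT_MIN.
float decay_plain(float x, int steps) {
    for (int i = 0; i < steps; ++i) x *= 0.5f;
    return x;
}

// The "unnecessary" extra operation: an offset far below the precision
// we care about, but large enough to keep x out of the subnormal range.
float decay_offset(float x, int steps) {
    const float offset = 1e-20f;  // normal value, negligible at our scale
    for (int i = 0; i < steps; ++i) x = x * 0.5f + offset;
    return x;
}
```

(Compilers and runtimes can also flush denormals globally, e.g. via FTZ/DAZ modes on x86, which is the other common way this class of slowdown gets fixed.)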


Damn.

It would be fun to non-destructively randomize the instruction stream and have an ML model learn how to remove hazards.


How $lang maps to assembly is half the picture: how assembly maps to CPU is the other half. We shouldn't blame ignorance of the latter on a tool for exploring the former.

I do totally get how some people learn just enough to be annoying. Generally I still think that's not a good reason to gatekeep them.


objdump sucks; source annotations via coloring make it at least 5x faster to read assembly. I don’t care how smart you are or how fluent you are in assembly. If your colleague is wrong for other reasons, that’s orthogonal.


Side-by-side (or coloured, or whatever) source -> assembly annotations are also a really, really efficient way to learn some more assembly.

Write program -> compile -> disassemble w/ some mapping -> make notes -> repeat.

Eventually your brain's pattern recognition starts to allow you to do neat things with disassemblies of programs without source code.



