As saagarjha mentions, vectorization of loads and stores is important for memory bandwidth and can be done automatically after unrolling a loop. Another important compiler optimization that builds on loop unrolling is pre-fetching: for a loop whose iterations are independent and each perform loads followed by computation on the loaded values, the loads can be re-arranged so they are grouped ahead of the computations. The thread can then use ILP to keep issuing loads while previous ones are still in flight, as long as it has registers free to hold the results. Without unrolling, each iteration's computation stalls waiting for its load instruction to return data; with unrolling, the loads overlap and make much better use of memory bandwidth.
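To make that concrete, here's a minimal sketch (the kernel and names are mine, not from any particular codebase) of a strided sum written naively and then manually unrolled 4x so that a batch of independent loads is issued before the dependent arithmetic:

```cuda
// Naive version: every iteration is load -> stall on memory -> add.
__global__ void sum_naive(const float* x, float* out, int n) {
    float acc = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc += x[i];  // the add can't issue until this load returns
    atomicAdd(out, acc);
}

// Unrolled version: four independent loads are issued back-to-back,
// so they can all be in flight at once (bounded by register count),
// and only then does the dependent arithmetic consume them.
__global__ void sum_unrolled(const float* x, float* out, int n) {
    float acc = 0.0f;
    int i = threadIdx.x;
    for (; i + 3 * blockDim.x < n; i += 4 * blockDim.x) {
        float a = x[i];
        float b = x[i + blockDim.x];
        float c = x[i + 2 * blockDim.x];
        float d = x[i + 3 * blockDim.x];  // loads grouped up front
        acc += a + b + c + d;             // computation grouped after
    }
    for (; i < n; i += blockDim.x)        // tail for leftover elements
        acc += x[i];
    atomicAdd(out, acc);
}
```

This is what the compiler does for you when it unrolls automatically; writing it by hand is only needed when the compiler declines to (as in the FP16 case below).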
I describe a situation in my blog post where automatic unrolling and pre-fetching was no longer being applied after changing a kernel to use FP16, and how I re-applied the optimizations manually to regain performance: https://andrewkchan.dev/posts/yalm.html#section-3.5
BTW, this part is not entirely true:
> Every single instruction is a vector instruction that is executed on all threads in the CUDA block simultaneously (unless it is a no-op due to divergent branching).
It is true that at one point instructions were executed in SIMT lockstep across a warp: a fixed-size group of 32 threads (each mapped to one CUDA core) that subdivides a block and is the fundamental unit of execution on the hardware.
However, since Volta (2017), the execution model allows threads in a warp to make forward progress in any order, even in the absence of conditional code. From what I have seen, in practice threads still move forward in SIMT lockstep and only diverge into active/inactive subsets at branches. That said, there is no guarantee about when those subsets re-converge, and this is behavior the hardware exhibits for efficiency (https://stackoverflow.com/a/58122848/4151721) rather than something the published programming model promises, i.e. it's implementation-specific behavior that could change at any time.
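Here's a minimal sketch of what that means in practice (the kernel is illustrative, not from any real codebase): after a divergent branch you synchronize the warp explicitly rather than assuming it has re-converged, and the warp-level intrinsics take an explicit participation mask (the `*_sync` variants) instead of assuming lockstep:

```cuda
// Illustrative kernel: diverge within a warp, then cooperate across it.
__global__ void divergent_then_cooperate(int* data) {
    int lane = threadIdx.x % 32;
    int v = data[threadIdx.x];

    if (lane < 16) {
        v *= 2;        // half the warp takes this path
    } else {
        v += 1;        // the other half takes this one
    }

    // Pre-Volta code often assumed implicit re-convergence at this point.
    // Since Volta, make that assumption explicit:
    __syncwarp();

    // Warp shuffle reduction; the 0xffffffff mask names every lane as a
    // participant rather than relying on lockstep execution.
    for (int off = 16; off > 0; off /= 2)
        v += __shfl_down_sync(0xffffffffu, v, off);

    if (lane == 0)
        data[(threadIdx.x / 32) * 32] = v;  // lane 0 holds the warp sum
}
```

The legacy non-`_sync` intrinsics (`__shfl`, `__ballot`, etc.) were deprecated precisely because they baked the lockstep assumption in.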
Ok, where do you nerds hang out and why am I not there?? I'm loving this discussion, y'all seem to be a rather rare breed of dev though. Where is the community for whatever this sort of dev is called? i.e., Clojure has the Clojurians Slack, the Clojurians Zulip, we have an annual Clojure conference and a few spin-offs. Where do you guys hang out??
This stuff is really awesome and I would love to dig in more!
Here's an NVIDIA blog post which discusses pre-fetching and unrolling more generally: https://developer.nvidia.com/blog/boosting-application-perfo...