OK, but you're diving deep into arcane abstraction there. All the tricks that modern processor architectures use for speed are only relevant to your understanding if you are designing microprocessors (which everyone should also try; we did it in college, and it was a blast).
At that level, even the compiled opcodes aren't strictly demonstrative of what's happening inside the processor, but you'd only know that if you were running a simulator and watching traces light up.
Well, that's part of it, but it can also be relevant to performance. Cache behavior and branch prediction aren't very visible in C. There's also a lot going on with memory where C just says, "Uh, unspecified!", particularly with multiple cores and/or CPUs.
Yes and no. There is no way in plain C to exert direct control over branch prediction or cache usage; however, that's where GCC extensions come in.
Extensions such as __builtin_prefetch and __builtin_expect are heavily used in the Linux kernel (the latter mostly through its likely()/unlikely() macros) to allow a higher level of optimization in performance-critical areas. They let the developer tell the compiler directly, based on careful benchmarking, which branches are expected to be taken and which data to prefetch into cache, rather than leaving it all up to the compiler's compile-time heuristics.