When using the disk cache, ccache is still faster on cache hits due to also checking the hash of all input files (called a manifest) before even executing the preprocessor. It also can just clone/hardlink the file in the cache instead of copying it.
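The clone/hardlink behavior can be sketched roughly as follows. This is a minimal illustration, not ccache's actual implementation; the function and path names are hypothetical:

```python
import os
import shutil

def fetch_from_cache(cached_path: str, dest_path: str) -> None:
    """Materialize a cache hit at dest_path.

    A hardlink avoids copying the object file's bytes at all
    (on some filesystems a reflink clone does the same); fall
    back to a plain copy when linking fails, e.g. because the
    cache lives on a different filesystem.
    """
    if os.path.exists(dest_path):
        os.remove(dest_path)
    try:
        os.link(cached_path, dest_path)       # no data copied
    except OSError:
        shutil.copy2(cached_path, dest_path)  # cross-device fallback
```

The point is only that a hit can cost a single metadata operation instead of an I/O-bound copy of the whole object file.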
I was a very happy user of sccache - it took some big CI builds from ~10 to 1.5 minutes on average. We had to add an Azure backend to it, but the code is very well-organized and it was pretty easy to hack on.
I don't work in native languages these days, but if I ever do again I'll definitely reach for this.
There's a growing list of projects written in Rust that don't benefit from sccache. It would be helpful if people made clear what sccache is not good for, so that others can stop spending needless hours re-discovering its limits on their own.
I don't keep a list, but did manage to remember the following instance. The Fluvio team blogged about the benefits of sccache [1], then removed it from their builds just a couple of months later [2]. Note the comment in the GitHub issue: "Looks like removing sccache either improve or no impact at all to performance..."
It would be interesting to have this at Internet scale, everyone on the planet who is building software would share hashes of the code they built, the binaries they built and the compiler details.
I had a similar idea in the past of a distributed compiler that scales across multiple machines to improve build times. This is great work and excited to see it becoming more prominent.
For me the existence of "build caching" schemes is indicative that something's wrong with the tool chain or its users and that modularity hasn't been properly implemented.
While build caching could help mask problems caused by poor modularity, such as the same source file being built multiple times in different subdirectories of a build rather than just once, that's really not what it's for.
It solves the toolchain problem that the toolchain doesn't remember that it's already built something before; if you give it the same inputs, it will compile them every time, taking the same time as before.
Caching lets you do a clean rebuild in a newly spun up environment with a new checkout of the source code, while saving time by re-using pieces that have not changed from another build (not necessarily identical to that one).
Yes, there could be less of a need for caching if incremental builds were rigorously reliable. Every instance of a CI server could then just update the same repository in-place with new commits and run an incremental build. But caching would still help with that. For instance, if a commit happens to revert a file to a prior state, caching will pick up on that and pull the prior object file straight out of the cache.
When you use caching in a private repository where you have reliable incremental builds, you still see an improvement. For instance, when you throw away some experimental code, returning files to a prior state, and run a build, the object files just come blazing out of the cache.
When you do a "git bisect" to find a bug, same thing: the old commits build really fast.
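The revert and bisect scenarios above come down to keying the cache on content rather than timestamps. A minimal sketch (hypothetical names, with a stand-in for the compiler):

```python
import hashlib

def cache_key(source: str, flags: str) -> str:
    # Key on file *content* plus flags: reverting a file to a prior
    # state reproduces the old key exactly, so the old object is found.
    return hashlib.sha256((flags + "\0" + source).encode()).hexdigest()

cache: dict[str, str] = {}

def compile_cached(source, flags, compile_fn):
    """Return (object, was_hit). Only compiles on a cache miss."""
    key = cache_key(source, flags)
    if key in cache:
        return cache[key], True
    obj = compile_fn(source)
    cache[key] = obj
    return obj, False

fake_compile = lambda src: f"obj({src})"  # stand-in for a real compiler

_, hit1 = compile_cached("int x = 1;", "-O2", fake_compile)  # first build: miss
_, hit2 = compile_cached("int x = 2;", "-O2", fake_compile)  # edit: miss
_, hit3 = compile_cached("int x = 1;", "-O2", fake_compile)  # revert: hit
```

An mtime-based scheme would rebuild after the revert, because the file looks "newer"; a content-keyed cache does not care when the file was written.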
While I'm less absolutist about the matter than dboreham seems to be, I feel the need to point out that a clean rebuild by definition doesn't have anything cached - that's what "clean" means. If the build environment has access to data from a previous build, it's not clean.
It's useful (perhaps because "something's wrong with the tool chain or its users", or perhaps for legitimate reasons) to have a cached build as an intermediate step between an incremental build and a full (and slow) clean rebuild, but it is between.
With reliable caching schemes you don't need "clean" builds. Bazel and co (used by Google, FB, Twitter, etc) all use distributed build machines w/ caches.
The idea that you can't trust your incremental build and so need to discard it occasionally is a deep failure of how most incremental builds are done (mtime) and not a fundamental flaw in caching.
I don't think that's quite true. If you can perfectly encapsulate your compile environment and ensure that you explicitly capture all inputs, then you can store cached artifacts by the hash of their inputs and be confident that you will always build the exact same thing. Of course, at that point we're more talking about nix than *cache.
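"Explicitly capture all inputs" means the cache key must cover the compiler itself, not just sources and flags: the same source compiled by a different compiler build is a different artifact. A sketch of such a key, with hypothetical names:

```python
import hashlib

def toolchain_key(compiler_bytes: bytes,
                  flags: list[str],
                  sources: list[bytes]) -> str:
    """Cache key over every input to a compilation.

    If any input changes (compiler binary, a flag, a source file),
    the key changes; identical inputs always map to the same key.
    That property is what makes sharing artifacts by hash safe.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(compiler_bytes).digest())  # compiler identity
    for flag in flags:
        h.update(flag.encode() + b"\0")                # NUL-separated flags
    for src in sources:
        h.update(hashlib.sha256(src).digest())         # each source file
    return h.hexdigest()
```

Environment variables, header search paths, and anything else the compiler reads would need to be folded in the same way; missing even one input is exactly how "cached" builds become untrustworthy.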
It's worth noting that sccache is specifically designed to support sharing caches across machines, not just locally. I don't really see the benefit of a language having first-class support for using a cache from S3 or whatever, instead of that just being a third-party tool.
Compilation can be a big overhead on C++ codebases even when there is plenty of care in regards to modularity. Projects that are heavy on templates usually benefit a lot from compile caching mechanisms.