When using the disk cache, ccache is still faster on cache hits due to also checking the hash of all input files (called a manifest) before even executing the preprocessor. It also can just clone/hardlink the file in the cache instead of copying it.
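The clone/hardlink behavior can be sketched roughly as follows. This is a minimal illustration, not ccache's actual implementation; the function and path names are hypothetical:

```python
import os
import shutil

def fetch_from_cache(cached_path: str, dest_path: str) -> None:
    """Materialize a cache hit at dest_path.

    A hardlink avoids copying the object file's bytes at all
    (on some filesystems a reflink clone does the same); fall
    back to a plain copy when linking fails, e.g. because the
    cache lives on a different filesystem.
    """
    if os.path.exists(dest_path):
        os.remove(dest_path)
    try:
        os.link(cached_path, dest_path)       # no data copied
    except OSError:
        shutil.copy2(cached_path, dest_path)  # cross-device fallback
```

The point is only that a hit can cost a single metadata operation instead of an I/O-bound copy of the whole object file.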
I was a very happy user of sccache - it took some big CI builds from ~10 to 1.5 minutes on average. We had to add an Azure backend to it, but the code is very well-organized and it was pretty easy to hack on.
I don't work in native languages these days, but if I ever do again I'll definitely reach for this.
There's a growing list of projects written in Rust that don't benefit from sccache. It would be helpful if people made clear what sccache is not good for, so that others can stop spending needless hours re-discovering its limits on their own.
I don't keep a list, but did manage to remember the following instance. The Fluvio team blogged about the benefits of sccache [1], then removed it from their builds just a couple of months later [2]. Note the comment in the GitHub issue: "Looks like removing sccache either improve or no impact at all to performance..."
It would be interesting to have this at Internet scale, everyone on the planet who is building software would share hashes of the code they built, the binaries they built and the compiler details.
I had a similar idea in the past of a distributed compiler that scales across multiple machines to improve build times. This is great work and excited to see it becoming more prominent.
For me the existence of "build caching" schemes is indicative that something's wrong with the tool chain or its users and that modularity hasn't been properly implemented.
While build caching could help mask problems caused by poor modularity, such as the same source file being built multiple times in different subdirectories of a build rather than just once, that's really not what it's for.
It solves the toolchain problem that the toolchain doesn't remember that it's already built something before; if you give it the same inputs, it will compile them every time, taking the same time as before.
Caching lets you do a clean rebuild in a newly spun up environment with a new checkout of the source code, while saving time by re-using pieces that have not changed from another build (not necessarily identical to that one).
Yes, there could be less of a need for caching if incremental builds were rigorously reliable. Every instance of a CI server could then just update the same repository in-place with new commits and run an incremental build. But caching would still help with that. For instance, if a commit happens to revert a file to a prior state, caching will pick up on that and pull the prior object file straight out of the cache.
When you use caching in a private repository where you have reliable incremental builds, you still see an improvement. For instance, when you throw away some experimental code, returning files to a prior state, and run a build, the object files just come blazing out of the cache.
When you do a "git bisect" to find a bug, same thing: the old commits build really fast.
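The revert and bisect scenarios above come down to keying the cache on content rather than timestamps. A minimal sketch (hypothetical names, with a stand-in for the compiler):

```python
import hashlib

def cache_key(source: str, flags: str) -> str:
    # Key on file *content* plus flags: reverting a file to a prior
    # state reproduces the old key exactly, so the old object is found.
    return hashlib.sha256((flags + "\0" + source).encode()).hexdigest()

cache: dict[str, str] = {}

def compile_cached(source, flags, compile_fn):
    """Return (object, was_hit). Only compiles on a cache miss."""
    key = cache_key(source, flags)
    if key in cache:
        return cache[key], True
    obj = compile_fn(source)
    cache[key] = obj
    return obj, False

fake_compile = lambda src: f"obj({src})"  # stand-in for a real compiler

_, hit1 = compile_cached("int x = 1;", "-O2", fake_compile)  # first build: miss
_, hit2 = compile_cached("int x = 2;", "-O2", fake_compile)  # edit: miss
_, hit3 = compile_cached("int x = 1;", "-O2", fake_compile)  # revert: hit
```

An mtime-based scheme would rebuild after the revert, because the file looks "newer"; a content-keyed cache does not care when the file was written.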
While I'm less absolutist about the matter than dboreham seems to be, I feel the need to point out that a clean rebuild by definition doesn't have anything cached - that's what "clean" means. If the build environment has access to data from a previous build, it's not clean.
It's useful (perhaps because "something's wrong with the tool chain or its users", or perhaps for legitimate reasons) to have a cached build as an intermediate step between an incremental build and a full (and slow) clean rebuild, but it is between.
With reliable caching schemes you don't need "clean" builds. Bazel and co (used by Google, FB, Twitter, etc) all use distributed build machines w/ caches.
The idea that you can't trust your incremental build and so need to discard it occasionally is a deep failure of how most incremental builds are done (mtime) and not a fundamental flaw in caching.
I don't think that's quite true. If you can perfectly encapsulate your compile environment and ensure that you explicitly capture all inputs, then you can store cached artifacts by the hash of their inputs and be confident that you will always build the exact same thing. Of course, at that point we're more talking about nix than *cache.
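"Explicitly capture all inputs" means the cache key must cover the compiler itself, not just sources and flags: the same source compiled by a different compiler build is a different artifact. A sketch of such a key, with hypothetical names:

```python
import hashlib

def toolchain_key(compiler_bytes: bytes,
                  flags: list[str],
                  sources: list[bytes]) -> str:
    """Cache key over every input to a compilation.

    If any input changes (compiler binary, a flag, a source file),
    the key changes; identical inputs always map to the same key.
    That property is what makes sharing artifacts by hash safe.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(compiler_bytes).digest())  # compiler identity
    for flag in flags:
        h.update(flag.encode() + b"\0")                # NUL-separated flags
    for src in sources:
        h.update(hashlib.sha256(src).digest())         # each source file
    return h.hexdigest()
```

Environment variables, header search paths, and anything else the compiler reads would need to be folded in the same way; missing even one input is exactly how "cached" builds become untrustworthy.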
It's worth noting that sccache is specifically designed to support sharing caches across machines, not just locally. I don't really see the benefit of a language having first-class support for using a cache from S3 or whatever, instead of that just being a third-party tool.
Compilation can be a big overhead on C++ codebases even when there is plenty of care in regards to modularity. Projects that are heavy on templates usually benefit a lot from compile caching mechanisms.