Introduction
Have you ever wanted to compile a big C/C++ project, only to find yourself waiting for hours while your machine overheated? Or found out you were missing a CMake option, or edited a header file, and had to recompile half of the project again?
Caches like ccache or simple sccache setups can help speed up a build, but only by reusing results from earlier compilations.
Enter distributed sccache. sccache has a feature, sccache-dist, that sends your compilation jobs to a cluster of machines, much like the well-known distcc. sccache-dist even goes a step further by automatically uploading your toolchain to the build servers, so you can compile transparently from any distribution.
In this article, we will set up an sccache-dist cluster using containers. For this, I will use leftover machines with various configurations and varying accessibility from the Internet.
What distributed compilation is for / isn’t for
Distributed compilation may sound cool, but it also comes with drawbacks. With the same number of cores, a remote build takes longer than a local one, since every job has to be shipped over the network (for example, 1m0s remotely vs 0m15s locally for wget). Distributed compilation really shines when the number of cores (parallel jobs) exceeds what a single machine can offer: instead of compiling Firefox with 12 cores, you can easily compile it with 100.
My fork
This post uses my soft fork of sccache here, which adds many quality-of-life patches, for example:
- Getting information about build servers with sccache --dist-status and sccache --dist-test-conn
- Allowing the scheduler's check of a build server's source IP to be disabled (useful when build servers are behind a NAT)
- Persisting statistics to disk
- Automatically skipping remote compilation of conftests: remote compilation helps highly parallel workloads, while sequential conftests (the "checking whether the compiler supports GNU C... yes" messages) are slower when done remotely, as explained above.
You can see the whole list of changes here.
This means many of the options covered here won’t work with the upstream Mozilla version.
I provide an image for my fork and upstream sccache here.
Getting started
Note: you should read the official Quickstart as well, as I mainly focus on the pain points I had and the custom options of my fork here.
Here is a broad diagram of what we’re trying to accomplish:
sequenceDiagram
participant Client
participant Daemon as Local Daemon
participant Scheduler as Remote Scheduler
participant BuildServer as Remote Build server X
Client->>Daemon: 1. Submits job
Daemon->>Scheduler: 2. Submits job
Scheduler->>Daemon: 3. Allocates job to build server X
Daemon->>BuildServer: 4. Uploads toolchain + files
BuildServer->>Daemon: 5. Compiled object
Daemon->>Client: 6. Compiled object
ℹ️ The sccache documentation refers to both “the daemon” and “the build servers” as “servers”. I make the distinction here: the daemon is the process running locally on the client’s computer, which contains the logic for everything. The build servers are the remote workers whose compute will be used for your compilation jobs.
Note: a description of all communications between services can be found here.
The scheduler
The scheduler manages the build servers’ capacity and allocates jobs to them. It exposes an HTTP API that must be protected from eavesdropping, for example by serving it over HTTPS behind a reverse proxy at https://sccache.example.com.
You can set up a scheduler by copying and adapting the scheduler service here on one server. In particular, remove its networks block, as it is only needed for the local example, and do not forget to modify its config.
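As a starting point, here is a minimal sketch that writes a scheduler config with shared tokens, assuming the fork keeps the upstream TOML keys from the official Quickstart. The file name scheduler.conf, port 10600, and the token values are placeholders; JWT-based server auth is also possible but not shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder secrets: replace them, e.g. with the output of `openssl rand -hex 32`.
CLIENT_TOKEN="${CLIENT_TOKEN:-change-me-client-token}"
SERVER_TOKEN="${SERVER_TOKEN:-change-me-server-token}"

# Writes scheduler.conf (hypothetical name: mount it wherever your
# scheduler service expects its config).
cat > scheduler.conf <<EOF
# Address the scheduler listens on inside its container. Only the HTTPS
# reverse proxy in front of it should be able to reach this port.
public_addr = "0.0.0.0:10600"

# Clients authenticate with this shared token ([dist.auth] on the client).
[client_auth]
type = "token"
token = "${CLIENT_TOKEN}"

# Build servers authenticate with this shared token ([scheduler_auth] on each server).
[server_auth]
type = "token"
token = "${SERVER_TOKEN}"
EOF
```

Using plain shared tokens keeps the setup simple; the trade-off is that every build server holds the same secret.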
The build servers
The build servers are responsible for running the compilation jobs sent by the clients. Both the scheduler and the clients contact the build servers directly, so each build server needs a globally accessible address rather than a local one. This differs from, e.g., a backend server, which only needs to be reachable from the reverse proxy that forwards client requests to it.
You can set up build workers by copying and adapting the worker service here on all your servers. In particular, remove its networks block, as it is only needed for the local example, and do not forget to modify its config.
In the config, you will need to set the scheduler URL and the worker’s globally accessible address, which the scheduler passes to clients and also uses itself to connect to that build server.
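For illustration, the sketch below writes a build server config with those two settings, assuming the upstream Quickstart keys and the overlay (bubblewrap) builder. The address 192.0.2.1:8080 matches the example output later in this post; the file name server.conf, the paths inside the container, and the token are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder secret: must be the same value as in the scheduler's [server_auth].
SERVER_TOKEN="${SERVER_TOKEN:-change-me-server-token}"

cat > server.conf <<EOF
# Where toolchains uploaded by clients are cached on this build server.
cache_dir = "/tmp/toolchains"
# Globally reachable address, advertised to the scheduler and to clients.
public_addr = "192.0.2.1:8080"
# The scheduler, behind its HTTPS reverse proxy.
scheduler_url = "https://sccache.example.com"

# Run each job in a sandbox on top of an overlay filesystem.
[builder]
type = "overlay"
build_dir = "/tmp/build"
bwrap_path = "/usr/bin/bwrap"

# Must match [server_auth] in the scheduler config.
[scheduler_auth]
type = "token"
token = "${SERVER_TOKEN}"
EOF
```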
If the build server is behind a NAT
In some cases, the build server is behind a NAT, which makes receiving incoming connections complicated. For these, I used frp to tunnel a port on a server with a public IP to the build server’s port, and advertised that public IP in the build server’s public_addr option.
⚠️ Since the build server then reaches the scheduler from a different IP than the one advertised for contacting it, you need to disable the scheduler’s check of the build server’s source IP by adding check_server_ip = false to the scheduler config. This is slightly less secure, since a JWT stolen from one worker (if you use JWTs) will then work from any source IP, but it makes no difference if you use shared tokens anyway.
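A tunnel like this can be sketched with frp's TOML config format (frp 0.52 or later). The relay address 203.0.113.10, frp's control port 7000, the proxy name, and the token are placeholders, and the sketch assumes the build server's port 8080 is published on the NATed host. You would then start frps -c frps.toml on the relay and frpc -c frpc.toml on the build server host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: the relay with a public IP and a shared frp secret.
RELAY_IP="${RELAY_IP:-203.0.113.10}"
FRP_TOKEN="${FRP_TOKEN:-change-me-frp-token}"

# On the relay (public IP): frps accepts tunnels from frpc on port 7000.
cat > frps.toml <<EOF
bindPort = 7000
auth.token = "${FRP_TOKEN}"
EOF

# On the build server host (behind the NAT): expose local port 8080
# as port 8080 on the relay.
cat > frpc.toml <<EOF
serverAddr = "${RELAY_IP}"
serverPort = 7000
auth.token = "${FRP_TOKEN}"

[[proxies]]
name = "sccache-build-server"
type = "tcp"
localIP = "127.0.0.1"
localPort = 8080
remotePort = 8080
EOF

# Then, in sccache's own configs:
#   build server: public_addr = "203.0.113.10:8080"
#   scheduler:    check_server_ip = false  (fork-only; assumed to be a top-level key)
```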
Is the build server insecure?
You may notice that the build server service has many options disabling isolation. Ironically, this is because the build server runs each compilation in a sort of “container” to protect itself from malicious jobs, and since running containers inside containers is hard, some protections have to be disabled. The current options are the minimal set needed to avoid a full-blown privileged: true.
The client
You need to configure the client to use your scheduler. For that, you can follow the config template in the official Quickstart.
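A minimal sketch of such a client config, assuming the upstream [dist] keys and a Linux client (sccache reads the file named by SCCACHE_CONF if set, otherwise ~/.config/sccache/config). The token is a placeholder; any existing config is backed up to config.bak before being overwritten.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder secret: must be the same value as in the scheduler's [client_auth].
CLIENT_TOKEN="${CLIENT_TOKEN:-change-me-client-token}"
CONF="${SCCACHE_CONF:-$HOME/.config/sccache/config}"
mkdir -p "$(dirname "$CONF")"

# Keep a copy of any existing config before overwriting it.
if [ -e "$CONF" ]; then
  cp "$CONF" "$CONF.bak"
fi

cat > "$CONF" <<EOF
[dist]
# The scheduler, behind its HTTPS reverse proxy.
scheduler_url = "https://sccache.example.com"
# No cross-compilation toolchain mappings (assumes Linux clients and build servers).
toolchains = []

[dist.auth]
type = "token"
# Must match [client_auth] in the scheduler config.
token = "${CLIENT_TOKEN}"
EOF
```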
⚠️ Do not use the upstream sccache client, as it will not work with the forked scheduler and build servers! You can grab the latest release of the forked client here.
Post-setup steps
Testing everything works
To test your scheduler, run sccache --dist-status. This command tells you whether the scheduler is reachable from the client and whether every build server has joined correctly.
Here is an example output:
{
"SchedulerStatus": [
"https://sccache.example.com",
{
"num_servers": 2,
"num_cpus": 18,
"in_progress": 0,
"servers": [
{
"address": "192.0.2.1:8080",
"num_cpus": 12,
"in_progress": 0
},
{
"address": "192.0.2.2:8080",
"num_cpus": 6,
"in_progress": 0
}
]
}
]
}
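To run this check from a script (for example before a long build), a small jq filter over that output can fail when fewer build servers than expected have joined. This is a sketch assuming jq is installed and that the JSON has the shape shown above; the function name check_dist_status is hypothetical.

```shell
#!/usr/bin/env bash

# check_dist_status EXPECTED: reads `sccache --dist-status` JSON on stdin,
# prints one line per joined build server, and returns non-zero unless at
# least EXPECTED servers joined and num_servers matches the server list.
check_dist_status() {
  local expected="${1:-1}" status
  status="$(cat)"
  # One summary line per build server known to the scheduler.
  jq -r '.SchedulerStatus[1].servers[] | "\(.address) cpus=\(.num_cpus) busy=\(.in_progress)"' <<<"$status"
  # -e makes jq's exit status follow the boolean result.
  jq -e --argjson expected "$expected" '
    .SchedulerStatus[1] as $s
    | ($s.num_servers >= $expected) and ($s.num_servers == ($s.servers | length))
  ' <<<"$status" >/dev/null
}
```

Usage: sccache --dist-status | check_dist_status 2 && echo "all build servers joined".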
After this, run sccache --dist-test-conn to ensure your client can correctly connect to every build server.
Then, to test an actual compilation, run sccache gcc main.c -c. Don’t forget -c: jobs that include linking (among others) can’t be distributed and are run locally instead.
If the compilation succeeded remotely, you will see something like
Successful distributed compiles
192.0.2.1:8080 1
in the output of sccache -s. If not, you may debug the problem by running the daemon in debug mode (see “Troubleshooting”).
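If you want to automate that last check, the sketch below extracts the per-server counts from the "Successful distributed compiles" section of sccache -s and fails when the section is missing. It assumes the layout shown above (a header line followed by "address count" lines); the function name dist_compiles is hypothetical.

```shell
#!/usr/bin/env bash

# dist_compiles: reads `sccache -s` output on stdin, prints "address count"
# for each build server listed under "Successful distributed compiles",
# and returns 1 if no such entry was found.
dist_compiles() {
  awk '
    /^Successful distributed compiles/ { in_section = 1; next }
    # Entries look like "  192.0.2.1:8080   1".
    in_section && $1 ~ /:[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $1, $2; found = 1; next }
    # Any other line ends the section.
    { in_section = 0 }
    END { exit(found ? 0 : 1) }
  '
}
```

Usage: sccache gcc main.c -c && sccache -s | dist_compiles || echo "nothing was compiled remotely".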
The first remote compilation will be slow because sccache uploads your toolchain to the chosen build server. The toolchain is then cached for subsequent jobs on that build server.
Tips
- If you want to force all build jobs to run remotely (to save your client’s compute), you can set force_remote_build = true under [dist] in the client config.
- To ensure your favorite project’s build uses sccache, you can install symlinks for the common compilers in your PATH, which invoke sccache transparently. For example, run sccache --install-bins /usr/local/sccache/bin and add /usr/local/sccache/bin to your PATH before the directory of your actual compilers.
- You can limit the amount of compute your build servers use with the cpus Docker Compose option, to avoid exhausting their hosts.
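For the last tip, a minimal sketch is a Compose override file that caps the build server container at 8 CPUs. It assumes the service is named worker, as in the example compose file; the limit of 8 is arbitrary. Apply it with docker compose -f <your compose file> -f compose.override.yaml up -d. Whether the CPU count shown by sccache --dist-status follows this limit depends on how the build server detects its CPUs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical override: caps the "worker" service at 8 CPUs.
# Note: this overwrites any existing compose.override.yaml.
cat > compose.override.yaml <<'EOF'
services:
  worker:
    cpus: "8"
EOF
```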
Troubleshooting
You can troubleshoot the client by running SCCACHE_LOG=trace sccache gcc <...>. However, this won’t be very useful, as the client only forwards commands to the daemon, which runs the actual logic.
You can run the daemon in the foreground with debug logs using sccache --stop-server; SCCACHE_LOG=trace SCCACHE_START_SERVER=1 SCCACHE_NO_DAEMON=1 sccache, then start your build from another terminal.