Introduction

Have you ever wanted to compile a big C/C++ project, only to find yourself waiting for hours while your machine overheated? Or found out you were missing a cmake option, or edited a header file, and had to recompile half of the project again?

Caches like ccache or a simple sccache setup can help speed up a build, but only by reusing existing results.

Enter Distributed sccache. sccache has a feature (sccache-dist) that allows you to send your compilation jobs to a cluster of machines, just like the well-known distcc. sccache-dist even goes a step further by automatically uploading your toolchain to the build servers, allowing you to transparently compile from any distribution.

In this article, we will set up an sccache-dist cluster using containers. For this, I will use leftover machines I have, with various configurations and varying accessibility over the Internet.

What distributed compilation is for / isn’t for

Distributed compilation may sound cool, but it also comes with drawbacks. With an equal number of cores, a build performed remotely takes longer than a local build (for example, 1m0s vs 0m15s for wget). Distributed compilation really shines when the number of cores (parallel tasks) exceeds what is possible with a single machine: instead of compiling Firefox with 12 cores, you can easily compile it with 100.
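
As a rough back-of-the-envelope check of that trade-off, here is a small sketch. It assumes that build time scales linearly with the number of cores and that every remote job pays the same slowdown as the wget measurement (60 s vs 15 s, so 4x). Both are simplifying assumptions, and the function name is purely illustrative.

```python
def estimated_speedup(local_cores: int, remote_cores: int, remote_slowdown: float) -> float:
    """Estimate how much faster a distributed build is than a local one.

    Assumes perfect linear scaling with the number of cores, which real
    builds never quite reach (linking and sequential steps stay local).
    """
    if local_cores <= 0 or remote_cores <= 0 or remote_slowdown <= 0:
        raise ValueError("all arguments must be positive")
    # Under this model, a remote core is worth 1/remote_slowdown of a local core.
    effective_remote_cores = remote_cores / remote_slowdown
    return effective_remote_cores / local_cores


# Slowdown taken from the wget measurement: 1m0s remote vs 0m15s local.
slowdown = 60 / 15

# Same number of cores: the remote build is slower (value below 1).
print(estimated_speedup(local_cores=12, remote_cores=12, remote_slowdown=slowdown))  # 0.25

# 100 remote cores against 12 local ones: the remote build wins.
print(round(estimated_speedup(local_cores=12, remote_cores=100, remote_slowdown=slowdown), 2))  # 2.08
```

Under this crude model, the break-even point is a cluster with about four times as many cores as your local machine. Your own numbers will differ depending on network latency and how parallel the project is.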

My fork

This post will use my soft-fork of sccache here, which adds a lot of quality-of-life patches, for example:

Getting information about build servers with sccache --dist-status and sccache --dist-test-conn
Allowing the build server source IP check to be disabled when a build server connects to the scheduler (in case build servers are behind a NAT)
Persisting statistics to disk
Automatically skipping remote compilation of conftests: remote compilation helps for highly parallelized workloads. Sequential conftests (those "checking whether the compiler supports GNU C... yes" messages) are slower when done remotely, as said above.

You can see the whole list of changes here.

This means a lot of the options covered here won’t work with the upstream Mozilla version.

I provide an image for my fork and upstream sccache here.

Getting started

Note: you should read the official Quickstart as well, as I mainly focus on the pain points I had and the custom options of my fork here.

Here is a broad schema of what we’re trying to accomplish:

  sequenceDiagram
    participant Client
    participant Daemon as Local Daemon
    participant Scheduler as Remote Scheduler
    participant BuildServer as Remote build server X

    Client->>Daemon: 1. Submits job
    Daemon->>Scheduler: 2. Submits job
    Scheduler->>Daemon: 3. Allocates job on build server X
    Daemon->>BuildServer: 4. Uploads toolchain + files
    BuildServer->>Daemon: 5. Compiled object
    Daemon->>Client: 6. Compiled object

ℹ️ The sccache documentation refers to both “the daemon” and “the build servers” as “servers”. I make the distinction here: the daemon is the process running locally on the client’s computer, which contains the logic for everything. The build servers are the remote workers whose compute will be used for your compilation jobs.

Note: all communications between services can be found here.

The scheduler

The scheduler manages the build servers’ capacity and allocates jobs to them. It exposes an HTTP API that needs to be protected from eavesdropping, which means exposing it over HTTPS, for example at https://sccache.example.com.

You can set up a scheduler by copying and modifying the scheduler service here on one server. In particular, remove the networks block from it, as it is only needed for the local example. Do not forget to modify its config.

The build servers

The build servers are responsible for running the compilation jobs given by the clients. The build servers will be contacted directly by both the scheduler and the client, which means you can’t give them a local address: they need a globally accessible one. This is different from e.g. a backend server, which only needs to be accessible from the reverse proxy that forwards the client’s requests.

You can set up build workers by copying and modifying the worker service here on all your servers. In particular, remove the networks block from it, as it is only needed for the local example. Do not forget to modify its config.

Specifically, you will need to set the scheduler URL and the worker’s globally accessible address, which the scheduler passes to the client and also uses itself to connect to that build server.

If the build server is behind a NAT

In some cases, the build server is behind a NAT, which makes it complicated to receive connections. In this case, I used frp to create a tunnel from a server with a public IP to the build server’s port, and advertised that IP through the build server’s public_addr option.
⚠️ Since this means the build server will reach the scheduler from a different IP than the one used to contact it, you need to disable the scheduler’s mechanism that verifies the build server’s source IP, by adding check_server_ip = false to the scheduler config. This is a bit less secure, since it means a JWT stolen from one worker (if you use them) will work from any source IP, but it is harmless if you use shared tokens anyway.

Is the build server insecure?

You will note that the build server has a lot of options disabling isolation. Ironically, this is because the build server runs each compilation in a sort of “container” to protect itself from malicious jobs. And since running containers inside containers is hard, this forces us to disable some protections. The current options are the minimal set needed to avoid a full-blown privileged: true.

The client

You need to configure the client to use your scheduler. For that, you can start from the config template in the official Quickstart.

⚠️ Do not use the upstream sccache client, as it will not work with the forked scheduler and build servers! You can grab the latest release of the fork client here.

Post-setup steps

Testing everything works

To test your scheduler, you can run sccache --dist-status. This command tells you whether the scheduler is accessible from the client, and whether every build server has joined correctly.
Here is an example output:

{
  "SchedulerStatus": [
    "https://sccache.example.com",
    {
      "num_servers": 2,
      "num_cpus": 18,
      "in_progress": 0,
      "servers": [
        {
          "address": "192.0.2.1:8080",
          "num_cpus": 12,
          "in_progress": 0
        },
        {
          "address": "192.0.2.2:8080",
          "num_cpus": 6,
          "in_progress": 0
        }
      ]
    }
  ]
}
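
If you want to script this check, for example in a setup script, the sketch below parses a sample with the same shape as the example output and reports anything that looks off. The check_status helper and the idea of passing an expected number of build servers are my own illustration, not something sccache provides.

```python
import json

# Sample with the same shape as the example output above.
STATUS_JSON = """
{
  "SchedulerStatus": [
    "https://sccache.example.com",
    {
      "num_servers": 2,
      "num_cpus": 18,
      "in_progress": 0,
      "servers": [
        {"address": "192.0.2.1:8080", "num_cpus": 12, "in_progress": 0},
        {"address": "192.0.2.2:8080", "num_cpus": 6, "in_progress": 0}
      ]
    }
  ]
}
"""


def check_status(status_text: str, expected_servers: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    # "SchedulerStatus" is a two-element array: the scheduler URL, then its details.
    scheduler_url, info = json.loads(status_text)["SchedulerStatus"]
    servers = info["servers"]
    problems = []
    if len(servers) != expected_servers:
        problems.append(
            f"{scheduler_url}: expected {expected_servers} build servers, found {len(servers)}"
        )
    if info["num_cpus"] != sum(server["num_cpus"] for server in servers):
        problems.append("total num_cpus does not match the per-server sum")
    return problems


print(check_status(STATUS_JSON, expected_servers=2))  # []
print(check_status(STATUS_JSON, expected_servers=3))  # one problem: a build server is missing
```

In practice you would feed it the output of sccache --dist-status (for example through subprocess.run), assuming the command prints only this JSON document on its standard output.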

After this, you can run sccache --dist-test-conn to ensure your client can correctly connect to every build server.

Then, to test an actual compilation, you can run sccache gcc main.c -c. Don’t forget -c: compilation jobs that include linking (among other things) can’t be distributed, and will be done locally instead.

If the compilation has been done remotely successfully, you will see something like

Successful distributed compiles
  192.0.2.1:8080          1

in the output of sccache -s. If not, you may debug the problem by running the daemon in debug mode (see “Troubleshooting”).
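
To check this from a script rather than by eye, here is a minimal parsing sketch. It assumes the section looks exactly like the excerpt above (a header line followed by indented "address count" lines). The other lines in the sample are placeholders standing in for the rest of the statistics, and the function name is illustrative.

```python
def distributed_compiles(stats_text: str) -> dict[str, int]:
    """Extract per-build-server counts from the text printed by `sccache -s`."""
    counts: dict[str, int] = {}
    in_section = False
    for line in stats_text.splitlines():
        if line.strip() == "Successful distributed compiles":
            in_section = True
            continue
        if in_section:
            # Per-server lines are indented; anything else ends the section.
            if not line.startswith((" ", "\t")) or not line.strip():
                break
            address, count = line.strip().rsplit(maxsplit=1)
            counts[address] = int(count)
    return counts


# Placeholder lines around the section shown in the excerpt above.
SAMPLE_STATS = """\
Some other statistic                  1
Successful distributed compiles
  192.0.2.1:8080          1
Another statistic                     0
"""

print(distributed_compiles(SAMPLE_STATS))  # {'192.0.2.1:8080': 1}
```

An empty dictionary means no job was compiled remotely, which is the cue to look at the daemon logs.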

The first remote compilation will be slow because sccache uploads your toolchain to the chosen build server. The toolchain is then cached for subsequent calls to that build server.

Tips

If you want to force all build jobs to be done remotely only (to save your client’s compute), you can set force_remote_build = true under [dist] in the client config.
To ensure your favorite project’s build is using sccache, you can install symlinks of the common compilers in your PATH, which will invoke sccache transparently. For example, use sccache --install-bins /usr/local/sccache/bin and add /usr/local/sccache/bin to your PATH before your actual compilers’ path.
You can limit the amount of compute your build servers use with the cpus docker compose option, to avoid exhaustion.

Troubleshooting

You can troubleshoot the client by running SCCACHE_LOG=trace sccache gcc <...>. However, this won’t be very useful, as the client just sends commands to the daemon, which then runs the logic.

You can run the daemon with debug logs by using sccache --stop-server; SCCACHE_LOG=trace SCCACHE_START_SERVER=1 SCCACHE_NO_DAEMON=1 sccache.

Using sccache for distributed C++ compilation

Introduction

What distributed compilation is for / isn’t for

My fork

Getting started

The scheduler

The build servers

The client

Post-setup steps

Testing everything works

Tips

Troubleshooting

Links
