
The Layerfile cache

webapp.io has extended & improved Docker’s caching model for use in CI.

Consider the following Layerfile:

FROM vm/ubuntu:18.04
COPY . .
RUN sleep 20 && cat file1
RUN sleep 20 && cat file2

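In this case, we’ll take a snapshot after each line and record which files each step read. This means that a push which changes only file2 restores the snapshot taken after the first RUN and reruns only the final step, a push which changes file1 reruns both RUN steps, and a push which changes neither file skips every step.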
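The mapping from files to snapshots can be modelled with a small lookup. This is only an illustrative sketch, not webapp.io's implementation: the table of "line file" pairs and both function names are hypothetical, and the line numbers refer to the four-line example Layerfile above.

```shell
# Hypothetical model of the example Layerfile: which line read which file.
# Line 3 is "RUN sleep 20 && cat file1", line 4 is "RUN sleep 20 && cat file2".
steps_reading() {
    cat <<'EOF'
3 file1
4 file2
EOF
}

# first_rerun_step FILE...
# Print the earliest Layerfile line that read one of the changed files,
# or "none" when no step read them and every snapshot can be reused.
first_rerun_step() {
    result=none
    for changed in "$@"; do
        # Look up the line that read this file, if any.
        step=$(steps_reading | awk -v f="$changed" '$2 == f { print $1; exit }')
        [ -n "$step" ] || continue
        if [ "$result" = none ] || [ "$step" -lt "$result" ]; then
            result=$step
        fi
    done
    echo "$result"
}
```

Under these assumptions, `first_rerun_step file2` prints 4 (only the last step reruns), while `first_rerun_step README.md` prints none.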

Differences from Docker

Here are the major differences between Layerfiles and Dockerfiles for use in CI:

  1. Layerfiles define VMs, not containers - this means you can run anything (including docker) that you could run on a regular cloud server.
  2. Running processes are snapshotted and reused. If you start & populate a database, that’ll be included in the layer so that you don’t have to re-run the steps to set up the database for every pipeline.
  3. COPY in webapp.io does not invalidate the cache when it runs. Instead, the copied files are monitored for reads and writes from that point on. This means that COPY . . is much more common in Layerfiles than in Dockerfiles.
  4. You can copy files from parent directories (COPY /file1 . or COPY ../.. .) and inherit from other Layerfiles (FROM ../../other/Layerfile).

File watching COPY

In most CI providers and in Docker, you need to micromanage cache keys. The following Dockerfile and Layerfile are equivalent because we watch which files are read by each step:

Dockerfile:

FROM ubuntu:18.04
COPY package.json package-lock.json ./
RUN npm install
COPY . .
RUN npm run build

Layerfile:

FROM vm/ubuntu:18.04
COPY . .
RUN npm install
RUN npm run build

Instead of micromanaging COPY, you can simply copy the entire repository, and we’ll restore the latest cached layer that is still valid for the files a commit changed.

Faster installs: The CACHE directive

Some steps rerun often because the files they read, usually source files, change frequently. Consider this Layerfile:

FROM vm/ubuntu:18.04

RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add - && \
    echo "deb https://dl.yarnpkg.com/debian/ stable main" > /etc/apt/sources.list.d/yarn.list && \
    curl -fSsL https://deb.nodesource.com/setup_12.x | bash && \
    apt-get install nodejs yarn

MEMORY 2G
ENV NODE_OPTIONS=--max-old-space-size=8192

COPY package.json ./
CACHE /usr/local/share/.cache/yarn
RUN npm ci

In this case, unless you change package.json, the default webapp.io cache will skip every step in this Layerfile on each push.

The CACHE directive only acts to speed up the npm ci step in this case.

Note that CACHE will “leak” state across runs, so one run might break all following ones until someone force-retries without caches. To avoid this problem, only cache stateless directories (which usually contain “cache” in their paths).

Some other examples:

  - /var/cache/apt
  - /root/.cache/go-build
  - ~/.npm
  - ~/.next/cache
  - ~/.yarn/cache

SPLIT

Parallelizing directive

webapp.io provides a directive for running tests in parallel: SPLIT 5 duplicates the entire VM 5 times at the point where it executes. In practice, this means that you can run tests in parallel without worrying about race conditions causing flaky tests, since each copy is a separate VM.

Rails: knapsack

See knapsack pro

  1. Install the gem
  2. Run KNAPSACK_GENERATE_REPORT=true bundle exec rspec spec on your local computer
  3. git add knapsack_rspec_report.json && git commit -m 'knapsack' && git push origin master

Your Layerfile will look something like this:

# install ruby, bundle install, etc

COPY . .
SPLIT 5
ENV CI_NODE_TOTAL=$SPLIT_NUM CI_NODE_INDEX=$SPLIT
RUN bundle exec rake knapsack:rspec

Go: custom test runner

See this file for an example parallel test runner for Go.

The Layerfile from that example:

FROM ../base/Layerfile

COPY . .
SPLIT 5
RUN ./parallel-go-test.sh
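
The example file itself is not reproduced here, so the following is only a minimal sketch of how such a runner might shard its work. It assumes that SPLIT holds the zero-based index of the current copy and SPLIT_NUM the total number of copies, as the mapping to CI_NODE_INDEX and CI_NODE_TOTAL in the knapsack example suggests. shard_lines is a hypothetical helper, not part of webapp.io.

```shell
# shard_lines: read lines on stdin and print only those assigned to this copy.
# A line is assigned to this copy when its zero-based position modulo
# SPLIT_NUM equals SPLIT (both are assumptions about the SPLIT environment).
# Without those variables set, every line is printed.
shard_lines() {
    awk -v idx="${SPLIT:-0}" -v total="${SPLIT_NUM:-1}" '(NR - 1) % total == idx'
}
```

Inside a script such as parallel-go-test.sh, the helper might be used as `go list ./... | sort | shard_lines | xargs go test`, so that each copy tests a disjoint subset of packages. Sorting first keeps the assignment stable across copies.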

RUN REPEATABLE

Restores state from previous runs

Sometimes it’s not sufficient to cache directories (CACHE); it’d be best to cache complex state such as running processes or mounted files.

webapp.io provides this powerful but dangerous caching mechanism via RUN REPEATABLE. It’s particularly useful for complicated declarative cluster state like docker, docker-compose and kubectl.

It’s recommended to combine RUN REPEATABLE with multi-stage builds for large performance improvements.

RUN REPEATABLE for Docker

# install docker

COPY . .
RUN REPEATABLE docker build -t myimage .
RUN docker run -d -p 8080:8080 myimage

In this Layerfile, the docker cache from previous runs will be reused because RUN REPEATABLE uses the cache from after the last time this step ran.

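If you had three pipelines at 9am, 10am, and 11am, each one would run this step starting from the state left behind by the previous one: the 10am pipeline would resume from the state saved after the 9am docker build, and the 11am pipeline from the state saved after the 10am build.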

In particular, docker would see that it had been used multiple times, and would be able to re-use the docker cache from previous invocations to greatly improve build speed.

RUN REPEATABLE for docker-compose

FROM vm/ubuntu:18.04

RUN apt-get update && \
    apt-get install apt-transport-https ca-certificates curl software-properties-common && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable" && \
    apt-get update && \
    apt install docker-ce

RUN curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose && \
    chmod +x /usr/local/bin/docker-compose

COPY . .

RUN REPEATABLE docker-compose up -d --build --force-recreate --remove-orphans && sleep 5

EXPOSE WEBSITE localhost:8000

In this Layerfile, all of these things are reused from the moment immediately after the previous invocation:

  - The docker layer cache (e.g., pulled images)
  - Any created networks or volumes

RUN REPEATABLE for kubernetes (kubectl, k8s, k3s)

FROM vm/ubuntu:18.04

# install the latest version of Docker, as in the official Docker installation tutorial.
RUN apt-get update && \
    apt-get install apt-transport-https ca-certificates curl software-properties-common && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable" && \
    apt-get update && \
    apt install docker-ce

# install & start k3s
RUN curl -sfL https://get.k3s.io | sh -s - --docker

# this script might use helm, kompose, jsonnet, or any other manifest handling logic
RUN REPEATABLE ./build-images-and-manifests.sh && k3s kubectl apply -f dist/manifests --prune

EXPOSE WEBSITE localhost:8000

RUN REPEATABLE gives 50-95% speedups here.

In this Layerfile, we’d set up a kubernetes cluster for you and then snapshot it after you’d started all of your services.

The next time you push, kubernetes’ own declarative logic would figure out which pods to delete/restart given the manifests created. This means that if you had 20 microservices and only changed one, it’d be the only one that is re-deployed with this Layerfile.

