Docker Hub is the central repository that Docker maintains for production images. We built the same thing for the VMs used by CI pipelines.
What are CI base images?
First, let's talk about base images.
Let's say your project has a frontend and a backend, where the frontend uses React, and the backend uses Ruby on Rails.
That means that to run most CI tests, your CI pipeline would need to:
- Install Ruby
- Install Rails
- Install Node.js
- Install React and other Node.js dependencies
There are built-in base images like FROM ruby on Docker Hub, but it's unlikely you'll find the perfect image that has both React and Rails preinstalled.
Instead of reinstalling React and Rails on every run, you'd make a base image. Base images are usually Docker images, and in this case yours would look like this:
FROM ruby:3.1.2-bullseye
WORKDIR /app
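# install nodejs, rails, etc (steps omitted here)
# keep react and react-dom on the same major version to satisfy peer dependencies
RUN npm install react@17 react-dom@17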
You'd manually build that image once and push it to a Docker registry as your-company/ci-base-image:v1.0.0.
Then, in your CI pipelines, you'd be able to specify image: your-company/ci-base-image:v1.0.0 so that jobs start from it.
This approach is called a Dockerized base image, and it's often a great start to speeding up CI pipelines.
Dockerized base images make containers, not VMs
As your stack gets more complicated, it gets harder to stuff everything into a container.
You'd often want to create a base image which contains a database like PostgreSQL, or a DNS server like dnsmasq, or a key/value store like Redis.
In fact, as your project gets larger and embraces Docker, you'll often find yourself wanting a base image that comes with a running Docker instance and its own images already pulled.
At webapp.io, we overcame these limitations with a new format called a Layerfile, which is essentially a Dockerfile that builds a VM instead of a container.
Directed acyclic graphs and inheritance
Once you have a base image, you'll often set up CI pipelines with several steps that inherit from it and add more complex functionality.
In our version of base images, we support this by allowing the FROM directive to specify a relative path:
FROM ../base
RUN BACKGROUND npm run start
The "directed acyclic graph" pictured above can then be generated just by looking at which files inherit from which other files.
Topologically sorting the dependency graph
In essence, these files are turned into a build graph as follows:
- Clone the repository
- Find all the files named Layerfile
- If a Layerfile inherits from another, add a dependency link
- Topologically sort the graph, so that the base steps run first
This means that just by specifying the correct FROM directives, we can build an optimal build graph without manually grouping things. Neat!
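The steps above can be sketched in a few lines of Python. This is only an illustration, not webapp.io's actual implementation: the function names are hypothetical, the repository is assumed to be cloned already, and a relative FROM target is assumed to name either the parent's directory or the parent file itself. The sorting is done with the standard library's graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from pathlib import Path


def find_layerfiles(repo_root):
    """Return every file named 'Layerfile' under the cloned repository."""
    return sorted(Path(repo_root).rglob("Layerfile"))


def parent_of(layerfile):
    """Return the Layerfile this one inherits from, or None.

    Only relative FROM targets (./ or ../) create a dependency link here.
    """
    for line in layerfile.read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "FROM":
            target = parts[1]
            if not target.startswith(("./", "../")):
                return None
            resolved = (layerfile.parent / target).resolve()
            # Assumption: the target may be a directory or the file itself.
            return resolved / "Layerfile" if resolved.is_dir() else resolved
    return None


def build_order(repo_root):
    """Topologically sort the Layerfiles so parents come before children."""
    graph = {}
    for layerfile in find_layerfiles(repo_root):
        parent = parent_of(layerfile)
        # Each node maps to the set of nodes that must be built before it.
        graph[layerfile.resolve()] = {parent} if parent else set()
    return list(TopologicalSorter(graph).static_order())
```

Calling build_order on a repository root returns the Layerfile paths with base steps first. A cycle in the FROM directives would make static_order raise graphlib.CycleError, which is a convenient place to report a misconfigured repository.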
Building a Docker Hub for base images
Now that we've made a way of creating a graph of Layerfiles that can be built on demand, all that remains is to add the ability to store configurations in the cloud.
If everything was stored in the repository, there'd be a lot of copy/pasting across repositories - the frontend and backend probably both need the same version of nodejs, and the same shared libraries, for example.
We'd want something that looks like this:
FROM my-org/base:v1.0.0
That FROM line would then find the image in a central store of configurations.
It turns out not to be that hard! All you have to do is keep a key/value mapping from image names to configuration documents. After that, all that's left is to modify our original algorithm to process FROM directives in one of three ways:
- If FROM vm/..., change the operating system type
- If FROM ./ or FROM ../, resolve the other file relative to this one
- Otherwise, search the central repository for a matching image
Then we can pull the configuration from the cloud and insert it directly into the graph, as if it'd been found in a local file.
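As a rough sketch of that three-way dispatch, the Python below classifies a single FROM target. It is illustrative only: resolve_from and Resolution are hypothetical names, and a plain dict stands in for the central store, whose real storage and lookup API are not described here.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Resolution:
    kind: str    # "os", "local", or "central"
    value: str   # OS image, repository path, or stored configuration


def resolve_from(target, current_file, central_store):
    """Classify one FROM target using the three rules listed above."""
    if target.startswith("vm/"):
        # Rule 1: FROM vm/... selects the operating system type.
        return Resolution("os", target[len("vm/"):])
    if target.startswith(("./", "../")):
        # Rule 2: resolve the other file relative to the current one.
        base = posixpath.dirname(current_file)
        path = posixpath.normpath(posixpath.join(base, target))
        return Resolution("local", path)
    # Rule 3: otherwise, look the name up in the central store.
    if target not in central_store:
        raise KeyError(f"no image named {target!r} in the central store")
    return Resolution("central", central_store[target])
```

A "central" result carries the stored configuration text, so it can be parsed and inserted into the graph in the same way as a file found locally.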
Wrap up & results
If you'd like to see a video of the Layerfile library in action, check out this tutorial:
Want to give it a try for yourself? Sign up for free here.