Docker Hub is the central repository of images maintained by Docker for its production images. We built the same thing for VMs used by CI pipelines.
What are CI base images?
First, let's talk about base images.
Let's say your project has a frontend and a backend, where the frontend uses React, and the backend uses Ruby on Rails.
That means that to run most CI tests, your CI pipeline would need to:
- Install ruby
- Install rails
- Install nodejs
- Install react & other nodejs dependencies
There are built-in base images like FROM ruby on Docker Hub, but it's unlikely you'll find the perfect image that has both React and Rails preinstalled.
Instead of reinstalling React and Rails every time, you'd make a base image. Base images are usually Docker images, and in this case yours would look like this:
FROM ruby:3.1.2-bullseye
WORKDIR /app
# install nodejs, rails, etc
RUN npm i [email protected] [email protected]
You'd manually build that image once and push it to a Docker registry.
Then, in your CI pipelines, you'd specify image: your-company/ci-base-image:v1.0.0 to use it.
This approach is called a Dockerized base image, and it's often a great start to speeding up CI pipelines.
Dockerized base images make containers, not VMs
As your stack gets more complicated, it gets harder to stuff everything into a container.
You'd often want to create a base image which contains a database like PostgreSQL, or a DNS server like dnsmasq, or a key/value store like Redis.
In fact, as your project gets larger and embraces Docker, you'll often find yourself wanting a base image which comes with a Docker instance running with its own images pulled.
At webapp.io, we overcame these limitations with a new format called a Layerfile, which is essentially a Dockerfile that builds a VM instead of a container.
Directed acyclic graphs and inheritance
Once you have a base image, you'll often set up CI pipelines with several steps that inherit from it and do more complex work.
In our version of base images, we facilitate this functionality by allowing the FROM directive to specify a relative path:
FROM ../base
RUN BACKGROUND npm run start
The "directed acyclic graph" pictured above can then be generated just by looking at which files inherit from which other files.
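As a rough illustration of how that graph falls out of the FROM lines alone, here is a minimal Python sketch. It is not webapp.io's implementation. The in-memory layerfiles mapping, the helper names parent_of and inheritance_edges, and the vm/some-os image name are all hypothetical, and it assumes that FROM ../base refers to the file named Layerfile inside the sibling directory base.

```python
import posixpath


def parent_of(path, text):
    """Return the local Layerfile that `path` inherits from, or None.

    Assumption: only the first FROM directive matters, and a relative
    target names the directory that holds the parent Layerfile.
    """
    for line in text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "FROM":
            target = words[1]
            if not target.startswith("."):
                return None  # not a relative path, so no local parent
            directory = posixpath.dirname(path)
            parent_dir = posixpath.normpath(posixpath.join(directory, target))
            return posixpath.join(parent_dir, "Layerfile")
    return None


def inheritance_edges(layerfiles):
    """Map each Layerfile path to the local Layerfile it inherits from."""
    edges = {}
    for path, text in layerfiles.items():
        parent = parent_of(path, text)
        if parent is not None:
            edges[path] = parent
    return edges


# Hypothetical repository contents, keyed by path.
layerfiles = {
    "base/Layerfile": "FROM vm/some-os\nRUN echo base",
    "web/Layerfile": "FROM ../base\nRUN BACKGROUND npm run start",
    "api/Layerfile": "FROM ../base\nRUN echo api",
}

edges = inheritance_edges(layerfiles)
for child, parent in sorted(edges.items()):
    print(f"{child} inherits from {parent}")
```

Each entry in edges is one arrow in the graph, so no separate pipeline definition is needed to describe the ordering.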
Topologically sorting the dependency graph
In essence, these files are turned into a build graph as follows:
- Clone the repository
- Find all the files named Layerfile
- If a Layerfile inherits from another, add a dependency link
- Topologically sort the graph, so that the base steps run first
This means that just by specifying the correct FROM directives, we can build an optimal build graph without manually grouping things. Neat!
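The sorting step above can be sketched in Python. This is a minimal sketch rather than webapp.io's actual implementation. It assumes the dependency links have already been collected into a dictionary (the parents mapping and the build_order helper are hypothetical names) and it uses graphlib.TopologicalSorter from the standard library, which requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter


def build_order(parents):
    """Return paths ordered so that every parent comes before its children.

    `parents` maps each Layerfile path to the set of paths it inherits from.
    TopologicalSorter raises graphlib.CycleError if the links form a cycle.
    """
    sorter = TopologicalSorter(parents)
    return list(sorter.static_order())


# Hypothetical dependency links found by scanning a repository.
parents = {
    "base/Layerfile": set(),
    "web/Layerfile": {"base/Layerfile"},
    "api/Layerfile": {"base/Layerfile"},
    "e2e/Layerfile": {"web/Layerfile", "api/Layerfile"},
}

order = build_order(parents)
print(order)
```

In this example the base step always comes first and the end-to-end step last, while web and api have no ordering constraint between them, so a runner would be free to build them in parallel.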
Building a Docker Hub for base images
Now that we've made a way of creating a graph of Layerfiles that can be built on demand, all that remains is to add the ability to store configurations in the cloud.
If everything was stored in the repository, there'd be a lot of copy/pasting across repositories - the frontend and backend probably both need the same version of nodejs, and the same shared libraries, for example.
We'd want a FROM directive that names an image and then finds that image in a central store of configurations.
It turns out not to be that hard! All you have to do is make a mapping of key/value pairs to documents. After that, all that's left is to modify our original algorithm to process FROM directives in one of three ways:
- If the directive is FROM vm/..., change the operating system type
- If the directive is FROM ../, resolve the other file relative to this one
- Otherwise, search the central repository for a matching image.
Then we can pull the configuration from the cloud and insert it directly into the graph, as if it'd been found in a local file.
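The three cases above could be dispatched roughly like this. It is only a sketch under stated assumptions: the resolve_from function, the local_files and central_store dictionaries, and the image names are hypothetical stand-ins, and the central store is modelled as a plain dictionary from image name to configuration text rather than a real cloud service.

```python
import posixpath


def resolve_from(target, current_path, local_files, central_store):
    """Classify a FROM target and return a (kind, value) pair.

    Raises KeyError if a local or central lookup finds nothing.
    """
    if target.startswith("vm/"):
        # Case 1: pick the operating system type.
        return ("os", target[len("vm/"):])
    if target.startswith("../"):
        # Case 2: resolve the other file relative to this one.
        # Assumption: the target names a directory holding a Layerfile.
        directory = posixpath.dirname(current_path)
        resolved = posixpath.normpath(
            posixpath.join(directory, target, "Layerfile")
        )
        return ("local", local_files[resolved])
    # Case 3: anything else is looked up in the central store.
    return ("central", central_store[target])


# Hypothetical data for illustration only.
local_files = {"base/Layerfile": "FROM vm/some-os\nRUN echo base"}
central_store = {"your-company/node-base": "FROM vm/some-os\nRUN echo node"}

print(resolve_from("vm/some-os", "base/Layerfile", local_files, central_store))
print(resolve_from("../base", "web/Layerfile", local_files, central_store))
print(resolve_from("your-company/node-base", "web/Layerfile", local_files, central_store))
```

Because the central case returns configuration text just like the local case does, the rest of the graph-building code does not need to know where a parent came from.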
Wrap up & results
If you'd like to see a video of the Layerfile library in action, check out this tutorial:
Want to give it a try for yourself? Sign up for free here.