Building a "Docker Hub" for CI base images

What are CI base images?

First, let's talk about base images.

Let's say your project has a frontend and a backend, where the frontend uses React, and the backend uses Ruby on Rails.

That means that to run most CI tests, your CI pipeline would need to:

  1. Install Ruby
  2. Install Rails
  3. Install Node.js
  4. Install React and other Node.js dependencies

Docker Hub has official images such as ruby (used as FROM ruby), but you're unlikely to find a ready-made image that has both React and Rails preinstalled.

Instead of reinstalling React and Rails on every CI run, you'd build a base image. Base images are usually Docker images, and in this case yours would look like this:

FROM ruby:3.1.2-bullseye
WORKDIR /app
# install nodejs, rails, etc
# react and react-dom should share the same major version
RUN npm i react@17 react-dom@17

You'd manually build that image once and push it to a Docker registry as your-company/ci-base-image:v1.0.0.

Then your CI pipelines could reference it with image: your-company/ci-base-image:v1.0.0.

This approach is called a Dockerized base image, and it's often a great start to speeding up CI pipelines.

Dockerized base images make containers, not VMs

As your stack gets more complicated, it gets harder to stuff everything into a container.

You'd often want to create a base image which contains a database like PostgreSQL, or a DNS server like dnsmasq, or a key/value store like Redis.

In fact, as your project gets larger and embraces Docker, you'll often find yourself wanting a base image that comes with a running Docker daemon and its own images already pulled.

At webapp.io, we overcame these limitations with a new format called a Layerfile, which is essentially a Dockerfile that builds a VM instead of a container.

FROM vm/ubuntu:18.04

MEMORY 6G

# Install docker
RUN apt-get install curl apt-transport-https ca-certificates software-properties-common
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
RUN apt-get update
RUN apt-get install docker-ce=5:20.10*

# Install docker compose
RUN curl -L https://github.com/docker/compose/releases/download/1.29.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
RUN chmod +x /usr/local/bin/docker-compose

# pull a base image that'll be needed for every test
RUN docker-compose pull postgres

(Above: a sample Layerfile, which looks essentially like a Dockerfile.)

Directed acyclic graphs and inheritance

Once you have a base image, you'll often set up CI pipelines with several steps that inherit from it and add more specialized functionality.

[Figure: an example of a CI pipeline for a web app that starts from a base image]

In our version of base images, we support this by allowing the FROM directive to specify a relative path:

FROM ../base

RUN BACKGROUND npm run start

The "directed acyclic graph" pictured above can then be generated just by looking at which files inherit from which other files.

Topologically sorting the dependency graph

In essence, turning these files into a build graph works as follows:

  1. Clone the repository
  2. Find all the files named Layerfile
  3. If a Layerfile inherits from another, add a dependency link
  4. Topologically sort the graph, so that the base steps run first

This means that just by specifying the correct FROM directives, we get a correctly ordered build graph without manually grouping steps. Neat!
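The four steps above can be sketched in a few lines of Python. This is only an illustration, not webapp.io's actual implementation: the function names parse_from and build_order are made up for this example, the repository is assumed to be cloned already, and a relative FROM is assumed to name a directory that holds another Layerfile. The sorting itself uses graphlib.TopologicalSorter from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from pathlib import Path
from typing import List, Optional


def parse_from(layerfile: Path) -> Optional[str]:
    """Return the argument of the first FROM directive, or None if absent."""
    for line in layerfile.read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "FROM":
            return parts[1]
    return None


def build_order(repo_root: Path) -> List[Path]:
    """Return the directories holding Layerfiles, parents before children."""
    sorter: TopologicalSorter = TopologicalSorter()
    # Step 2: find every file named Layerfile in the (already cloned) repo
    for layerfile in sorted(repo_root.rglob("Layerfile")):
        directory = layerfile.parent.resolve()
        sorter.add(directory)
        parent = parse_from(layerfile)
        # Step 3: a relative FROM adds a dependency link to the parent
        # (assumption: the path names a directory containing a Layerfile)
        if parent is not None and parent.startswith(("./", "../")):
            sorter.add(directory, (directory / parent).resolve())
    # Step 4: topological sort; raises graphlib.CycleError on a cycle
    return list(sorter.static_order())
```

Calling build_order(Path("path/to/repo")) on a repository where web/Layerfile and api/Layerfile both start with FROM ../base would list the base directory before web and api. Steps that share no dependency have no fixed order relative to each other, which is what leaves room for running them in parallel.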

Building a Docker Hub for base images

Now that we've made a way of creating a graph of Layerfiles that can be built on demand, all that remains is to add the ability to store configurations in the cloud.

If everything were stored in the repository, there'd be a lot of copying and pasting across repositories: the frontend and backend probably both need the same version of Node.js and the same shared libraries, for example.

We'd want something that looks like this:

FROM my-org/base:v1.0.0

The build would then look that image up in a central store of configurations.

It turns out not to be that hard! All you need is a key/value mapping from image names to stored configuration documents. After that, all that's left is to modify our original algorithm to process FROM directives in one of three ways:

  1. If FROM vm/..., change the operating system type
  2. If FROM ./ or FROM ../, resolve the other file relative to this one
  3. Otherwise, search the central repository for a matching image

Then we can pull the configuration from the cloud and insert it directly into the graph, as if it'd been found in a local file.
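As a rough sketch of that three-way dispatch, the Python below classifies a FROM argument and looks library images up in a plain dictionary. Everything here is illustrative: classify_from and fetch_from_library are hypothetical names, and the dictionary stands in for whatever key/value store actually backs the central repository.

```python
import posixpath
from typing import Dict, Tuple


def classify_from(argument: str, current_dir: str) -> Tuple[str, str]:
    """Classify a FROM argument as ("os" | "local" | "library", target)."""
    if argument.startswith("vm/"):
        # Case 1: a VM operating system, such as vm/ubuntu:18.04
        return ("os", argument[len("vm/"):])
    if argument.startswith(("./", "../")):
        # Case 2: a path resolved relative to the current Layerfile's directory
        resolved = posixpath.normpath(posixpath.join(current_dir, argument))
        return ("local", resolved)
    # Case 3: anything else is looked up in the central store
    return ("library", argument)


def fetch_from_library(store: Dict[str, str], name: str) -> str:
    """Return the stored configuration for a name like my-org/base:v1.0.0."""
    if name not in store:
        raise LookupError(f"no configuration named {name!r} in the central store")
    return store[name]
```

For example, classify_from("../base", "services/web") resolves to the local path services/base, while classify_from("my-org/base:v1.0.0", "services/web") is treated as a library lookup. The text returned by fetch_from_library can then be parsed like any local Layerfile and added to the graph.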

Wrap up & results

If you'd like to see a video of the Layerfile library in action, check out this tutorial:

Layerfile library video tutorial 

Want to give it a try for yourself? Sign up for free here.