15 Oct, 2014

Docker Cache: Friend or Foe?

by Patrick Cooley

Docker is a new container technology that is taking DevOps by storm, with many companies moving their applications from virtual machines (VMs) to containers. What does Docker do?

It allows you to run many containers on the same host without them interfering with each other. Unlike VMs, which each run an entire operating system, Docker containers share the host operating system's kernel. Each container holds only the files and tools needed for its specific task, which gives containers and Docker an advantage over traditional VMs. For more information about how to get started with Docker, take a look at this great article: http://learningdocker.com

Above graphic from https://docs.docker.com/terms/layer/

A Docker container is the running instance of a Docker image, and a Docker image is a file system made up of multiple layer images. Docker builds an image by executing the instructions in the Dockerfile in order. For each instruction, it creates a temporary container on top of the layers built so far, runs the command inside that container, and saves the result as a new layer image. The process continues until every instruction in the Dockerfile has run.

Once the final Docker image is created, the temporary intermediate containers are removed, but the layers they produced stay in the local image cache. When you modify a Dockerfile and build a new image, Docker reuses the cached layers up to the first instruction that changed and rebuilds everything from that point on. This can be a good thing or a bad thing depending on how you structure your Dockerfile.
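To make the cache behavior concrete, here is a minimal sketch. It is not one of the Dockerfiles from this post: the base image, the package, and the file names are assumptions chosen only for illustration.

```dockerfile
# Hypothetical example; base image and commands are illustrative assumptions.
FROM debian:bookworm-slim

# Step A: unchanged between builds, so Docker can reuse the cached layer.
RUN apt-get update && apt-get install -y unzip

# Step B: suppose this line is edited between builds (for example, the text changes).
# Docker has to rebuild this layer.
RUN echo "build 2" > /build-info.txt

# Step C: this line did not change, but it is rebuilt anyway,
# because the layer it sits on top of (Step B) is now different.
RUN mkdir -p /app
```

In a sketch like this, instructions that rarely change are best placed near the top, so that an edit further down leaves as many cached layers as possible intact.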

Let’s take a look at two Dockerfiles that are configured differently but have the same result. Our Dockerfiles create an image that allows us to connect to the telnet port and watch a Star Wars ASCII animation.

Note: To show the impact that the image cache can have on the final build, I’ve added line 12 to create an empty ~104MB file in the /tmp directory. The file is removed later when we clean /tmp.

Our first Dockerfile has 15 steps, and we run each command on its own line. This makes the file easy to read but creates some unnecessary steps. If we look at the output of docker history, we can see the layer that each step in the Dockerfile produced, along with its size. The resulting image totals 540.1MB.
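As a rough illustration of the one-command-per-line style, here is a shortened sketch. It is not the 15-step Dockerfile described above, and its line numbers do not correspond to it; the base image, the packages, and the dd command are assumptions.

```dockerfile
# Hypothetical example of the "one command per line" style.
FROM debian:bookworm-slim

# Each RUN below is saved as its own layer.
RUN apt-get update
RUN apt-get install -y curl unzip

# Writes 100 MiB of zeros to /tmp; this layer stores the whole file.
RUN dd if=/dev/zero of=/tmp/padding.bin bs=1M count=100

# Removes the file from the final file system, but the layer above
# still contains it, so the image does not get any smaller.
RUN rm -rf /tmp/*
```

Building a file like this and running docker history on the result should show one entry per instruction, including a layer of roughly 100MB for the dd step.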

Our second Dockerfile completes the same task, but this time, we organize our steps in logical sections to minimize the number of layers created.

Because we group multiple commands together, we now only have eight steps. Even though the final image has the same files and completes the same task, it has fewer layers and a smaller total size (423.6MB).

If we look closer at lines 9-15, we see that this step downloads source code from GitHub, unzips the file, and moves the code to the app directory. Like the first Dockerfile, it generates a 104MB empty file and empties out the /tmp directory. The difference this time is that the 104MB file is both created and removed within the same layer, so the total image size does not include it. In the first Dockerfile, these tasks are done in separate layers: one layer adds the 104MB file to the image and a later layer deletes it. The deletion only hides the file from the final file system; the earlier layer still stores it, so it still counts toward the image size.
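The grouping idea can be sketched in a few lines. Again, this is not the eight-step Dockerfile discussed above, only a minimal illustration under assumed names (base image, packages, and the dd command are mine).

```dockerfile
# Hypothetical example of grouping related commands into one RUN.
FROM debian:bookworm-slim

# One layer for installing tools instead of two.
RUN apt-get update \
 && apt-get install -y curl unzip

# The 100 MiB file is created and deleted inside the same RUN,
# so it is already gone when the layer is saved.
# (A real build step would download and unpack its files between these two commands.)
RUN dd if=/dev/zero of=/tmp/padding.bin bs=1M count=100 \
 && rm -rf /tmp/*
```

Because a layer is only saved after the whole RUN instruction finishes, anything cleaned up before the end of that instruction never adds to the image size.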

Here is our final project:

If you’re interested in running this ASCII movie in Docker, you can download the image I’ve already created.

docker run -d -P pjcoole/swtelnet

For more details about the Docker image cache, CenturyLink has a great article: Working with The Docker Image Cache

If you’re interested in learning more about Docker, come out to B-SidesDC where Mike McCabe and I will demonstrate using Docker as a security tool.