Things Ian Says

Reducing Docker Image Size

There is more attention being paid these days to the size of Docker images, with a desire among many developers to reduce the size as much as possible.

Why is this important? It comes down to the usual points of download time and storage space. A simple example is when your website is experiencing unusually high demand and you want to spin up some more hosts to handle it. For each new host, you need to download your Docker image to it. If your image is 200MB, it takes roughly 10 times as long to download as it would if your image were 20MB.

So, how do we make smaller Docker images? Here are some techniques.

My Vim Image

Being a Docker-savvy kind of person, I try to avoid installing software on my machines and aim to run everything in a Docker container. So, I have built a Docker image which has Vim in it, together with my configuration, code snippets and a set of plug-ins I find useful.

Here’s my Dockerfile for it:

FROM debian

RUN apt-get update && \
    apt-get -y install python git && \
    apt-get -y install build-essential ncurses-dev python-dev

RUN cd /tmp/ && \
    git clone https://github.com/vim/vim.git && \
    cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
    make install

RUN apt-get remove -y ncurses-dev python-dev && \
    rm -rf /tmp/vim

COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips

CMD vim --help

Pretty standard stuff:

  • Base it on Debian
  • Install a bunch of tools I need to build it (it needs Python for the Ultisnips plug-in)
  • Pull the latest stable version of vim and build it
  • Remove the development packages I installed to build it
  • Install all the plug-ins I need
  • Copy across a theme and my Ultisnips code fragments

So, my editing setup was pretty sweet. Any time I was on a new machine (or spun up a new VM) I’d just pull my Docker image from Docker Hub and be up and editing, with all my config just how I wanted it.

Except …

It would take ages to pull the image, and it made a dent in the storage on smaller VMs.

So, I had a quick look at the image size:

example/vim       debian              1c600f302423        6 hours ago         664.8 MB

665MB of Docker image? Wow! That’s a full CD’s worth of data. Just for vim.

For comparison, a standard vim install (on CentOS 7) comes in at 890K:

-rwxr-xr-x. 1 root root 889K Jun 10  2014 /bin/vi

So, I decided to see how I could reduce this.

Smaller Linux Distro

The obvious first place to start was the Linux distro I was basing my Docker image on — Debian. If I could find a smaller distro, that would be an easier way of reducing the size. Alpine has been getting a lot of attention as a distro for building Docker images, so I took a quick look at the standard image:

debian            latest              a2ff708b7413        2 days ago          100.1 MB
alpine            latest              665ffb03bfae        3 days ago          3.962 MB

Alpine’s 4MB footprint compared to Debian’s 100MB looks like an easy saving to make. So I made a few changes to the Dockerfile — changed the base image to Alpine, had to change the apt-get package management to Alpine’s apk, and the tools I need to build vim are now in alpine-sdk instead of build-essential — but the Dockerfile is still very recognisable:

FROM alpine

RUN apk update && \
    apk add python git && \
    apk add alpine-sdk ncurses-dev python-dev

RUN cd /tmp/ && \
    git clone https://github.com/vim/vim.git && \
    cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
    make install

RUN apk del alpine-sdk ncurses-dev python-dev && \
    rm -rf /tmp/vim

COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips

CMD vim --help

So, what has that done to my image size?

example/vim       alpine              3d8ac3152685        6 hours ago         426.5 MB
example/vim       debian              1c600f302423        6 hours ago         664.8 MB

That’s pretty good: not only have we shaved off the 96MB difference in base image size, we have also saved on what is actually built on top of it. Overall, the image has dropped from 665MB to 427MB, a little over a third.

Let’s see what else we can do.

Temporary Files

In our Dockerfile, we are installing a bunch of tools (like a C compiler, the make command, the ncurses terminal UI library) which we then delete after the build. Perhaps we can handle that better.

The apk package manager has a --no-cache option which tells it not to cache the package index locally. It also has a --virtual option, which gives control over temporary software installs. For example, when I install alpine-sdk, apk also installs the packages that alpine-sdk depends on, and I want all of them gone again once the build is finished. The --virtual option lets me install a set of packages under a single group name of my choosing, so that I can later remove the whole group, together with the dependencies it pulled in, with one apk del.
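As a rough sketch of the --virtual pattern on its own, the two shell functions below install the build packages under a group name and later remove that group. The group name .build-deps and the APK variable (which lets the commands be printed with APK=echo instead of executed) are illustrative choices for this sketch, not anything apk requires. The package names are the ones used in this post and may differ on newer Alpine releases.

```shell
# APK defaults to the real apk binary. Setting APK=echo gives a dry run
# that only prints the arguments (the variable is a helper for this sketch).
APK=${APK:-apk}

install_build_deps() {
    # Install the build-only packages under one virtual package name.
    $APK add --no-cache --virtual .build-deps alpine-sdk ncurses-dev python-dev
}

remove_build_deps() {
    # Removing the virtual package also removes what it pulled in,
    # provided nothing else installed still depends on those packages.
    $APK del .build-deps
}
```

On an Alpine system the order would be install_build_deps, then the build itself, then remove_build_deps.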

So, adding in --no-cache and --virtual, the Dockerfile now looks like this:

FROM alpine

RUN apk --no-cache update && \
    apk --no-cache add python git && \
    apk --no-cache --virtual temp-dependencies add alpine-sdk ncurses-dev python-dev

RUN cd /tmp/ && \
    git clone https://github.com/vim/vim.git && \
    cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
    make install

RUN apk del alpine-sdk ncurses-dev python-dev temp-dependencies && \
    rm -rf /tmp/vim

COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips

CMD vim --help

So, let’s see how much extra that has saved us:

example/vim       nocache             4a7fe1573638        6 hours ago         426.5 MB
example/vim       alpine              3d8ac3152685        6 hours ago         426.5 MB
example/vim       debian              1c600f302423        6 hours ago         664.8 MB

Disappointing! We didn’t save any space at all. Why not?

Docker Layers

To understand this, we need to digress slightly and look at how Docker images are constructed. You will have noticed that when you pull a Docker image (or when you build one), it’s not a single monolithic item. Instead, it is a set of layers. One reason for this is that layers can be shared across Docker images. You may have noticed this on a pull or build: sometimes you will already have some of the layers in an image.

Each instruction in your Dockerfile (which may span several lines joined with backslashes) creates a new layer, which essentially contains the differences between the previous layer and the result of that instruction.

You can see the layers, and the command which created each one, by using the docker history command:

IMAGE               CREATED             CREATED BY                                      SIZE
4a7fe1573638        6 hours ago         /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "vim --   0 B
73a7dbe54246        6 hours ago         /bin/sh -c #(nop) ADD dir:596ca9f12dfec3fdbbd   11.88 kB
28d1f40d5840        6 hours ago         /bin/sh -c #(nop) COPY file:ef55daa1a702bdc09   2.074 kB
885fbe2b3de3        6 hours ago         /bin/sh -c mkdir -p /home/ian/.vim/autoload/a   0 B
35b67bc6e36f        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   54.5 kB
b4babda9ad00        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   154.2 kB
49c14700f301        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   163.4 kB
2c29ffbcfd06        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   1.089 MB
373276e4a965        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   394.1 kB
b8b9dd490be2        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   10.77 MB
279de2b82c31        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   5.919 MB
9176e8ae3045        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   74.62 kB
6dd983e0998f        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   985.3 kB
1b57a0c44405        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   40.7 kB
23758616068b        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   2.594 MB
fd26f883f7f6        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   146.7 kB
5f47150b1dd6        6 hours ago         /bin/sh -c mkdir -p /home/ian/.vim/bundle       0 B
2e424782b553        6 hours ago         /bin/sh -c #(nop) COPY file:8975359b99bffecb6   5.422 kB
516efbb2bda5        6 hours ago         /bin/sh -c apk del alpine-sdk ncurses-dev pyt   568.4 kB
f08b4a3680df        6 hours ago         /bin/sh -c cd /tmp/ &&     git clone https://   176.5 MB
80529c2e20f4        6 hours ago         /bin/sh -c apk --no-cache update &&     apk -   223.1 MB
665ffb03bfae        3 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0 B
<missing>           3 days ago          /bin/sh -c #(nop) ADD file:90d7b7a4bad6a39f91   3.962 MB

The bottom line is Alpine Linux itself, with the next line up being the default command to execute (running /bin/sh). The next line up is where we pull all the packages we need, which comes in at about 220MB. We then build our vim command, which adds another 177MB to the image. Now comes the interesting part.

When we execute our apk del command to get rid of the packages we no longer need, it doesn’t remove space — it actually adds another 570K to our image. This is because it is adding more information to describe the difference between a layer with our temporary packages and a layer without.

The layer with our temporary packages is still in our Docker image. We can remove another 220MB (over half the size of our Docker image) if we can somehow get rid of that layer.
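The same effect can be reproduced without Docker. The shell sketch below is only a toy model, not Docker's actual storage code: it treats each layer as a tar archive of the files that changed, and records a deletion as an empty marker file with a .wh. prefix (mirroring the whiteout naming used in image layer archives). The paths and the 1 MiB file size are made up for illustration.

```shell
work=$(mktemp -d)

# Layer 1: "install" a 1 MiB build tool.
mkdir -p "$work/layer1/usr/bin"
dd if=/dev/zero of="$work/layer1/usr/bin/compiler" bs=1024 count=1024 2>/dev/null
tar -cf "$work/layer1.tar" -C "$work/layer1" .

# Layer 2: "delete" the tool. The layer cannot reach back and shrink
# layer 1, so it only records an empty marker saying the file is gone.
mkdir -p "$work/layer2/usr/bin"
: > "$work/layer2/usr/bin/.wh.compiler"
tar -cf "$work/layer2.tar" -C "$work/layer2" .

layer1_bytes=$(wc -c < "$work/layer1.tar")
layer2_bytes=$(wc -c < "$work/layer2.tar")
total_bytes=$((layer1_bytes + layer2_bytes))

echo "install layer: $layer1_bytes bytes"
echo "delete layer:  $layer2_bytes bytes"
echo "image total:   $total_bytes bytes"

rm -rf "$work"
```

The total comes out larger than the install layer alone: the deleted file is hidden from the final filesystem, but its bytes are still shipped in the earlier layer.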

Restructuring our Dockerfile

I said above that each instruction in our Dockerfile is responsible for a layer in our Docker image. This is the key to reducing the size of our Docker image. If we delete any temporary files in the same instruction as we create them, they won’t get written to a layer in the Docker image.

In this example, this is pretty easy to accomplish. Instead of having three separate RUN statements in our Dockerfile, we combine them into one (which does the temporary package installation, the build and the clean up):

FROM alpine

RUN cd /tmp/ && \
    apk --no-cache update && \
    apk --no-cache add python git && \
    apk --no-cache --virtual guzo-dependencies add alpine-sdk ncurses-dev python-dev && \
    git clone https://github.com/vim/vim.git && \
    cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
    make install && \
    apk del alpine-sdk ncurses-dev python-dev guzo-dependencies && \
    rm -rf /tmp/vim

COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips

CMD vim --help

Do we manage to save that 220MB with this technique?

example/vim       latest              4ea4ea72953d        5 hours ago         113.8 MB
example/vim       nocache             4a7fe1573638        6 hours ago         426.5 MB
example/vim       alpine              3d8ac3152685        6 hours ago         426.5 MB
example/vim       debian              1c600f302423        6 hours ago         664.8 MB

Actually, we saved over 300MB. The three layers which installed the packages, built vim and deleted the temporary packages (223MB, 177MB and 570K, about 400MB in total) have been replaced by a single layer of 87MB.
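The arithmetic can be checked from the listings. This is a small sketch using awk, with all sizes in MB copied from the docker images and docker history output in this post; the variable names are just labels for the sketch.

```shell
# Install + build + delete layers from the earlier image.
old_layers=$(awk 'BEGIN { printf "%.1f", 223.1 + 176.5 + 0.5684 }')

# Saving when those three layers become the single 87.45 MB layer.
saved_by_layers=$(awk 'BEGIN { printf "%.1f", 223.1 + 176.5 + 0.5684 - 87.45 }')

# Saving according to the total image sizes.
saved_by_images=$(awk 'BEGIN { printf "%.1f", 426.5 - 113.8 }')

echo "old layers:        $old_layers MB"
echo "saved (layers):    $saved_by_layers MB"
echo "saved (images):    $saved_by_images MB"
```

Both ways of counting give the same saving of 312.7 MB, so the whole reduction is accounted for by collapsing those three layers into one.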

We can see this from the docker history command:

IMAGE               CREATED             CREATED BY                                      SIZE
4ea4ea72953d        6 hours ago         /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "vim --   0 B
6c2a27f94878        6 hours ago         /bin/sh -c #(nop) ADD dir:596ca9f12dfec3fdbbd   11.88 kB
bed0111fee59        6 hours ago         /bin/sh -c #(nop) COPY file:ef55daa1a702bdc09   2.074 kB
9c50ecddeb27        6 hours ago         /bin/sh -c mkdir -p /home/ian/.vim/autoload/a   0 B
666aae7b0a94        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   54.5 kB
1bbe461be3ba        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   154.2 kB
6e4257434128        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   163.4 kB
950822b4ee49        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   1.089 MB
e021d30a05a6        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   394.1 kB
ada4f046f647        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   10.77 MB
8eea52537543        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   5.919 MB
9a2ac03f2693        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   74.62 kB
6074a5564784        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   985.3 kB
bb2d16c05d9e        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   40.7 kB
676704bb1bde        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   2.594 MB
ff6c4c3557c2        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   146.7 kB
a6d43b112ed6        6 hours ago         /bin/sh -c mkdir -p /home/ian/.vim/bundle       0 B
fc9d28b91184        6 hours ago         /bin/sh -c #(nop) COPY file:8975359b99bffecb6   5.422 kB
3f90fb58dd26        6 hours ago         /bin/sh -c cd /tmp/ &&     apk --no-cache upd   87.45 MB
665ffb03bfae        3 days ago          /bin/sh -c #(nop)  CMD ["/bin/sh"]              0 B
<missing>           3 days ago          /bin/sh -c #(nop) ADD file:90d7b7a4bad6a39f91   3.962 MB

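If you want a total rather than a column of numbers, the SIZE column can be summed with a little awk. The sketch below assumes the output format shown in this post, where the size is the last two fields of each row (a number followed by B, kB or MB, in decimal units); other Docker versions may print sizes differently, in which case the parsing would need adjusting. The function name sum_layer_sizes is made up for this example, and the sample rows are copied from the listing above.

```shell
# Sum the SIZE column of docker-history-style text read from stdin.
sum_layer_sizes() {
    awk '
        NR == 1 && $1 == "IMAGE" { next }   # skip the header row
        {
            value = $(NF - 1)
            unit = $NF
            if (unit == "kB")     value = value / 1000
            else if (unit == "B") value = value / 1000000
            total += value                  # MB needs no conversion
        }
        END { printf "%.1f MB\n", total }
    '
}

# A few rows from the listing above, used as sample input.
sample_total=$(sum_layer_sizes <<'EOF'
IMAGE               CREATED             CREATED BY                                      SIZE
ada4f046f647        6 hours ago         /bin/sh -c cd /home/ian/.vim/bundle && git cl   10.77 MB
a6d43b112ed6        6 hours ago         /bin/sh -c mkdir -p /home/ian/.vim/bundle       0 B
fc9d28b91184        6 hours ago         /bin/sh -c #(nop) COPY file:8975359b99bffecb6   5.422 kB
3f90fb58dd26        6 hours ago         /bin/sh -c cd /tmp/ &&     apk --no-cache upd   87.45 MB
<missing>           3 days ago          /bin/sh -c #(nop) ADD file:90d7b7a4bad6a39f91   3.962 MB
EOF
)
echo "$sample_total"   # prints "102.2 MB" for these five rows
```

With Docker available, the same function could be fed directly, for example with docker history example/vim | sum_layer_sizes.
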
Use Smaller Applications

It’s not relevant in this example, since we are actually building an application in our Docker image, but if we need an application in our Docker image (for example a web server) look for smaller alternatives as a way of saving space.

If you need a webserver, nginx is a pretty standard option these days. The Docker image for it comes in at 109MB:

nginx             latest              958a7ae9e569        3 weeks ago         109.4 MB

However, if you have modest webserver needs (for example you might just be serving up a static site), then smaller options are possible. For example, I’ve used the httpd server which is in busybox. The whole busybox distro comes in at just over 1MB:

busybox           latest              c75bebcdd211        5 weeks ago         1.106 MB
nginx             latest              958a7ae9e569        3 weeks ago         109.4 MB

So, by using busybox we have a Docker image about 1% the size of nginx. That means that if we are scaling our hosting and need to download a new Docker image, the download should be roughly 100 times faster under the same network conditions.
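As a sketch of what that can look like, the function below serves a directory of static files using the httpd applet in the stock busybox image. It assumes the image's busybox build includes httpd. The function name, the site directory, the host port, the /www mount point and the DOCKER variable (set DOCKER=echo to print the command instead of running it) are all illustrative choices.

```shell
# DOCKER defaults to the real docker client. Setting DOCKER=echo gives a
# dry run that only prints the arguments (a helper for this sketch).
DOCKER=${DOCKER:-docker}

serve_static_site() {
    site_dir=$1            # absolute path to the directory of static files
    host_port=${2:-8080}   # port to expose on the host

    # httpd flags: -f stay in the foreground, -p listen port, -h document root.
    $DOCKER run --rm -p "$host_port:80" -v "$site_dir:/www:ro" \
        busybox httpd -f -p 80 -h /www
}
```

For example, serve_static_site /srv/site 8080 would make the files under /srv/site available on port 8080 of the host.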

Summary

We have seen four techniques to reduce the size of our Docker images:

  • Use a small distro as a base image
  • Delete temporary resources in the same command as you create them
  • Use optimal package management commands (such as not caching indexes, and identifying intermediate packages)
  • Use smaller applications where possible

The outcome of this was:

  • My vim image was reduced to 17% of its original size (665MB to 114MB)
  • A web server was reduced to 1% of its original size (109MB to 1MB)