Reducing Docker Image Size
There is more attention being paid these days to the size of Docker images, with a desire among many developers to reduce the size as much as possible.
Why is this important? It comes down to the usual costs of download time and storage. A simple example is when your website is experiencing unusually high demand and you want to spin up some more hosts to handle it. For each new host, you need to download your Docker image to it. If your image is 200MB, it takes roughly 10 times as long to download as it would if your image were 20MB.
So, how do we make smaller Docker images? Here are some techniques.
My Vim Image
Being a Docker-savvy kind of person, I try to avoid installing software on my machines and aim to run everything in a Docker container. So, I have built a Docker image which has Vim in it, together with my configuration, code snippets and a set of plug-ins I find useful.
Here’s my Dockerfile for it:
FROM debian
RUN apt-get update && \
apt-get -y install python git && \
apt-get -y install build-essential ncurses-dev python-dev
RUN cd /tmp/ && \
git clone https://github.com/vim/vim.git && \
cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
make install
RUN apt-get remove -y ncurses-dev python-dev && \
rm -rf /tmp/vim
COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips
CMD vim --help
Pretty standard stuff:
- Base it on Debian
- Install a bunch of tools I need to build it (it needs Python for the Ultisnips plug-in)
- Pull the latest stable version of vim and build it
- Remove all the packages I installed to build it
- Install all the plug-ins I need
- Copy across a theme and my Ultisnips code fragments
So, my editing setup was pretty sweet. Any time I was on a new machine (or spun up a new VM) I’d just pull my Docker image from Docker Hub and be up and editing, with all my config just how I wanted it.
Except …
It would take ages to pull the image, and it made a dent in the storage on smaller VMs.
So, I had a quick look at the image size:
example/vim debian 1c600f302423 6 hours ago 664.8 MB
Over 650MB of Docker image? Wow! That’s a full CD’s worth of data. Just for vim.
For comparison, a standard vim install (on CentOS 7) comes in at 890K:
-rwxr-xr-x. 1 root root 889K Jun 10 2014 /bin/vi
So, I decided to see how I could reduce this.
Smaller Linux Distro
The obvious first place to start was the Linux distro I was basing my Docker image on — Debian. If I could find a smaller distro, that would be an easier way of reducing the size. Alpine has been getting a lot of attention as a distro for building Docker images, so I took a quick look at the standard image:
debian latest a2ff708b7413 2 days ago 100.1 MB
alpine latest 665ffb03bfae 3 days ago 3.962 MB
Alpine’s 4MB footprint compared to Debian’s 100MB looks like an easy saving to make. So I made a few changes to the Dockerfile: I changed the base image to Alpine, swapped the apt-get package management for Alpine’s apk, and switched to alpine-sdk for the tools I need to build vim, instead of build-essential. The Dockerfile is still very recognisable:
FROM alpine
RUN apk update && \
apk add python git && \
apk add alpine-sdk ncurses-dev python-dev
RUN cd /tmp/ && \
git clone https://github.com/vim/vim.git && \
cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
make install
RUN apk del alpine-sdk ncurses-dev python-dev && \
rm -rf /tmp/vim
COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips
CMD vim --help
So, what has that done to my image size?
example/vim alpine 3d8ac3152685 6 hours ago 426.5 MB
example/vim debian 1c600f302423 6 hours ago 664.8 MB
That’s pretty good. Not only have we shaved off the roughly 96MB difference in base image size, it seems we’ve got some other savings from what has actually been built too. Overall, we’ve knocked a little over a third off the size of the image.
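As a rough check on where that saving comes from, the reported sizes can be split into the part explained by the smaller base image and the remainder. This is only arithmetic on the figures in the listings above; the variable names are mine.

```python
# Sizes in MB, copied from the image listings above.
debian_base, alpine_base = 100.1, 3.962
debian_vim, alpine_vim = 664.8, 426.5

total_saving = debian_vim - alpine_vim      # whole-image difference
base_saving = debian_base - alpine_base     # explained by the base image alone
other_saving = total_saving - base_saving   # packages and build output

print(f"total saving: {total_saving:.1f} MB ({total_saving / debian_vim:.0%} of the Debian image)")
print(f"from the base image: {base_saving:.1f} MB")
print(f"from everything else: {other_saving:.1f} MB")
```

So more than half of the saving comes from what gets installed and built on top of the base image, not from the base image itself.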
Let’s see what else we can do.
Temporary Files
In our Dockerfile, we are installing a bunch of tools (like a C compiler, the make command, the ncurses terminal UI library) which we then delete after the build. Perhaps we can handle that better.
The apk package manager has a --no-cache option which tells it not to cache the package indexes locally. It also has a --virtual option, which gives control over temporary software installs. For example, when I install the build tools, apk also pulls in a lot of dependencies that they rely on. The --virtual option groups everything installed by that command under a single named virtual package, so when I delete that one name later, apk removes the whole group along with any dependencies that nothing else needs.
So, adding in --no-cache and --virtual, the Dockerfile now looks like this:
FROM alpine
RUN apk --no-cache update && \
apk --no-cache add python git && \
apk --no-cache --virtual temp-dependencies add alpine-sdk ncurses-dev python-dev
RUN cd /tmp/ && \
git clone https://github.com/vim/vim.git && \
cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
make install
RUN apk del alpine-sdk ncurses-dev python-dev temp-dependencies && \
rm -rf /tmp/vim
COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips
CMD vim --help
So, let’s see how much extra that has saved us:
example/vim nocache 4a7fe1573638 6 hours ago 426.5 MB
example/vim alpine 3d8ac3152685 6 hours ago 426.5 MB
example/vim debian 1c600f302423 6 hours ago 664.8 MB
Disappointing! We didn’t save any space at all. Why not?
Docker Layers
To understand this, we need to digress slightly and look at how Docker images are constructed. You will have noticed that when you pull a Docker image (or when you build one), it’s not a single monolithic item. Instead, it is a set of layers. One reason for this is that layers can be shared across Docker images. You may have noticed this on a pull or build: sometimes you will already have some of the layers in an image.
Each instruction in your Dockerfile creates a new layer, which essentially contains the differences between the previous layer and the result of running that instruction.
You can see the layers and the commands that created them by using the docker history command:
IMAGE CREATED CREATED BY SIZE
4a7fe1573638 6 hours ago /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "vim -- 0 B
73a7dbe54246 6 hours ago /bin/sh -c #(nop) ADD dir:596ca9f12dfec3fdbbd 11.88 kB
28d1f40d5840 6 hours ago /bin/sh -c #(nop) COPY file:ef55daa1a702bdc09 2.074 kB
885fbe2b3de3 6 hours ago /bin/sh -c mkdir -p /home/ian/.vim/autoload/a 0 B
35b67bc6e36f 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 54.5 kB
b4babda9ad00 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 154.2 kB
49c14700f301 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 163.4 kB
2c29ffbcfd06 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 1.089 MB
373276e4a965 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 394.1 kB
b8b9dd490be2 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 10.77 MB
279de2b82c31 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 5.919 MB
9176e8ae3045 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 74.62 kB
6dd983e0998f 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 985.3 kB
1b57a0c44405 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 40.7 kB
23758616068b 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 2.594 MB
fd26f883f7f6 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 146.7 kB
5f47150b1dd6 6 hours ago /bin/sh -c mkdir -p /home/ian/.vim/bundle 0 B
2e424782b553 6 hours ago /bin/sh -c #(nop) COPY file:8975359b99bffecb6 5.422 kB
516efbb2bda5 6 hours ago /bin/sh -c apk del alpine-sdk ncurses-dev pyt 568.4 kB
f08b4a3680df 6 hours ago /bin/sh -c cd /tmp/ && git clone https:// 176.5 MB
80529c2e20f4 6 hours ago /bin/sh -c apk --no-cache update && apk - 223.1 MB
665ffb03bfae 3 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0 B
<missing> 3 days ago /bin/sh -c #(nop) ADD file:90d7b7a4bad6a39f91 3.962 MB
The bottom line is Alpine Linux itself, with the next line up being the default command to execute (running /bin/sh). The next line up is where we pull all the packages we need, which comes in at about 220MB. We then build our vim command, which adds another 176MB to the image. Now comes the interesting part. When we execute our apk del command to get rid of the packages we no longer need, it doesn’t free any space; it actually adds another 570K to our image. This is because it is adding more information to describe the difference between a layer with our temporary packages and a layer without.
The layer with our temporary packages is still in our Docker image. We can remove another 220MB (over half the size of our Docker image) if we can somehow get rid of that layer.
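As a sanity check, the SIZE column can be added up to confirm that the layers account for the whole image, including the ones whose contents were deleted later. A small Python sketch, with the sizes copied by hand from the history output above and decimal units (1 MB = 1,000,000 bytes) assumed:

```python
UNITS = {"B": 1, "kB": 1_000, "MB": 1_000_000}  # decimal units are an assumption


def to_bytes(size: str) -> float:
    """Convert a size such as '568.4 kB' into bytes."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


# SIZE column of the docker history output above, top to bottom.
layer_sizes = [
    "0 B", "11.88 kB", "2.074 kB", "0 B",
    "54.5 kB", "154.2 kB", "163.4 kB", "1.089 MB", "394.1 kB", "10.77 MB",
    "5.919 MB", "74.62 kB", "985.3 kB", "40.7 kB", "2.594 MB", "146.7 kB",
    "0 B", "5.422 kB",
    "568.4 kB",   # apk del: adds data rather than removing it
    "176.5 MB",   # clone and build vim
    "223.1 MB",   # install packages
    "0 B", "3.962 MB",
]

total_mb = sum(to_bytes(s) for s in layer_sizes) / 1_000_000
print(f"{len(layer_sizes)} history entries, {total_mb:.1f} MB in total")
# prints: 23 history entries, 426.5 MB in total
```

The total matches the 426.5 MB reported for the image, which shows that the package layer is still being counted even though a later layer deletes its contents.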
Restructuring our Dockerfile
I said above that each instruction in our Dockerfile is responsible for a layer in our Docker image. This is the key to reducing the size of our Docker image. If we delete any temporary files in the same instruction that creates them, they won’t get written to a layer in the Docker image.
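One way to picture this is with a toy model of a build, sketched in Python. It is not how Docker is implemented, and the paths and sizes are made up for illustration; it only captures the idea that an instruction runs all of its steps and then stores the resulting diff as a layer.

```python
def build(instructions):
    """Toy model of an image build (not Docker's real implementation).

    Each instruction is a list of (op, path, size_mb) steps. All steps of one
    instruction run first; only the resulting diff is stored as a layer.
    """
    fs = {}          # files visible after the latest instruction: path -> MB
    image_mb = 0.0   # sum of all layer sizes
    for steps in instructions:
        before = dict(fs)
        for op, path, size_mb in steps:
            if op == "add":
                fs[path] = size_mb
            elif op == "del":
                fs.pop(path, None)
        # A layer stores files that are new or changed. Deleting a file from
        # an earlier layer only hides it; the earlier layer keeps its size.
        image_mb += sum(size for path, size in fs.items() if before.get(path) != size)
    return fs, image_mb


# Hypothetical paths and round-number sizes, for illustration only.
install = ("add", "/usr/bin/gcc", 200.0)
compile_vim = ("add", "/usr/bin/vim", 80.0)
cleanup = ("del", "/usr/bin/gcc", 0.0)

separate_fs, separate_mb = build([[install], [compile_vim], [cleanup]])
combined_fs, combined_mb = build([[install, compile_vim, cleanup]])

print(separate_fs == combined_fs)   # True: same final files either way
print(separate_mb, combined_mb)     # 280.0 80.0
```

In the model, both builds end with the same files, but the build that cleans up in a separate instruction is still carrying the 200MB that an earlier layer recorded.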
In this example, this is pretty easy to accomplish. Instead of having three separate RUN statements in our Dockerfile, we combine them into one (which does the temporary package installation, the build and the clean up):
FROM alpine
RUN cd /tmp/ && \
apk --no-cache update && \
apk --no-cache add python git && \
apk --no-cache --virtual guzo-dependencies add alpine-sdk ncurses-dev python-dev && \
git clone https://github.com/vim/vim.git && \
cd vim && ./configure --prefix=/usr --enable-pythoninterp && \
make install && \
apk del alpine-sdk ncurses-dev python-dev guzo-dependencies && \
rm -rf /tmp/vim
COPY vimrc /home/ian/.vimrc
RUN mkdir -p /home/ian/.vim/bundle
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-pathogen.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-airline/vim-airline.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/kien/rainbow_parentheses.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/chrisbra/Colorizer.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-scripts/nginx.vim.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/vim-syntastic/syntastic.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/SirVer/ultisnips.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jeetsukumaran/vim-buffergator.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-fugitive.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/jelera/vim-javascript-syntax.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/groenewege/vim-less.git
RUN cd /home/ian/.vim/bundle && git clone https://github.com/tpope/vim-vinegar.git
RUN mkdir -p /home/ian/.vim/autoload/airline/themes/
COPY guzo-airline-theme.vim /home/ian/.vim/autoload/airline/themes/guzo.vim
ADD UltiSnips /home/ian/.vim/UltiSnips
CMD vim --help
Do we manage to save that 220MB with this technique?
example/vim latest 4ea4ea72953d 5 hours ago 113.8 MB
example/vim nocache 4a7fe1573638 6 hours ago 426.5 MB
example/vim alpine 3d8ac3152685 6 hours ago 426.5 MB
example/vim debian 1c600f302423 6 hours ago 664.8 MB
Actually, we saved over 300MB. As well as losing the 220MB package layer, the layer where we build vim has shrunk from the earlier 177MB to 87MB, and that 87MB now covers the package installation as well. We’ve also got rid of the 570K from the deletion of the temporary packages.
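A back-of-the-envelope check makes the source of the saving explicit: the three layers the earlier build used for installing, building and cleaning up are replaced by a single layer. The figures are the SIZE values that docker history reports for the two builds, and the labels are mine.

```python
# Layer sizes in MB, as reported by docker history for the two builds.
old_layers = {"install packages": 223.1, "build vim": 176.5, "apk del": 0.5684}
new_layer = 87.45   # the single combined RUN instruction

old_total = sum(old_layers.values())
saved = old_total - new_layer
print(f"old: {old_total:.1f} MB, new: {new_layer} MB, saved: {saved:.1f} MB")

# Cross-check against the image sizes reported in the image listing.
print(f"image size difference: {426.5 - 113.8:.1f} MB")
```

Both figures come out at about 312.7 MB, so the whole saving is explained by those three layers collapsing into one.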
We can see this from the docker history command:
IMAGE CREATED CREATED BY SIZE
4ea4ea72953d 6 hours ago /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "vim -- 0 B
6c2a27f94878 6 hours ago /bin/sh -c #(nop) ADD dir:596ca9f12dfec3fdbbd 11.88 kB
bed0111fee59 6 hours ago /bin/sh -c #(nop) COPY file:ef55daa1a702bdc09 2.074 kB
9c50ecddeb27 6 hours ago /bin/sh -c mkdir -p /home/ian/.vim/autoload/a 0 B
666aae7b0a94 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 54.5 kB
1bbe461be3ba 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 154.2 kB
6e4257434128 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 163.4 kB
950822b4ee49 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 1.089 MB
e021d30a05a6 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 394.1 kB
ada4f046f647 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 10.77 MB
8eea52537543 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 5.919 MB
9a2ac03f2693 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 74.62 kB
6074a5564784 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 985.3 kB
bb2d16c05d9e 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 40.7 kB
676704bb1bde 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 2.594 MB
ff6c4c3557c2 6 hours ago /bin/sh -c cd /home/ian/.vim/bundle && git cl 146.7 kB
a6d43b112ed6 6 hours ago /bin/sh -c mkdir -p /home/ian/.vim/bundle 0 B
fc9d28b91184 6 hours ago /bin/sh -c #(nop) COPY file:8975359b99bffecb6 5.422 kB
3f90fb58dd26 6 hours ago /bin/sh -c cd /tmp/ && apk --no-cache upd 87.45 MB
665ffb03bfae 3 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0 B
<missing> 3 days ago /bin/sh -c #(nop) ADD file:90d7b7a4bad6a39f91 3.962 MB
Use Smaller Applications
It’s not relevant in this example, since we are actually building an application in our Docker image, but if we need an application in our Docker image (for example a web server) look for smaller alternatives as a way of saving space.
If you need a webserver, nginx is a pretty standard option these days. The Docker image for it comes in at 109MB:
nginx latest 958a7ae9e569 3 weeks ago 109.4 MB
However, if you have modest webserver needs (for example you might just be serving up a static site), then smaller options are possible. For example, I’ve used the httpd server which is in busybox. The whole busybox distro comes in at just over 1MB:
busybox latest c75bebcdd211 5 weeks ago 1.106 MB
nginx latest 958a7ae9e569 3 weeks ago 109.4 MB
So, by using busybox we have a Docker image about 1% the size of nginx. That means that if we are scaling our hosting and need to download a new Docker image, it is roughly 100 times faster to download.
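To put rough numbers on that, here is a small Python sketch of the pure transfer time for each image. The 100 Mbit/s link speed is an assumption picked only for illustration, and the estimate ignores latency, registry overhead and the fact that layers are compressed in transit, so real pulls will differ.

```python
def transfer_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Pure transfer time for size_mb megabytes; ignores latency and compression."""
    return size_mb * 8 / link_mbit_per_s


LINK_MBIT_PER_S = 100.0   # assumed bandwidth, not a figure from this article

nginx_mb, busybox_mb = 109.4, 1.106   # from the image listing above
nginx_s = transfer_seconds(nginx_mb, LINK_MBIT_PER_S)
busybox_s = transfer_seconds(busybox_mb, LINK_MBIT_PER_S)

print(f"nginx: {nginx_s:.2f}s, busybox: {busybox_s:.2f}s")
print(f"size ratio: {busybox_mb / nginx_mb:.1%}, speed-up: {nginx_s / busybox_s:.0f}x")
```

The speed-up depends only on the size ratio, so it stays at about 99x whatever link speed is plugged in.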
Summary
We have seen four techniques to reduce the size of our Docker images:
- Use a small distro as a base image
- Delete temporary resources in the same command as you create them
- Use optimal package management commands (such as not caching indexes, and identifying intermediate packages)
- Use smaller applications where possible
The outcome of this was:
- My vim image was reduced to 17% of its original size (665MB to 114MB)
- A web server was reduced to 1% of its original size (109MB to 1MB)