One way to think about Docker image layers is as git commits. The two are technically different, but this article uses the analogy to point out one property they have in common.

Whenever you add a new commit to a git repository, the repository always gets larger. Even if you are removing a file with a commit, the git repository will get larger. The git repository contains all the historical changes ever made to the repo. Removing files just adds to that history.

For example, let’s say that you accidentally committed a large SQL dump of your database. That dump file bloats your git repo. In the very next commit, you remove the dump. But since the dump is still in the git history, anyone who clones the repo still downloads it. The database dump does not disappear from git history just because you removed it with an additional commit.

Docker image layers work much like git commits. Each Dockerfile instruction that changes the filesystem creates a new layer, and each new layer adds to the size of the Docker image. This is true even if the instruction only removes files, just like a git commit that removes a database dump file. The image keeps every layer that was ever created, so a file deleted in a later layer is still stored in the earlier layer that added it.
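
The accumulation described above can be sketched with a toy model in Python. This is not how Docker stores layers internally. It is a simplified illustration in which the Layer class, the two helper functions, the file path, and the size are all made up for the example. Each layer records the files it adds and the paths it deletes, and the image size is the sum of everything any layer ever added.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Layer:
    """One toy layer: files it adds (path -> bytes) and paths it deletes."""
    added: dict[str, int] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)


def image_size(layers: list[Layer]) -> int:
    """Bytes stored across all layers; deletions never subtract anything."""
    return sum(sum(layer.added.values()) for layer in layers)


def visible_files(layers: list[Layer]) -> dict[str, int]:
    """Filesystem a container would see: apply each layer's changes in order."""
    files: dict[str, int] = {}
    for layer in layers:
        for path in layer.deleted:
            files.pop(path, None)
        files.update(layer.added)
    return files


# Hypothetical example: a 500 MB dump added in one layer, removed in the next.
layers = [
    Layer(added={"/app/dump.sql": 500_000_000}),
    Layer(deleted={"/app/dump.sql"}),
]

print(visible_files(layers))  # {}
print(image_size(layers))     # 500000000
```

The final filesystem is empty, yet the image still carries the 500 MB from the first layer, which is the same situation as the database dump that stays in git history.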

You often see Dockerfiles that combine several short instructions into one huge one-liner for this very reason. Dockerfiles without size optimization that look like this:

RUN apt-get update
RUN apt-get install -y curl python-pip
RUN pip install requests
RUN apt-get remove -y python-pip curl
RUN rm -rf /var/lib/apt/lists/*

Get turned into Dockerfiles that look like this:

RUN apt-get update && \
    apt-get install -y curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*

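The latter Dockerfile runs everything in a single RUN instruction, so it produces a single layer. The files in /var/lib/apt/lists/ and the removed python-pip and curl packages are deleted before that layer is committed, so they never get a chance to be stored in a Docker layer forever.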
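
To make the difference between the two Dockerfiles above concrete, here is a rough Python sketch that counts layer sizes under a simplified model: a layer stores only the files that are new or changed once its group of commands has finished. The paths and sizes (in MB) are invented for illustration and are not measurements. The model also assumes that apt-get remove deletes exactly what was installed, and it ignores the tiny deletion markers that real layers record.

```python
from __future__ import annotations

# Hypothetical steps: (command, files added as path -> MB, paths removed).
STEPS = [
    ("apt-get update", {"/var/lib/apt/lists": 40}, set()),
    ("apt-get install -y curl python-pip", {"/usr/bin/curl": 5, "/usr/bin/pip": 60}, set()),
    ("pip install requests", {"/usr/lib/python/requests": 2}, set()),
    ("apt-get remove -y python-pip curl", {}, {"/usr/bin/curl", "/usr/bin/pip"}),
    ("rm -rf /var/lib/apt/lists/*", {}, {"/var/lib/apt/lists"}),
]


def build(groups):
    """Return the size of each layer, one layer per group of steps."""
    fs = {}
    layer_sizes = []
    for group in groups:
        before = dict(fs)  # filesystem state before this layer's commands run
        for _command, adds, removes in group:
            fs.update(adds)
            for path in removes:
                fs.pop(path, None)
        # Only files that exist now and are new or changed get committed.
        committed = sum(size for path, size in fs.items() if before.get(path) != size)
        layer_sizes.append(committed)
    return layer_sizes


separate = build([[step] for step in STEPS])  # one RUN per command
combined = build([STEPS])                     # one RUN joined with &&

print(separate, sum(separate))  # [40, 65, 2, 0, 0] 107
print(combined, sum(combined))  # [2] 2
```

In this model the two removal steps in the first Dockerfile commit nothing and free nothing, so the image keeps all 107 MB. The combined version commits only what survives to the end of the single RUN instruction.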

Mentally mapping Docker layers to git commits quickly made Docker layering clear for me. I hope this analogy helps others.