Docker Image Layers Are Like Git Commits
One way to understand Docker image layers is to think of them as git commits. The two are technically different, but the analogy highlights an interesting property they have in common.
Whenever you add a new commit to a git repository, the repository always gets larger. Even if you are removing a file with a commit, the git repository will get larger. The git repository contains all the historical changes ever made to the repo. Removing files just adds to that history.
For example, let’s say that you accidentally committed a large SQL dump of your database. That dump file bloats your git repo. In the very next commit, you remove the database dump. But since the dump is still in the git history, anyone who clones the repo still downloads it. The database dump does not disappear from git history just because you removed it with an additional commit.
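You can see this for yourself with a throwaway repository. The script below is only a sketch: the scratch directory, the `commit` helper, the demo identity, and the 1 MB of random bytes standing in for the database dump are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Helper so commits work even without a global git identity configured.
commit() {
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

# Commit 1: "accidentally" add a database dump.
# Random bytes are used so compression cannot shrink the file.
dd if=/dev/urandom of=dump.sql bs=1024 count=1024 2>/dev/null
git add dump.sql
commit "Add database dump by accident"
size_with_dump=$(du -sk .git | cut -f1)

# Commit 2: remove the dump again.
git rm -q dump.sql
commit "Remove database dump"
size_after_removal=$(du -sk .git | cut -f1)

# Both commits are part of the history, and both touched dump.sql.
commit_count=$(git rev-list --count HEAD)
dump_commits=$(git log --oneline -- dump.sql | wc -l | tr -d ' ')

echo "size of .git with dump:     ${size_with_dump} KB"
echo "size of .git after removal: ${size_after_removal} KB"
echo "commits in history:         ${commit_count}"
echo "commits touching dump.sql:  ${dump_commits}"
```

The second size should be no smaller than the first: the working tree no longer contains dump.sql, but the .git directory still stores it as part of the first commit.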
Docker image layers work much like git commits. Any new layer created by an instruction in the Dockerfile increases the size of the Docker image, even if that instruction removes files from the filesystem, just like a git commit that removes a database dump file. Docker's layering keeps a historical record of all the layers that went into the image.
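Here is a toy model of that idea in plain shell, with no Docker required. It is not how Docker stores layers on disk; each directory simply stands in for one layer, and the names `layer1`, `layer2`, and `dump.sql` are made up. The `.wh.` marker mimics the whiteout files that layer archives use to record a deletion, as I understand the OCI image layer format.

```shell
#!/bin/sh
set -eu

# Toy model only: each directory stands in for one image layer.
image=$(mktemp -d)
mkdir "$image/layer1" "$image/layer2"

# Layer 1: an instruction writes a 1 MB file.
dd if=/dev/urandom of="$image/layer1/dump.sql" bs=1024 count=1024 2>/dev/null

# Layer 2: a later instruction deletes the file. The new layer cannot
# reach back and edit layer 1; it only records a small whiteout marker
# that hides the file from the final filesystem.
: > "$image/layer2/.wh.dump.sql"

layer1_kb=$(du -sk "$image/layer1" | cut -f1)
layer2_kb=$(du -sk "$image/layer2" | cut -f1)

# In this model the image size is the sum of all of its layers.
image_kb=$((layer1_kb + layer2_kb))

# Count files from layer 1 that are still visible, i.e. not hidden
# by a matching whiteout marker in layer 2.
visible=0
for f in "$image"/layer1/*; do
  name=$(basename "$f")
  [ -e "$image/layer2/.wh.$name" ] || visible=$((visible + 1))
done

echo "layer 1: ${layer1_kb} KB"
echo "layer 2: ${layer2_kb} KB"
echo "image:   ${image_kb} KB"
echo "visible files: ${visible}"
```

The final filesystem shows no files, yet the image is still at least as large as layer 1. The delete added a layer; it did not take one away.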
This is why you often see Dockerfiles that combine several short instructions into one huge one-liner. Dockerfiles without size optimization look like this:
RUN apt-get update
RUN apt-get install -y curl python-pip
RUN pip install requests
RUN apt-get remove -y python-pip curl
RUN rm -rf /var/lib/apt/lists/*
Get turned into Dockerfiles that look like this:
RUN apt-get update && \
    apt-get install -y curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*
The latter Dockerfile removes python-pip, curl, and /var/lib/apt/lists/* within the same RUN instruction, before the files get a chance to be committed into a Docker layer forever.
Once I started thinking of Docker layers as git commits, Docker layering quickly became clear to me. I hope this analogy helps others.