Building Docker Containers

Revisiting The Basics

In my earlier post, Getting Started with Docker, I covered building a basic Dockerfile using the FROM, COPY, RUN, and CMD instructions and how to use a .dockerignore file to keep unnecessary files out of your images and containers. If you haven't read that post, go check it out to learn the basics of building Docker images. In this post, I'll cover some more advanced techniques for building container images. In addition, I recently published a post exploring advanced Docker CLI usage. I recommend giving it a read, too, if you aren't already a CLI pro.

Installing Dependencies

Using FROM with an official image for your language or framework will get you a long way, but many applications will require a system dependency that's not included in the FROM image. For example, many applications use ImageMagick for processing image uploads, but it's not included by default in the Debian images that most language images are based on. You can use RUN and apt-get to install missing dependencies.

FROM node:15

RUN apt-get update
RUN apt-get install -y imagemagick

WORKDIR /usr/src/app

COPY package*.json ./

RUN npm install

COPY . .

EXPOSE 3000

# Use the start script defined in package.json to start the application
CMD ["npm", "start"]

We started the Dockerfile just like the example from my earlier post, using the official Node.js 15 image, but then we take two additional steps to install ImageMagick using apt-get. To keep the base image size low, the Debian base image does not ship with a local package index, so we need to run apt-get update first to download it before apt-get can install anything. Then, we simply use apt-get install -y imagemagick to install the package. The -y option automatically answers "yes" when apt-get prompts you to confirm the installation.
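As an aside, if image size matters to you, a common follow-up pattern (not shown in the example above) is to delete the downloaded package index in the same step once the install finishes, so it never gets baked into a layer. A minimal sketch:

RUN apt-get update \
    && apt-get install -y imagemagick \
    && rm -rf /var/lib/apt/lists/*

This only helps when all three commands share a single RUN instruction; we'll cover why combining instructions like this matters in the Layers and Caching section below.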

RUN vs CMD (vs ENTRYPOINT)

By now you've probably noticed that there are two different instructions that run commands in your containers: RUN and CMD. While both are used to run commands, they're used in very different contexts. As we've seen in previous examples, RUN is used exclusively during the build process to modify the image as needed. CMD is different because it specifies the command the container will run when you launch it using docker run. You can have as many RUN instructions as you need, but only one CMD takes effect (if you list more than one, only the last is used). If you need to run a different command at runtime, you can pass it as an argument when you launch the container with docker run (check out my Docker CLI Deep Dive post). Additionally, Docker provides the ENTRYPOINT instruction: when set, the values you provide to CMD are passed to the ENTRYPOINT command as arguments. Most images don't set an ENTRYPOINT, and when you write CMD in shell form, Docker runs it through /bin/sh -c, which gives your command a basic Unix shell environment. That behavior will satisfy most use cases. Overriding a container's CMD at runtime is as easy as appending a new command to docker run, while overriding its ENTRYPOINT requires the --entrypoint flag. Docker's own ENTRYPOINT documentation goes into more detail about how it can be used.
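To make the relationship between ENTRYPOINT and CMD concrete, here's a minimal hypothetical sketch (the image tag and script names are invented for illustration):

FROM node:15

WORKDIR /usr/src/app

COPY . .

# The executable that always runs when the container starts
ENTRYPOINT ["node"]

# The default argument handed to the ENTRYPOINT; easy to override at runtime
CMD ["server.js"]

Built as my-image, docker run my-image executes node server.js, while docker run my-image debug.js executes node debug.js instead.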

In the example Dockerfile above, you probably noticed that the way commands are passed to CMD and RUN looks different. Typically, you provide commands to RUN using shell syntax, and commands to CMD (and ENTRYPOINT) using exec syntax, but they can be used interchangeably. When using shell syntax, your command is run through a shell, so shell expressions are resolved: you can use shell variables and operators like output pipes (|) and redirects (>, >>), as well as boolean operators (&&, ||) to join commands. Exec syntax is much more straightforward: the first string in the bracketed array is the executable to run, and the remaining strings are passed to it as arguments, exactly as provided and with no shell processing.
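As a quick illustration, here are the two forms side by side (the file name is hypothetical):

# Shell form: runs through /bin/sh -c, so expansions, pipes, and redirects work
RUN echo "built at $(date)" >> /build-info.txt

# Exec form: no shell is involved; npm is executed directly with "start" as its argument
CMD ["npm", "start"]

If you wrote that RUN line in exec form instead, $(date) and >> would be passed to echo as literal text rather than being interpreted by a shell.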

Layers and Caching

Each instruction in your Dockerfile adds a new layer to your image. For performance reasons, it's considered a best practice to limit the total number of layers that comprise your finished image. There are a number of ways to do this. The simplest is combining RUN or COPY instructions that appear in close proximity to each other. Consider the example above where we installed ImageMagick: instead of using two separate RUN instructions, we can combine them using the shell && operator.

FROM node:15

RUN apt-get update && apt-get install -y imagemagick

Combining COPY instructions is a bit easier. The COPY instruction takes any number of arguments: the first N parameters are interpreted as a list of files to copy, and the (N+1)th parameter is the destination to copy them to (when copying multiple files, the destination must be a directory). You can also use * as a wildcard character, as I did in the first example when copying the package.json and package-lock.json files to the image.
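For example, both of these instructions copy the same two files in a single layer:

COPY package.json package-lock.json ./

COPY package*.json ./

The second form uses the wildcard, which also conveniently keeps working if a lock file hasn't been generated yet.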

Another thing to consider when thinking about how your image's layers are composed is caching. When Docker processes your Dockerfile to build your image, it runs each instruction in order to create the layers of your image. Docker analyzes each instruction before running it and checks its cache to determine whether an identical layer already exists. For RUN instructions, Docker looks for a cached layer that was built using the exact same command and uses it instead of rebuilding the same layer. For COPY and ADD instructions, it analyzes the files to be copied and looks for a previously built layer with the exact same file contents. If any instruction requires its layer to be rebuilt, every instruction that follows it will be rebuilt as well. Optimizing your Dockerfile to take advantage of the layer cache can greatly reduce the time it takes to build your image. Organize your Dockerfile so that the layers least likely to change are processed first (e.g. installing dependencies) and those most likely to change (e.g. copying application code) are processed later.
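The Node example at the top of this post already follows this pattern; annotated, the ordering logic looks like this:

# Changes rarely → these layers are served from the cache on most builds
COPY package*.json ./
RUN npm install

# Changes on nearly every build → only the layers from here down are rebuilt
COPY . .

Because the application code is copied after the dependencies are installed, editing a source file invalidates only the final COPY layer, and npm install is skipped entirely on the next build.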

Conclusion

These techniques will help you create more advanced container images and hopefully help you optimize them. However, I've only covered a small slice of the options available to you when building container images. If you dig deeper into the official Dockerfile reference, you'll find information about all of the instructions available to you, along with more advanced concepts and use cases.
