The Dockerfile is the starting point for creating a Docker image. The file format provides a well-defined set of directives that allow you to copy files or folders, run commands, set environment variables, and do other tasks required to create a container image. It’s really important to craft your Dockerfile well to keep the resulting image secure, small, quick to build, and quick to update.
In this post, we’ll see how to write good Dockerfiles that speed up your development flow, ensure build reproducibility, and produce images you can confidently deploy to production.
Note: for this blog post we’ll base our Dockerfile examples on the react-java-mysql sample from the awesome-compose repository.
Development flow
As developers, we want to match our development environment to the target production context as closely as possible to ensure that what we build will work when deployed.
We also want to be able to develop quickly, which means we want builds to be fast and we want to be able to use developer tools like debuggers. Containers are a great way to codify our development environment, but we need to define our Dockerfile correctly to be able to interact quickly with our containers.
Incremental builds
A Dockerfile is a list of instructions for building your container image. The Docker builder caches the result of each step as an image layer, but the cache can be invalidated; when that happens, the step that invalidated it and all subsequent steps must be rerun and their layers regenerated.
The cache is invalidated when files in the build context that are referenced by COPY or ADD change. The ordering of the steps can therefore have drastic effects on performance.
Let’s take a look at an example where we build a NodeJS project in the Dockerfile. In this project, dependencies are specified in the package.json file and fetched when the npm ci command is run.
The simplest Dockerfile would be:
FROM node:lts
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY . /code
RUN npm ci
CMD [ "npm", "start" ]
Structuring the Dockerfile as above causes the cache to be invalidated at the COPY line any time a file in the build context changes. As a result, the dependencies are re-fetched and the node_modules directory repopulated whenever any file changes, not just when package.json changes, and that can take a long time.
To avoid this and only fetch the dependencies when they change (i.e.: when package.json or package-lock.json changes), we should consider separating the dependency installation from the build and run of our application.
A more optimized Dockerfile would be this:
FROM node:lts
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]
Using this separation, if there are no changes in package.json or package-lock.json then the cache will be used for the layer generated by the RUN npm ci instruction. This means that when you edit your application source and rebuild, the dependencies won’t be redownloaded which saves time 🎉.
We also limit the second COPY to the src directory, as explained in a previous post.
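Cache invalidation can also be reduced at the source: a .dockerignore file keeps irrelevant files out of the build context entirely, so changing them never triggers a rebuild. A minimal sketch for this kind of project (the exact entries depend on your repository layout):

```
node_modules
.git
```

Files matched here never reach the builder, so editing them cannot invalidate any COPY layer, and excluding node_modules also keeps a host install from being copied into the image by a broad COPY.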
Keep live reload active between the host and the container
This tip is not directly related to the Dockerfile but we often hear this kind of question: How do I keep live reload active while running the app in a container and modifying the source code from my IDE on the host machine?
With our example, we need to mount our project directory in the container and pass an environment variable that switches Chokidar (the file watcher used by the NodeJS development server) to polling mode, since filesystem change events from the host don’t always reach the container.
$ docker run -e CHOKIDAR_USEPOLLING=true -v ${PWD}/src/:/code/src/ -p 3000:3000 repository/image_name
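Since this example is based on the react-java-mysql sample from the awesome-compose repository, the same bind mount and environment variable can also be codified in a Compose file instead of a long docker run command. A minimal sketch (the service name and image name are illustrative):

```yaml
version: "3.7"
services:
  frontend:
    image: repository/image_name
    environment:
      # Switch the file watcher to polling so host-side edits are picked up
      - CHOKIDAR_USEPOLLING=true
    volumes:
      # Bind-mount the source directory for live reload
      - ./src:/code/src
    ports:
      - "3000:3000"
```

With this in place, docker-compose up gives the same live-reload behavior.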
Consistent builds
One of the most important things with a Dockerfile is to be able to build the exact same image from the same build context (sources, dependencies, etc.).
We’ll continue to improve the Dockerfile defined in the previous section.
Build consistently from sources
As we saw in the previous section, we’re able to build an application by adding the source files and dependencies in the Dockerfile description and then running commands on them.
But in our previous example we can’t guarantee that the generated image will be the same each time we run a docker build. Why? Because each time a new NodeJS LTS is released, the lts tag moves to point at the latest LTS version of the NodeJS image; the base therefore changes over time and could introduce breaking changes. We can easily fix this by using a more specific tag for the base image (we’ll let you choose between the LTS version and the latest stable one 😉)
FROM node:13.12.0
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]
We’ll see in the No more latest section that there are other advantages to using more specific base image tags and avoiding the latest tag.
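Note that even a specific tag like node:13.12.0 can, in principle, be re-pushed by the image maintainers. If you need the strictest reproducibility, you can pin the base image to its content digest instead; the digest below is a placeholder, and the real one is shown by docker images --digests node:

```dockerfile
# Pinning by digest makes the base image immutable (placeholder digest)
FROM node:13.12.0@sha256:<digest>
```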
Multi-stage and targets to match the right environment
We made the development build consistent, but how can we do this for the production artifact?
Since Docker 17.05, we can use multi-stage builds to define the steps that produce our final image. Using this mechanism in our Dockerfile, we can separate the image used for our development flow from the one used to build the application and the one used in production.
FROM node:13.12.0 AS development
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]
FROM development AS builder
RUN npm run build
FROM nginx:1.17.9 AS production
COPY --from=builder /code/build /usr/share/nginx/html
Each time you see FROM … AS … it’s a build stage.
So we now have a development, a build, and a production stage.
We can continue to use a container for our development flow by building the specific development stage image using the --target flag.
$ docker build --target development -t repository/image_name:development .
And use it as usual
$ docker run -e CHOKIDAR_USEPOLLING=true -v ${PWD}/src/:/code/src/ repository/image_name:development
A docker build without the --target flag will build the final stage, which in this case is the production image. Our production image is simply an nginx image with the files built in the previous stages copied into the directory from which nginx serves them.
Production ready
It’s really important to keep your production image as lean and as secure as possible. Here are a few things to check before running a container in production.
No more latest image version
As we previously saw in the Build consistently from sources section, using a specific tag for build steps helps make the image build reproducible. There are at least two other very good reasons to use more specific tags for your images:
- You can easily find all the containers running with an image version in your favorite orchestrator (Swarm, Kubernetes…)
# Search in Docker engine containers using our repository/image_name:development image
$ docker inspect $(docker ps -q) | jq -c '.[] | select(.Config.Image == "repository/image_name:development") |"\(.Id) \(.State) \(.Config)"'
"89bf376620b0da039715988fba42e78d42c239446d8cfd79e4fbc9fbcc4fd897 {\"Status\":\"running\",\"Running\":true,\"Paused\":false,\"Restarting\":false,\"OOMKilled\":false,\"Dead\":false,\"Pid\":25463,\"ExitCode\":0,\"Error\":\"\",\"StartedAt\":\"2020-04-20T09:38:31.600777983Z\",\"FinishedAt\":\"0001-01-01T00:00:00Z\"}
{\"Hostname\":\"89bf376620b0\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":true,\"AttachStderr\":true,\"ExposedPorts\":{\"3000/tcp\":{}},\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[\"CHOKIDAR_USEPOLLING=true\",\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\",\"NODE_VERSION=12.16.2\",\"YARN_VERSION=1.22.4\",\"CI=true\",\"PORT=3000\"],\"Cmd\":[\"npm\",\"start\"],\"Image\":\"repository/image_name:development\",\"Volumes\":null,\"WorkingDir\":\"/code\",\"Entrypoint\":[\"docker-entrypoint.sh\"],\"OnBuild\":null,\"Labels\":{}}"
#Search in k8s pods running a container with our repository/image_name:development image (using jq cli)
$ kubectl get pods --all-namespaces -o json | jq -c '.items[] | select(.spec.containers[].image == "repository/image_name:development")| .metadata'
{"creationTimestamp":"2020-04-10T09:41:55Z","generateName":"image_name-78f95d4f8c-","labels":{"com.docker.default-service-type":"","com.docker.deploy-namespace":"docker","com.docker.fry":"image_name","com.docker.image-tag":"development","pod-template-hash":"78f95d4f8c"},"name":"image_name-78f95d4f8c-gmlrz","namespace":"docker","ownerReferences":[{"apiVersion":"apps/v1","blockOwnerDeletion":true,"controller":true,"kind":"ReplicaSet","name":"image_name-78f95d4f8c","uid":"5ad21a59-e691-4873-a6f0-8dc51563de8d"}],"resourceVersion":"532","selfLink":"/api/v1/namespaces/docker/pods/image_name-78f95d4f8c-gmlrz","uid":"5c70f340-05f1-418f-9a05-84d0abe7009d"}
- In case of a CVE (Common Vulnerabilities and Exposures), you can quickly tell whether you need to patch your containers and image descriptions.
From our example, we could specify that our development and production images use the Alpine variants.
FROM node:13.12.0-alpine AS development
ENV CI=true
ENV PORT=3000
WORKDIR /code
COPY package.json package-lock.json /code/
RUN npm ci
COPY src /code/src
CMD [ "npm", "start" ]
FROM development AS builder
RUN npm run build
FROM nginx:1.17.9-alpine AS production
COPY --from=builder /code/build /usr/share/nginx/html
Use official images
You can use Docker Hub to search for base images to use in your Dockerfile; some of these are officially supported images. We strongly recommend using these images because:
- their content has been verified
- they’re updated quickly when a CVE is fixed
You can add an image_filter request query param to only get the official images.
https://hub.docker.com/search?q=nginx&type=image&image_filter=official
All the previous examples in this post were using official images of NodeJS and NGINX.
Just enough permissions!
All applications, running in a container or not, should adhere to the principle of least privilege which means an application should only access the resources it needs.
Whether because of malicious behavior or bugs, a process running with too many privileges can have unexpected consequences for the whole system at runtime.
Because the NodeJS official image is already well set up in this respect, we’ll switch to the backend Dockerfile of the sample.
Configuring an image to run as an unprivileged user is very easy:
FROM maven:3.6.3-jdk-11 AS builder
WORKDIR /workdir/server
COPY pom.xml /workdir/server/pom.xml
RUN mvn dependency:go-offline
COPY src /workdir/server/src
RUN mvn package
FROM openjdk:11-jre-slim
RUN addgroup --system java && adduser --system --ingroup java javauser
USER javauser
EXPOSE 8080
COPY --from=builder /workdir/server/target/project-0.0.1-SNAPSHOT.jar /project-0.0.1-SNAPSHOT.jar
CMD ["java", "-Djava.security.egd=file:/dev/./urandom", "-jar", "/project-0.0.1-SNAPSHOT.jar"]
Simply by creating a new group, adding a user to it, and using the USER directive we can run our container with a non-root user.
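One detail worth double-checking: the addgroup and adduser flags differ between image families. Debian-based images such as openjdk:11-jre-slim ship the GNU-style long options, while Alpine/BusyBox images use short flags. A sketch of both forms of the user-creation step:

```dockerfile
# Debian-based images (e.g. openjdk:11-jre-slim): long-style options
RUN addgroup --system java && adduser --system --ingroup java javauser

# Alpine/BusyBox-based images: the short-flag equivalents
# RUN addgroup -S java && adduser -S javauser -G java

USER javauser
```

Using the wrong variant makes the build fail at this step, so match the flags to the base image of the stage they run in.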
Conclusion
In this blog post we just showed some of the many ways to optimize and secure your Docker images by carefully crafting your Dockerfile. If you’d like to go further you can take a look at:
- Our official documentation about Dockerfile best practices
- A previous post on the subject by Tibor Vass
- A session during the DockerCon 2019 by Tibor Vass and Sebastiaan van Stijn
- Another session during Devoxx 2019 by Jérémie Drouet and myself