Docker Basics: Diving into the Essential Concepts, Tools, and Terminology
There are many “get up and running with Docker” resources out there, nearly all of which seem to assume you’re already familiar with the technology. Docker is powerful indeed, but to harness it effectively it helps to have a solid grasp of the platform’s underlying concepts and terminology before you even start the download.
Getting grounded in these foundational Docker terms will help you navigate the Docker-verse, and make it easier for you to ask the right questions if you do end up lost.
I love the Docker logo because it neatly encapsulates the container philosophy in one cute graphic: a whale bearing a pile of shipping containers. (Fun fact: the whale’s name is “Moby Dock”). Virtual containers are very much like those metal shipping cubes that carry products from overseas factory to cargo ship to port to delivery truck, before arriving at your local store.
What’s inside the container is of no concern to the delivery system; the uniform exterior of the container means distribution is easily standardized each step of the way.
It’s the same with Docker containers and the apps they carry. Docker containers are self-sufficient, requiring only a sufficiently powerful host operating system of either the Linux or Windows variety. The machine running the container doesn’t need to know or care about what’s inside. And a Dockerized app doesn’t care whether it’s on a Kubernetes cluster or a lone server: it can run in any environment, anywhere, so long as the Docker platform is installed.
Container architecture means that the application process running inside a container thinks it’s the only one; looking outward, it sees only a barebones Linux distro. All necessary dependencies are neatly packed within. Multiple containerized apps on a single server are independent, i.e., they don’t interfere with each other — which means you can update a particular process without worrying any other ones will break.
Although some users limit each container to running a single process at a time, containers can actually run multiple processes. You could package several services into a single container (behind a proxy server like Nginx, or an application server like Gunicorn if you’re a Python person) and have them all run side by side. Whether that’s a good idea ultimately depends on the practitioner and the project.
Pure container philosophy argues for one process per container, but there are plenty of real-world success stories using multiservice containers.
So: containers are lightweight and portable encapsulations of an environment in which to run applications. To conjure up a container, you use a Docker image. An image is like a blueprint, a basis for creating — just one, or as many as you like — brand-new containers.
The two are closely related and often confused, but this is an essential distinction to internalize when first learning Docker-fu.
On its own, an image is an inert and immutable file, meaning that images do not do anything, and cannot be changed. However, you can start a container from an image, perform operations in it, and then save a brand new image based on the latest state of the container (so as to create more containers exactly like that one). Images are created with the build command, and they’ll produce a container when started with run.
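On the command line, the image/container relationship looks something like this (assuming Docker is installed; the image and container names here are placeholders):

```shell
# Build an image from the Dockerfile in the current directory,
# tagging it "myapp" (a placeholder name).
docker build -t myapp .

# Images are inert; listing them shows nothing is running yet.
docker image ls

# Each `docker run` creates a brand-new container
# from the same immutable image.
docker run --name myapp-1 myapp
docker run --name myapp-2 myapp

# List the running containers.
docker ps
```

Note that building the image once is enough; you can start as many independent containers from it as you like.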
Think of it this way: an image is like a powered-down computer. Starting up a container is like creating a brand new exact copy of that computer, software and all, only one that is running. The original computer (the image) remains on your desk, still powered down, while the new one (the container) hums away busily with its assigned tasks. Or, in other words, if an image is running, then it is a container. Probably.
Now to confuse you again: To turn an image into a container, the Docker engine takes the image, adds a read-write filesystem on top and initializes settings (container name, ID, network ports, etc.). A running container has a currently executing process, but a container can also be stopped (or, in Docker terminology, exited). An exited container is not running, but that does not mean it has become an image: it can be restarted and will retain all settings and any filesystem changes. Remember, though, that you can save any given container as an image.
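The container lifecycle described above can be sketched with a few commands (again, names are placeholders):

```shell
# Stop a running container; it becomes "exited", not an image.
docker stop myapp-1

# Exited containers keep their settings and filesystem changes.
# They appear with the -a (all) flag, not in plain `docker ps`.
docker ps -a

# Restart the container exactly where it left off.
docker start myapp-1

# Snapshot the container's current state as a brand-new image.
docker commit myapp-1 myapp:snapshot
```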
Clear as mud, right? I finally got a grip on this dichotomy when I thought of it in programming terms: if an image is a class, then a container is an instance of a class — a runtime object.
Containers are designed to leave nothing behind: as soon as a Docker container is removed, any changes you made to its contents are lost. (Unless, of course, you save it as an image first.) But what about having data persist? That’s useful, right?
This is where volumes come in. When starting up a Docker container you can specify directories as mount points for volumes, which are repositories for shared or persistent data that remain even if a container gets removed. The beautiful thing here is that you don’t need to know anything about the host: you designate a volume, and Docker makes sure it’s saved somewhere on, and retrievable from, the host system. When a container is exited, any volumes it was using persist — so if you start a second container it can use all the data from the previous one.
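In practice, that looks like this (volume name and mount path are placeholders):

```shell
# Create a named volume; Docker decides where on the host it lives.
docker volume create appdata

# Mount the volume at /data inside the container. Anything written
# under /data survives the container's removal.
docker run --name writer -v appdata:/data myapp

# A second container can mount the same volume later and pick up
# all the data the first one left behind.
docker run --name reader -v appdata:/data myapp
```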
A Dockerfile is a simple text file containing the list of commands the Docker client calls (on the command line) when assembling an image. This automates the image creation process: these special files are, basically, scripts, a set list of instructions and arguments that automatically perform actions on a chosen base image. Dockerfiles are the build instructions for a new project, written in executable form, and are similar in concept to the recipes and manifests found in configuration management tools like Chef.
So a Dockerfile specifies the base operating system, installs all relevant components, and makes sure all necessary dependencies are in place. Dockerfiles begin by defining a base image (FROM) as the build process starting point. Each subsequent instruction (running a command, copying in local files) starts a new container from the previous step’s image snapshot, executes, and saves the result as the new most-recent image. Usually, you also specify a default command to run (ENTRYPOINT) and its default arguments (CMD) for containers started from the image.
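Here is a minimal sketch: the shell snippet writes a hypothetical Dockerfile for a Python app (the base image, file names, and dependencies are illustrative assumptions, not a prescription) and then builds it.

```shell
# Write a minimal, hypothetical Dockerfile for a Python app.
cat > Dockerfile <<'EOF'
FROM python:3.12-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["app.py"]
EOF

# Build an image from it (the tag "myapp" is a placeholder).
docker build -t myapp .
```

Starting a container from this image runs `python app.py` by default; because the arguments live in CMD, you can override them at run time (e.g., `docker run myapp other_script.py`) without touching the ENTRYPOINT.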
The registry is the central distribution point for deploying Docker containers. You can deploy directly from your Docker host, or use an orchestrator like Kubernetes or Docker Swarm to gain features such as automated deployment and scaling.
The registry works basically like a git repository, allowing you to push and pull container images. For example, a project’s DevOps person would likely be in charge of creating the Dockerfile, building the relevant container image, and then pushing it to the registry. Any other developer working on the project can now simply pull the latest version from the registry and use it.
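The push/pull workflow looks like this (the registry address, repository path, and tag are all placeholders):

```shell
# Tag the local image with its registry location.
docker tag myapp registry.example.com/team/myapp:1.0

# Push it so teammates can fetch it.
docker push registry.example.com/team/myapp:1.0

# Any other developer pulls the latest version and runs it.
docker pull registry.example.com/team/myapp:1.0
docker run registry.example.com/team/myapp:1.0
```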
Toolbelt: Loaded and Ready
After reading this far, you hopefully have a good idea of Docker’s foundational concepts and terminology. Navigating the Docker-verse can be challenging at first, but knowing the names of things and how they fit together helps you to ask the right questions when you get stuck, or completely lost.
Knowledge is power! Next step: diving into your first Docker tutorial. Your coding life is about to get harder, at least at first (learning curves suck) but ultimately it will end up much easier. Really.