Six Gotchas with Running Docker Containers on Hadoop
Despite the potential value of containerizing workloads on Hadoop, Cloudera’s Daniel Templeton recommends waiting for Hadoop 3.0 before deploying Docker containers, citing security issues and other caveats.
“I thought of titling this, ‘It’s cool, but you can’t use it.’ There’s a lot of potential here, but until 3.0 — [it’s] not going to solve your problems,” he told those attending ApacheCon North America in Miami last week.
Templeton, who is a software engineer on the YARN development team at Cloudera, delved into the Docker support (download) provided by the Hadoop LinuxContainerExecutor as well as discussed when there might be better alternatives. He stipulated that he was talking about Docker on Hadoop, not Hadoop on Docker, which he called “an entirely different story.”
“I’ve got a Hadoop cluster. I want to execute my workloads in Docker containers,” he explained.
Hadoop’s YARN scheduler supports Docker as an execution engine for submitted applications, but there are a handful of things you should understand before you enter this brave new world of Docker on YARN, he said, explaining:
1. The application owner must exist in the Docker container
Currently, with Docker, when you run a container, you specify a user to run it as. If you specify the UID — not the username — and if the UID doesn’t exist, it will spontaneously create it for you. This remapping won’t work well with large numbers of images, where the user needs to be specified beforehand. Otherwise, you can’t access anything. You can’t access your launch script and you can’t write your logs; therefore it’s broken.
“There is no good way to deal with this. The discussion is YARN-4266. If you have a brilliant idea how to fix this, jump in on it,” he said. The approach taken by YARN-4266 “might not get exactly what you wanted, but it’s the least destructive thing we could think of doing. …This is one I don’t see resolving soon until Docker extends what they let you do,” he said.
2. Docker containers won’t be independent of the environment they run in
One of the chief benefits of Docker containers is their portability. Guess what? They won’t be very portable in Hadoop. If you want HDFS access, if you need to be able to deserialize your tokens, if you need a framework like MapReduce, if you’re doing Spark — you’ve got to have those binaries or those jars in your image. And versions have to line up.
There is a patch posted on this. The patch allows white-listed volume mounts, and you as an administrator can say, “These directories are allowed to be mounted into Docker containers.” And you can specify for those directories to be mounted when you submit your job. Problem solved as long as administrators pay attention to the fact that it could be running as root in the container, so don’t let them mount anything that could screw it up, he said.
3. Large images may cause failures
There is currently nothing in YARN to do with Docker image caching. When you execute your job, that docker_run will implicitly pull the image from the repo. Spark and MapReduce both have a 10-minute timeout. If you have an image in the network that takes more than 10 minutes to download, your job will fail. If you persistently resubmit, it eventually will land on a node that you’ve already tried it on, and it will run. But that’s not the greatest solution.
YARN 3854 is a first step, not a solution. It lets YARN localize images in the same way it localizes data. In YARN, you can say, “I’m submitting this application, and this is the data, the ancillary libraries — the whatever the heck it — that this job is going to need. Please distribute it to all the nodes where my job will run.” And YARN will do that. The problem is that will not save you from the 10-minute timeout. So there’s more work to do there.
4. There is no real support for secure repos.
Docker stores its credentials for accessing a secure repo in a client_config, which is always set to your .docker/config.json. You have no way from YARN to change that. That means when you’re accessing a secure repo, you’re subject to the .docker/config.json file in your user’s home directory on whatever node manager you land on. That’s probably not what you want. There is a JIRA for that, however, 5428, which will make it configurable.
5. There is only basic support for networks
“When you’re thinking Docker on YARN, you start thinking about Kubernetes, Mesos, that type of thing. Kubernetes gives you this really nice facility for doing network management, right? You submit jobs and you say, ‘This is part of the network and that’s part of the network.’ And networks magically materialize and CNS routing is handled, and the world’s a wonderful place with puppies and unicorns,” he said.
YARN does not offer you that. It does not offer the notion of pods where you can say, “These applications are all part of the same pod. Go run them together and share the network.” There’s no notion of port mapping built in. There’s no real automated management over the network. Instead, you can explicitly create networks in Docker on all your node manager machines, then you can request those networks. But that’s it.
6. There are massive security implications
Some people are paranoid about this, though he says he’s not: You can execute privileged containers. A privileged container in Docker gets to peek into the underlying operating system, access to things like slash-proc and devices. You can turn that off or limit it to a certain set of users, so it is controlled, but you have to be aware of it.
The other side of the coin is you can only do terrible things to the underlying OS if you’re running as root in the container. At this point, YARN provides you no way to specify your user. In the future it likely will. “There are security implications with Docker on Hadoop that you really have to think through.”
While some Docker fixes are in Hadoop 2.8, they’re not enough to be useful, according to Templeton. Among the 3.0 features not in 2.8:
- Mounts localized file directories as volumes
- cgroups support
- Support for different networking options
The Hadoop 3.0 release is scheduled by the end of the year, according to release manager Andrew Wang, also a software engineer at Cloudera. It’s undergoing two alphas, and a third alpha is planned before it goes to beta.
Its major feature will be improved Hbase erasure coding, which will provide users with 1.5 times the storage, meaning they can save half the cost of hard disks. This reworking of storage will have a massive impact on users of YARN and MapReduce, Wang said in a separate interview.
The project has been working with major users including Yahoo, Twitter and Microsoft to ensure compatibility with existing systems and enable rolling upgrades without pain, Wang said.