
Moving Ruby Apps From Heroku to Docker

8 Aug 2014 8:37am

I’ve been experimenting with migrating applications currently hosted on Heroku to Docker. I hit a few snags along the way and want to share how I got past them. Running a single-process container (like a Rails app using SQLite) is simple. But when you start running apps which use Heroku services (even the Postgres database is a service to your app), things get complicated because you need to link containers, and there is sadly little information on the Internet about how to do this (the Docker docs are very generic). Specifically, there is very little step-by-step information about how to link an app in a real application framework with a real database service. Understanding how these components link together (and troubleshooting them when they don’t) is something I wish I had known before playing with Docker and Rails. This guide will show you how.

If you are busy, you can skip to the TL;DR section at the very end of this post to see how I took an app which runs on Heroku and converted it to run inside Docker as a set of linked containers. That section is a quick how-to for enabling your Rails app for Docker. If you need more details on a specific step, just jump back to the section where that step is explained.

Using the Heroku Rails reference app

Start with the sample application from Heroku (https://github.com/heroku/ruby-rails-sample)

$ git clone https://github.com/heroku/ruby-rails-sample
$ cd ruby-rails-sample

Now, let’s make sure we can run this locally. I use RVM to manage different versions of Ruby, I already have Postgres installed on my laptop (using Homebrew), and I love the simplicity of the zero-configuration web server Pow. With these three things installed, I can run these commands and things just work.

$ powder link
$ printf 'if [ -f "$rvm_path/scripts/rvm" ] && [ -f ".ruby-version" ]; then 
   source "$rvm_path/scripts/rvm" 
   rvm use `cat .ruby-version`
fi' > .powrc
$ printf "2.1.1" > .ruby-version
$ bundle  
$ bundle exec rake db:create db:schema:load 
$ powder open

A little more detail about these steps:

  1. Linked our application inside Pow.
  2. Set up a .powrc which reads our .ruby-version file so Pow runs the app with the correct version of Ruby.
  3. Set up the .ruby-version file to point to 2.1.1 (which we ascertained from peeking into the Gemfile).
  4. Installed the relevant Ruby libraries.
  5. Created the database and loaded our application schema into it.
  6. Opened the application inside our browser.

We should now see our Rails app running locally in the browser. Our app does not do much; it just displays the server time when the page is loaded. In fact, if we inspect the app, there are no models inside the “app/models” directory, so we should verify that we are actually connected to the database. Running the ps command and searching for postgres shows that our app is connected properly to the database:

$ ps ax | grep postgres | grep xrdawson 
38270   ??  Ss     0:00.01 postgres: xrdawson ruby-rails-sample_development 127.0.0.1(54935) idle 
38273   ??  Ss     0:00.01 postgres: xrdawson ruby-rails-sample_development 127.0.0.1(54938) idle

If we look inside our config/database.yml file, we can see a database named “ruby-rails-sample_development” specified, the same name that appears in the ps output above. The db:create task we ran earlier created it, but the app never runs queries against it, so locally Rails does not much care what it contains.

Dockerize our Sample Rails App

Now, to dockerize this, we’ll use a Dockerfile, which will containerize our application. There is a “stock” image (also called a “base” image, an image sanctioned by Docker or another well known organization) for rails. We can use this one as our base and add the relevant extra steps to this file to make a reproducible image manifest (the “Dockerfile”) which we can conveniently store alongside the rest of our source code.

$ printf 'FROM rails\n' > Dockerfile
$ docker build -t sample_rails_app_for_heroku .

This fails! We see these issues:

Sending build context to Docker daemon 1.802 MB
Sending build context to Docker daemon
Step 0 : FROM rails
# Executing 5 build triggers
Step onbuild-0 : ADD . /usr/src/app
---> 1379eabf72b1
Step onbuild-1 : WORKDIR /usr/src/app
---> Running in 0d4b99e50593
---> f2f211cf84ea
Step onbuild-2 : RUN bundle install --system
---> Running in 1b94dbcb836d
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and
installing your bundle as root will break this application for all non-root
users on this machine.
Your Ruby version is 2.1.2, but your Gemfile specified 2.1.1
2014/08/06 10:30:33 The command \
[/bin/sh -c bundle install --system] returned a non-zero code: 18
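The last lines of that output are Bundler comparing the ruby directive in the Gemfile with the interpreter inside the image. As a rough illustration (this is not Bundler’s actual code, and both helper names are made up for this post), the comparison boils down to something like this:

```ruby
# Sketch: compare the Ruby version a Gemfile asks for with a given interpreter.
# gemfile_ruby_version and ruby_version_ok? are hypothetical helpers,
# not part of Bundler.

# Return the version from a line such as: ruby "2.1.1"
# or nil when the Gemfile has no ruby directive.
def gemfile_ruby_version(gemfile_text)
  match = gemfile_text.match(/^\s*ruby\s+["']([^"']+)["']/)
  match && match[1]
end

# True when no version is pinned or the pinned version equals `running`.
def ruby_version_ok?(gemfile_text, running = RUBY_VERSION)
  wanted = gemfile_ruby_version(gemfile_text)
  wanted.nil? || wanted == running
end

sample = <<~GEMFILE
  source "https://rubygems.org"
  ruby "2.1.1"
  gem "rails"
GEMFILE

puts gemfile_ruby_version(sample)       # 2.1.1
puts ruby_version_ok?(sample, "2.1.2")  # false
puts ruby_version_ok?(sample, "2.1.1")  # true
```

Running a check like this against your own Gemfile before building tells you up front whether the base image’s Ruby will be accepted.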

So, the base rails image uses Ruby 2.1.2 and our Gemfile specifies 2.1.1. What to do? Does it matter if we upgrade? Hard to know. There could be libraries which don’t work or don’t compile. Let’s just upgrade and see what happens.

$  rvm install 2.1.2 --verify-downloads 1 # without verify-downloads this failed…

Now we have ruby 2.1.2 installed. Let’s change the line in our Gemfile and .ruby-version to switch to 2.1.2. We’ll verify it works locally, and then retry with Docker.

$ ruby -pi -e 'gsub(/2\.1\.1/, "2.1.2")' .ruby-version Gemfile
$ git commit -am "Switch to 2.1.2"
$ cd . # this switches us to 2.1.2 inside our shell
$ bundle # install the libraries inside ruby 2.1.2 context.
$ powder open

When we do this, we still see it running locally. So, re-run the docker build command.

$ docker build -t sample_rails_app_for_heroku .

It builds and regenerates the cache of gems from the Gemfile. Now, we can run the container.

$ docker run -p 3000:3000 sample_rails_app_for_heroku

Then, we can visit it running on our Docker host. I’m on OS X and my internal Docker IP is 192.168.59.103. If I hit http://192.168.59.103:3000, I see an error page saying the app cannot connect to Postgres.

So, the application is running inside the container, but it cannot connect to postgres. The base image we used (“rails”) does not have postgres running inside it. We could add postgres to this container (add “RUN apt-get install postgresql -y” to the Dockerfile for example), but there is debate about whether this is the right way to do things. Isolating to one process per container means that we could at some point switch to a cluster of postgres containers and our rails server would not have to know the difference (as opposed to bloating our rails image with a bunch of additional code and data), so let’s run another container with postgres inside it and connect them together using Docker “links.”

$ docker run --name sample_rails_postgres -p 5432:5432 postgres

This command starts a container from the base image “postgres” (pulling the image first if you have not yet retrieved it) with Postgres listening on port 5432. Run it in another terminal window, because the command stays in the foreground and displays the output of the running container; if you want to run the container in the background, use the -d switch. Now we should be able to connect the two containers with the --link switch when running our Rails app, so they can talk to each other in a secure way. Start the Rails container, linking it to the Postgres container:

$ docker run  -p 3000:3000 \
--link sample_rails_postgres:sample_rails_postgres \
sample_rails_app_for_heroku

We immediately see an issue. Our app is not yet configured to reach the linked Postgres server, so the error message is the same. What to do? And how do we fix it while keeping the app working on Heroku and locally?

Troubleshooting using nsenter

When things don’t work, we could of course do a Google search. But since our Rails app is running in a new context, namely a Docker container, search results are limited right now. A better way to troubleshoot is to inspect the container itself.

We can’t SSH into our container, though. For the same reasons that we don’t want to run Postgres inside the same container as our Rails app, the consensus among many Docker users is that we should not run an SSH server inside our container, and I generally agree. You could install SSH temporarily, but then we are cluttering the image, and our build steps get complicated when we back out the change to get ready for deployment.

Instead, we can use nsenter, packaged by one of the Docker employees, to get a shell inside a running container. This is the right way to debug running containers, as it keeps your experimental commands out of the workflow for building the final image. Unfortunately, the installation instructions for nsenter did not work for me (they never put the binary into my OS X /usr/local/bin directory), but adding this snippet of code (explained further down in the README) to my shell did work.

docker-enter() {
boot2docker-cli ssh '[ -f /var/lib/boot2docker/nsenter ] || \
docker run --rm -v /var/lib/boot2docker/:/target jpetazzo/nsenter'
boot2docker-cli ssh -t sudo /var/lib/boot2docker/docker-enter "$@"
}

I added these lines to my .bash_profile (and then ran “. ~/.bash_profile” to source it), after which I could use the “docker-enter” command. Running docker-enter looks like this:

$ docker ps
CONTAINER ID        IMAGE                                COMMAND                CREATED             STATUS              PORTS                    NAMES
ff806bee1067        sample_rails_app_for_heroku:latest   rails server           7 days ago          Up 37 minutes       0.0.0.0:3000->3000/tcp   kickass_kowalevski
2dd659027836        postgres:latest                      /usr/src/postgres/do   7 days ago          Up 37 minutes       5432/tcp                 kickass_kowalevski/sample_rails_postgres,sample_rails_postgres
$ docker-enter 2dd659027836
root@2dd659027836:~# apt-get update
...
root@2dd659027836:~# apt-get install postgresql -y
...
root@2dd659027836:~# su postgres -l
$ psql -h localhost
psql (9.3.4)
Type "help" for help.

postgres=# \list
                             List of databases
   Name    |  Owner   | Encoding  | Collate | Ctype |   Access privileges
-----------+----------+-----------+---------+-------+-----------------------
 postgres  | postgres | SQL_ASCII | C       | C     |
 template0 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
 template1 | postgres | SQL_ASCII | C       | C     | =c/postgres          +
           |          |           |         |       | postgres=CTc/postgres
(3 rows)

So, I can get a shell on the database server and list the databases inside the running instance, which tells us things are working on the PostgreSQL side. I did not suspect this was the issue anyway; the problem is in the connectivity from the Rails container. But it is nice to confirm that everything is working as we expect on the Postgres server.

If you have the Postgres command line tools installed on your host machine, you could use those to test against the server instead. Make sure to specify the IP address of the Docker host (a command that might look like “psql -h 192.168.59.103”).

If we jump into the Rails app container using docker-enter (“docker-enter ff806bee1067”) and install the PostgreSQL tools as we did above, we can try to reach the server and diagnose what’s happening. According to the documentation for links, we should see an environment variable which tells us the IP of our Postgres server. Let’s figure this out:

root@ff806bee1067:~# set | grep -i sample
root@ff806bee1067:~#

Nothing! Thanks to cpuguy83 on IRC, who told me: “nsenter runs in a separate process. /proc/1/env is what you want.” Docker sets the link environment variables for the container’s main process, not for every shell that later enters the container, and a shell started through nsenter does not inherit them. So, you need to use the proc filesystem to inspect that specific process and see its environment (the file is /proc/1/environ), which nicely happens to be the first process in the container.

root@ff806bee1067:~# cat /proc/1/environ
HOME=/PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binHOSTNAME=7902bd7d540eSAMPLE_RAILS_POSTGRES_PORT=tcp://172.17.0.152:5432SAMPLE_RAILS_POSTGRES_PORT_5432_TCP=tcp://172.17.0.152:5432SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR=172.17.0.152SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_PORT=5432SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_PROTO=tcpSAMPLE_RAILS_POSTGRES_NAME=/jovial_morse/sample_rails_posgresSAMPLE_RAILS_POSTGRES_ENV_PGDATA=/var/lib/postgresql/dataroot@7902bd7d540e:~#
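The entries in /proc/1/environ are separated by NUL characters, which is why the terminal shows them run together. Here is a minimal sketch of how you could split them apart and look up a link variable in Ruby; parse_environ and link_address are hypothetical helper names, and the sample string is a shortened copy of the output above:

```ruby
# Sketch: turn the NUL-separated contents of /proc/<pid>/environ into a Hash.
# parse_environ and link_address are hypothetical helper names.

def parse_environ(raw)
  raw.split("\0").each_with_object({}) do |entry, env|
    key, value = entry.split("=", 2)
    env[key] = value.to_s unless key.nil? || key.empty?
  end
end

# Build the variable name Docker links use for an alias and port
# (alias upper-cased, then _PORT_<port>_<PROTO>_ADDR) and look it up.
def link_address(env, link_alias, port, proto = "tcp")
  env["#{link_alias.upcase}_PORT_#{port}_#{proto.upcase}_ADDR"]
end

# Shortened sample in the same format as the output above.
raw = "HOME=/\0HOSTNAME=7902bd7d540e\0" \
      "SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR=172.17.0.152\0" \
      "SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_PORT=5432\0"

env = parse_environ(raw)
puts link_address(env, "sample_rails_postgres", 5432)  # 172.17.0.152

# Inside the container, the real data would come from:
#   parse_environ(File.binread("/proc/1/environ"))
```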

So, we can see it in that jumbled mess. As the docs explain, we’ll get our environment variables prefixed with the linked alias converted to uppercase. In this case we can see the environment variable we want is this one: SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR. Let’s add this to our database.yml file.

$ ruby -pi -e \
'sub(/^(\s*)host: localhost/, %q(\1host: <%= ENV["SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR"] || "localhost" %>))' config/database.yml

(This command uses Ruby’s in-place edit mode with a sub call to change the line “host: localhost” into a line which pulls from the environment variable. The pattern is anchored to the start of the line, so it keeps the original indentation and skips the commented-out line below that looks like “#host…”. The ERB snippet uses double quotes because the whole Ruby expression is wrapped in single quotes for the shell.) Essentially, we use the environment variable when it exists and fall back to the original localhost when it does not, so things will still work in our local environment.

Now, if we run again, we should see it work, right? Nope! We still get the connection failure. The image was built from the old source files, so it does not have our change to config/database.yml, and we need to rebuild. So, go take another nap. Seriously, you don’t have time to take a nap, but this does require a wait, which is frustrating, especially when you are figuring things out.

$ docker build -t sample_rails_app_for_heroku .
Sending build context to Docker daemon 2.481 MB
…
Successfully built 22fcd3739b27
$ docker run  -p 3000:3000 \
--link sample_rails_postgres:sample_rails_postgres \
sample_rails_app_for_heroku

Specifying the user for the container

Now, we see something different: the error says that the user root does not exist. This is a step forward: we are running as root inside our container, and Postgres tells us there is no root user in our database. We can see from the previous output of our “\list” command (using the psql client from inside the Postgres container) that there is a database named postgres owned by the postgres user. Your local Postgres installation could have a different username, so you’ll want to adjust the config/database.yml file to reflect this. Something like this:

username: <%= ENV['LOGNAME'].eql?( "xrdawson" ) ? "xrdawson" : "postgres" %>

This checks to see if we are running locally and if so uses our local postgresql user (“xrdawson”, my username on my OSX machine) and otherwise uses the “postgres” user which is what we will use when running inside docker containers. Heroku will ignore all of this, as it generates its own config/database.yml, so this will all still run within Heroku even with our changes here. Our final config/database.yml file might look like this (just the development section):

development:
  adapter: postgresql
  database:  postgres
  host: <%= ENV['SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR'] || 'localhost' %>
  username: <%= ENV['LOGNAME'].eql?( "xrdawson" ) ? "xrdawson" : "postgres" %>
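To convince yourself that the fallbacks behave as intended, you can render the template outside of Rails. This is a sketch under a couple of assumptions: it swaps ENV for a local env hash so both situations can be simulated without touching the real environment, and database_config is a name I made up for the example.

```ruby
require "erb"
require "yaml"

# The development section from above, with ENV replaced by a local `env`
# hash (an assumption made so the example can simulate both environments).
TEMPLATE = <<~'YAML'
  development:
    adapter: postgresql
    database: postgres
    host: <%= env['SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR'] || 'localhost' %>
    username: <%= env['LOGNAME'].eql?("xrdawson") ? "xrdawson" : "postgres" %>
YAML

# Render the ERB with the given environment hash and return the parsed
# development settings. database_config is a hypothetical helper.
def database_config(env)
  rendered = ERB.new(TEMPLATE).result_with_hash(env: env)
  YAML.safe_load(rendered).fetch("development")
end

# Inside Docker: the link variable is set and LOGNAME is not xrdawson.
in_docker = database_config({ "SAMPLE_RAILS_POSTGRES_PORT_5432_TCP_ADDR" => "172.17.0.152" })
# On the laptop: no link variable, LOGNAME is the local user.
on_laptop = database_config({ "LOGNAME" => "xrdawson" })

puts "docker: #{in_docker['host']} as #{in_docker['username']}"
puts "laptop: #{on_laptop['host']} as #{on_laptop['username']}"
```

The first line should report the linked address with the postgres user, and the second should fall back to localhost with the local username.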

Note that we changed the database to “postgres” from the original “ruby-rails-sample_development” name. Our local server does not care what the name is, but our dockerized server does, so let’s use the one which works for Docker. The host is determined dynamically when running within Docker, and the username depends on whether we are running locally or inside Docker.

We now need to run the rake task “rake db:schema:load” to load our schema into the database right after we start the container which is linked to Postgres. We can add this to our Dockerfile so that the command runs once the container starts. Be sure to notice the difference between RUN, CMD and ENTRYPOINT, each of which looks like it could help. RUN is not going to work, because it runs a command whose result is built into the image: RUN happens at the “compile” phase, when the image is generated, not at the “run” phase, and the “run” phase is the only time our app gets the environment variables for the Postgres server. CMD and ENTRYPOINT both take effect once the container is instantiated, but adding a CMD seems to override the way the rails base image starts the server (which means it won’t run “rails s”). So, we need both a CMD and an ENTRYPOINT command:

$ printf '\nCMD bundle exec rake db:schema:load\n' >> Dockerfile
$ printf 'ENTRYPOINT rails s\n' >> Dockerfile
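If the CMD/ENTRYPOINT split feels fragile, one alternative (a sketch only, not what the sample repository does) is to point the container at a single small Ruby launcher that performs both run-phase steps in order. The file name docker-start.rb and both method names are hypothetical:

```ruby
# Sketch of a start-up script (hypothetical name: docker-start.rb) that does
# the run-phase work in one place: load the schema, then start the server.

# Commands to run, in order, once the container has its link variables.
def startup_steps
  [
    %w[bundle exec rake db:schema:load],  # needs the database: run phase only
    %w[bundle exec rails server]          # long-running process, started last
  ]
end

# Run every step but the last with system, then replace this process with
# the final step so the server becomes the container's main process.
def start!(steps = startup_steps)
  *setup, server = steps
  setup.each { |cmd| system(*cmd) || abort("failed: #{cmd.join(' ')}") }
  exec(*server)
end

# Print the plan instead of running it; the real script would call start!
startup_steps.each { |cmd| puts cmd.join(" ") }
```

The last line only prints the plan, so the sketch is safe to run anywhere; inside the container you would call start! instead.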

Now, rebuild (and take another nap) and then run the container.

$ docker build -t sample_rails_app_for_heroku .
$ docker run  -p 3000:3000 \
--link sample_rails_postgres:sample_rails_postgres \
sample_rails_app_for_heroku

Now, if we hit http://192.168.59.103:3000/, we see the rails app running! Eureka, it works. And, we still can use it when working locally.

My biggest issue when working with Docker and Rails

Rails revolutionized web development by making it easy to make a small change in your application code, hit reload, and see your results immediately. Rails was in many ways a response to monolithic and cumbersome Java build steps: compiling and packaging Java source files into a WAR package and then deploying into an application server, a process which could take minutes and killed developer productivity.

Docker provides some amazing benefits for minimizing the cost and complexity of deployment, but doing things the way I did means that tweaking a running Docker container is a slow process. With the way the rails base image works right now, each time you make changes you have to stop the container, rebuild the image and then run the image again. This means you get to take lots of catnaps, but your boss might not like that. Doing it this way makes perfect sense: you want to have the entirety of the source files inside the image so you can ship the complete image off for deployment.

But it would be nice to eliminate RVM and Pow from my laptop and only work with Docker. When onboarding engineers, eliminating tools like RVM and Pow (which muck with system settings and compile software) would mean new developers could just install Docker and run one or two commands, rather than troubleshoot many different tools which can have subtle differences on different host OSes and versions.

Is there a way to get the benefits of simple deployment with the flexibility of rapid iteration, without the cost of a heavy-duty rebuild step? I think Docker volumes are the key here. The rails base image as-is builds the source (all code sitting in the same directory as the Dockerfile) into the image. If there were a way to run the container with an environment variable (let’s call it “development”) that would mount the current working directory as a volume rather than using the copy built into the image, that would be a workable option. During production this environment variable would not exist and the running container would use the sources built into the image. This mirrors the way people use environments within Rails (“development”, “production” and “test”) and would require no cognitive jumps for Rails developers, just a bit of extra syntax on the command line. Imagine these two commands:

$ # These don't work yet, of course! 
$ docker run -e RAILS_ENV=development -p 3000:3000 \
--link sample_rails_postgres:sample_rails_postgres \
sample_rails_app_for_heroku 
$ docker run -p 3000:3000 \
--link sample_rails_postgres:sample_rails_postgres \
sample_rails_app_for_heroku

In short, the first command specifies that we are running in development mode, so the container internally decides to use a VOLUME mount point for the current directory (it probably needs to adjust the WORKDIR to use the VOLUME mount point rather than the source code installed into /usr/src/app when executing inside the running container). There is some complexity here which I was not able to work out in time for publishing this article, but this looks like a workable way to get the best of all worlds when using Docker for both development and deployment with Rails applications. Or do you know a better way to get what I want: skipping the build step when running in development mode, while still getting the benefits of a Docker image with all source code installed when deploying?
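One way to approximate this today is to make the decision outside the container, since a volume is attached when docker run is invoked. The sketch below is an assumption about how such a wrapper could look, not a tested workflow: docker_run_args is a hypothetical helper, and it relies on the -v flag of docker run to mount the working directory over /usr/src/app, the path the rails base image copies the source into.

```ruby
# Sketch of a wrapper that builds `docker run` arguments from the Rails
# environment. In development it bind-mounts the working directory over
# /usr/src/app; otherwise the container uses the sources baked into the image.
# docker_run_args is a hypothetical helper name.

IMAGE = "sample_rails_app_for_heroku"
LINK  = "sample_rails_postgres:sample_rails_postgres"

def docker_run_args(rails_env, source_dir = Dir.pwd)
  args = ["docker", "run", "-p", "3000:3000", "--link", LINK]
  if rails_env == "development"
    # Live source tree replaces the copy built into the image.
    args += ["-e", "RAILS_ENV=development", "-v", "#{source_dir}:/usr/src/app"]
  end
  args << IMAGE
end

# Print both command lines rather than executing them.
puts docker_run_args("development", "/Users/me/ruby-rails-sample").join(" ")
puts docker_run_args("production").join(" ")
```

Passing the resulting array to system(*args) would start the container. Whether the mount actually sees your laptop’s files depends on how your Docker host shares directories with the machine you edit on.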

TL;DR Instructions

  1. Clone your rails app
  2. Adjust to Ruby version 2.1.2. If you don’t use this, find another base image with the proper version of Ruby. Make sure that everything works by running your tests.
  3. Run “bundle” to install all required gems.
  4. If you want to run things locally using Pow, add the .powrc and .ruby-version files to make sure Pow runs with the correct version of Ruby. Use “powder link” to link the application into pow.
  5. Check to see that the Rails app works locally (“powder open”)
  6. Create a Dockerfile with: “FROM rails\nCMD bundle exec rake db:schema:load\nENTRYPOINT rails s” as the contents.
  7. Start a postgres server in another terminal with a command like: “docker run --name sample_rails_postgres -p 5432:5432 postgres”
  8. Adjust the database.yml file to pull from its environment and properly use the correct host and database when running inside Docker. Use the example in the GitHub repository.
  9. Build the Rails app using a command like: “docker build -t sample_rails_app_for_heroku .”
  10. Run your Rails app using a command like: “docker run -p 3000:3000 --link sample_rails_postgres:sample_rails_postgres sample_rails_app_for_heroku”
  11. Hit the docker daemon IP on port 3000 (http://192.168.59.103:3000/) and see your application running inside Docker.
  12. If you see any issues, troubleshoot live running containers using nsenter (without having to install an SSH server). If you cannot get the “nsenter” application to install locally using quickstart steps detailed in the README, try the “docker-enter” command referenced further down in the README.
  13. Remember that anytime you make a change to your Rails application, you will need to rebuild the image and restart the container for the Docker container to see those changes. For this reason, it is probably best to develop using your app running locally using Pow. Once you have solidified your changes, then build a new image and ship it to deployment or an image repository.

We’ve forked the Heroku sample repository which you can play with and see the full set of files.

Feature image via Flickr Creative Commons.

