Using Docker and AWS to Survive an Outage

Last week at $WORK, we suffered from an outage that slowed down a large part of our network and took down our main website for both internal and external customers.  We were under a distributed denial of service attack focused on the website itself.  The site is load-balanced, and this resulted in slowdowns or outages for all the services behind the load balancers, as well.

While folks were bouncing ideas around on how to bring the site up again while still struggling with the outage, I mentioned that I could pretty quickly migrate the site over to Amazon Web Services and run it in Docker containers there. The higher-ups gave me the go-ahead and a credit card (very important, heh) and told me to get it setup.  The idea was to have it there so we could fail over to the cloud if we were unable to resolve the outage in a reasonable time.

TL;DR – I did, it was easy, and we failed over all external traffic to the cloud. Details below.

Amazon Web Services

DockerDespite having a credit card and a pretty high blanket “OK”, I wanted to make sure we didn’t spend any money unless it was absolutely necessary. To that end, I created three of the “free tier” EC2 instances (1GB RAM, 1 CPU, 10GB Storage) rather than one or more larger instances. After all, these servers were going to be doing one thing and one thing only – running Docker. I took all the defaults, except two. First, I opted to use RHEL7 as the OS. We use Red Hat at work, so I’m familiar with it (and let’s be honest, it works really well), especially where setting up Docker comes in. Second, I set up a security group that allowed only HTTP/HTTPS traffic to the EC2 instances, and SSH access only from $WORK. Security groups are like a logical firewall, I guess – run by Amazon in front of the servers themselves.

The EC2 instances started almost immediately, and I logged in via SSH using the key pair I created for this project. The first thing I did was augment the security group by setting the IPTables firewall on the hosts themselves to match: SSH from $WORK only, drop everything else, even pings.  You know, just in case.

Note: Since I was planning to use Docker to run the website, I didn’t need to add IPTables rules for HTTP/HTTPS. Docker uses the FORWARD chain, since it NATs from the host IP to the containers, and Docker has the ability to add and remove rules from the chain itself as needed.

Next, I ran a quick *yum update* to get the latest patches on the EC2 instance. It wasn’t terribly out of date, so this was quick.

Now to the meat of things. I didn’t really want to muck about with repos or try to find which one was required to install the Docker RPM, so I just copied the RPM for Docker from our local repository. The RPM is packaged upstream by Red Hat, and includes Docker 1.2.1. Even though I wanted to use Docker 1.4.1, the older RPM version is no big deal – I just installed it to get the basic config files – systemd service files, sysconfig, etc. Once the RPM was installed, I downloaded the Docker 1.4.1 binary from Docker.io, and replaced the 1.2.1 binary from the RPM. Presto! Latest Docker with the handy *docker exec* command! At this point, the server itself was basically done, and I moved on to setting up the Docker image.

Time spent so far: About 5 minutes

Docker

Now, I didn’t have an image for our website ready to go or anything – I was going to have to build it from scratch.  However, I’ve been lucky enough to be allowed to play around with Docker at $WORK, and had already done some generic Images for web stacks for our public DockerDemos project (https://dockerdemos.github.io/appstack/), so I was familiar with what I’d need to build the image for our site. I wrote a Dockerfile and built the image on my local laptop to test it. I went through a few revisions to get it perfect, but it only took about 15 minutes to write it from scratch. Once that was ready, I copied the Dockerfile and supporting files up to the EC2 servers, and built the images there. With the magic that is Docker and Linux containers, everything functioned exactly as it did on my laptop, and in a few seconds all three EC2 instances had the website image ready to go.

The final step was to run the container from the image. On all three of the EC2 instances, I ran:

docker run --name website -p 80:80 -p 443:443 -d website && \
docker logs -f website

The first command immediately started up the web servers inside the containers and started to sync their content, and the second opened up STDOUT inside the container so I could watch the progress. In a minute or two the sync was done, and the servers were online!

Note: The “sync” I’m talking about is part of how our website works, not something related to Docker itself.

Total time spent: About 25 minutes

So, in one fell swoop – about a half hour – I was able to create three servers running a Docker image to serve our main website, from scratch. It’s a good thing, too. It wasn’t long before we made the call to fail over, and currently all of our external traffic to the site is being served by these three containers.

That seems cool, no?  But check this out:

Sunday night, I needed to add more servers to the rotation. It was late. I was cranky to have been called after hours. I logged into AWS and used the EC2 “Create Image” feature to commit one of the running instances to a custom image (took about a minute). Then, I spun up three more EC2 instances from that image. They started up as quickly as a normal EC2 instance, and contained all the work I’d already done to set up the first servers, including the Docker package, binary, and image. Once they were up, all I had to do was run the *docker run* command again, and they were ready to go. Elapsed time?

2 minutes

It took longer for the 5 minute time-to-live on our DNS entry to expire.

Docker is Awesome. With AWS, it’s Awesome-er. I’m trying to convince folks that we should leave all of our external traffic to be served by Docker in AWS, and to migrate more sites out there. At the very least, it’s extremely flexible and allows us to respond to issues on a whole different timescale than we could before.

Oh, and an added bonus? All of our external monitoring (in multiple sites across the country) report that our page load speeds have improved 3x compared to what they were on the servers hosted in-house with regular non-Docker setups. I’m investigating what is giving us that increase this week.

Oh, and a second added bonus? For the last five days, our bill from Amazon for hosting our main website is a whopping *$4.69*. That’s a cup of crappy venti mochachino soy caramel crumble arabian dark coffee (or whatever) at the local coffee chain. And I can do without the calories.

Update:

Well, it’s been six months since this little adventure took place.  Since then, this solution has worked so well that we left all of our external traffic pointing to these instances at AWS.  Arguably, things have gotten even easier the more I work with both AWS and Amazon.  To that point:

  1. The Docker approach worked so well that we replaced all of the servers hosting our website internally with basic RHEL7 servers running Docker containers.  The servers are considerably more lightweight than they used to be, and as such we can get better performance out of them.
  2. I’ve since added the new(-ish) Docker flag –restart=always to the deploy command.  This saves me the step of even having to start the containers on reboot.
  3. I setup all the hosts to use the Docker API, and TLS authentication, so I can upload new images and start and stop containers on each host without even needing to login to them.  (This required the opening of port 2376 to $WORK in the security group and host firewall, fyi.)
  4. I wrote a couple of simple bash scripts to re-build the image as needed, and deploy locally for testing.  With the portable nature of Docker images, it’s extremely easy for me to test all changes before I push them out.
  5. Rotating the instances in and out of production at AWS is extremely simple with Amazon’s Elastic IP Addresses.  We are able to rotate a host out of service and instantly replace it with another, allowing us to patch them all with zero downtime.
  6. Amazon’s API is a wonderful thing.  I can manage the entire thing with some python scripts on my laptop, or the convenient Amazon CLI package.

Docker and AWS have proven themselves to me, and though this process, the higher-ups at $WORK.  We’re embracing Docker whole-heartedly in our datacenters here, and we’ve moved a number of services to AWS now, as well.  The ease and flexibility of both is a boon to us, and to our clients, and it’s starting to transform the way we do things in IT – the way we do everything in IT.

Docker and Security

At $WORK, we have discussed at length security issues as they relate to Docker and the Linux lxc technologies. Our general takeaway is that you cannot ever really trust an image that you didn’t build yourself. As Daniel Walsh of Red Hat explains in his article Docker Security with SELinux, it appears that those concerns are valid.

Security

The problem here revolves around namespaces, and the fact that not everything in Linux is namespaced. Consider /sys, /proc, /dev/sd*, /dev/mem, etc. So what we have here is that root inside the containers effectively has root access to any of these file systems or devices. If you can somehow communicate with them, then consequently, you can own the host with little effort.

Then, What Do?

You should only run Docker images that fit one of the following criteria:

  • You have built the image yourself, from scratch
  • You have received the image from a trusted source
  • You have built the image from a third-party’s Dockerfile, which you have fully read, and understood

Also be careful with your “trusted” source. The base images are probably OK. Images released by, say, Red Hat or Canonical are probably OK. Images in Docker Hub, Docker’s official image registry, might not be. That’s not a hit on the Docker guys – it’s because there are over 15,000 images they’d have to have verified manually to be sure.

Don’t Forget Traditional Security Practices

Finally, you need to be concerned with the security of your Docker containers as well – just as concerned as if the stuff in the container were running on a regular server. For example, if someone were to compromise the webserver you have running in a container, and get escalated root privileges, then you effectively have a compromised host.

Be careful out there. Maintain the security of your containers, and only run images you can fully trust.

How ’bout CoreOS as your Cloud base?

I’ve heard the name CoreOS around a little bit over the last two months or so, but it hadn’t really jumped out at me until last week when Mark McCahill mentioned it in a meeting. He’d read some pretty cool things about it: minimal OS, designed for running Docker containers, easy distributed configuration, default clustering and service discovery. In particular the use of etcd to manage data between clustered servers caught my eye – we’ve been struggling at $WORK with how to securely get particular types of data into Docker containers in a way that will scale out well if we ever need to start bringing up containers on hosts that don’t physically reside in our datacenters.

CoreOSI haven’t even gotten into the meat of CoreOS yet, but just now, I accomplished a task that was surely lifted out of science fiction. With the assistance of Cobbler as a PXE server and a Docker container (what else!) as a quick host for a cloud-config file, I was able to install CoreOS in seconds and SSH into it with an SSH public-key that I provided it in the cloud-config file. I was legitimately shocked by how quick and easy it was.

Docker containers start instantly – it’s one of their best features. It allows us to ship them around with impunity; perform seamless maintenance. CoreOS hosts live in the same timescale, meaning we an PXE boot and configure new hosts, specifically designed for hosting Docker containers, in seconds, and from anywhere. CoreOS offers support for installing onto bare metal, and that would surely give you the best performance, but take a moment to comprehend the flexibility given to you by using virtual machines instead.

Make an API call to your VMWare or Xen/KVM cluster in your local or remote datacenters to create and start a virtual machine. Or do the same thing with an Amazon host. Or Google. The VM PXE boots into CoreOS, within seconds and joins its cluster, and begins getting data from the rest of the cluster. Within minutes, Docker images are downloaded and built, and containers are spinning up into production. It doesn’t get any more flexible than that. At this point scaling and disaster recovery are hindered only by your ability to produce applications that can handle it. It doesn’t matter if a particular Docker container; something happens to it, you just bring up another somewhere else. Along those lines, it doesn’t matter if an entire host is up or down. That’s what it means to be in the same timescale. Containers and their hosts can be brought up and down with impunity, with no impact to the service.

Another benefit to abstracting the CoreOS away from the bare metal is the freeing of ties to a particular technology. If you can design your systems to use the APIs of your local VM solution and the remote APIs from various cloud vendors, then you can move your services wherever you need them. As long as you can control the routing of traffic in some way (load balances, DNS, Hipache, some of the cool new things being done by Cisco), and the DHCP PXE options for your host servers, then your services are effectively ephemeral and not tied to a particular location or vendor.

For now this is all still very beta, both Docker and CoreOS, but the promise being shown is real. Everyone, from the largest internet giants to the smallest one-room startups, will benefit from the coming revolution in computing.

Docker "Best Practices" (that don’t exist yet)

I’ve noticed the ways in which I set up new Docker images have shifted the more I work with the technology. For example, when I first started with Docker, I put almost all my configurations into the Dockerfile. This is easy – and the way Docker suggests it on their site – and the biggest benefit is how each command ends up being it’s own layer, and they can be cached for quick building if you make a mistake. However, it gets kind of tedious trying to manage tons of bash commands or copy a bunch of files using the RUN and ADD commands. Also, each line counts against the layer limit (though hopefully that will be fixed in a newer version of Docker). The kicker though is that complex bash commands are just hard to pull off inside the Docker file, and lack the real flexibility that running a script offers. Summary: complex bash in a Dockerfile is complex.

So next I moved to having my Dockerfile copy in a large bash script, and run just run that script. This method has the advantage of easy configuration (Hey – 2 whole lines!) and a lot of flexibility. Unfortunately, there’s really only a single checkpoint layer to use with the Docker cache – any single change to the script and the entire image has to be rebuilt from scratch. This makes development time considerably longer, and quick development is one of the big selling points with Docker. Don’t get me wrong – spinning up a container is much quicker than configuring a whole server, but with this method of building, I end up spending a lot of time staring at the screen as my images build. The big, single script also contained a lot of code that ended up being very similar between different images. I’d copy whole swaths of useful base configs out into another giant script to run in another container.

So that brought me to the current way of doing things: copying in the entire image directory (the one that contains the Dockerfile) to /build on the container, and then having the Dockerfile run a few scripts to build the image (3, in fact) Each of THOSE scripts, in turn, run other scripts specifically designed to setup one aspect of the image (one for email, if needed, or syslog, etc). This has the advantage of letting me drop in only those scripts I need for that particular image, and makes the scripts themselves a little friendlier to look at. Unfortunately, it doesn’t solve the long build times, though. The gotcha is that I’m copying everything into /build, so I don’t have to enumerate every file I’m using withing the Dockerfile. But that means that each script I change rewinds Docker back to the ADD where it’s copied in, and every script has to run from scratch without cache.

This is now slowly leading me to have a separate script for every piece except that which distinguishes the container’s main function. This makes for very portable, easy to organize configurations, but requires that each image has dozens of scripts with it. Managing all these scripts is usually accomplished (in other technology worlds) with a central version control repository like git. That way you can clone the scripts you need, and they’re all maintained in one location for easy updating. This method has an inherent drawback too, though. Each Dockerfile and it’s resulting image are dependent on an entirely separate repo for them to work, or even build.

I thought about making a base image, similar to what the Phusion guys are doing with phusion/baseimage-docker. This would allow me to put all the bits that I copy into each image every time (the email, syslog, etc) into a single image, and then add on only the relevant bits for the each container’s main function. This makes managing each image easier, but also makes them less portable, since you have to have both the Dockerfile for the image you want, and the base image for it to pull from when it builds.

It’s not yet clear what the best way to accomplish this will be. I imagine there will be best practices for any given situation.

Fully generic, basic demo image shared with the world? All in the Dockerfile.
Complicated, multi-service images shared with the world? Many little scripts, or a single large one.
Many images with custom configs more easily managed for $WORK? Base Image.

I am looking forward to seeing how other people use Docker, and what evolves as the technology, and the community using it, matures.

Email and Docker-based Drupal Containers

Mailbox

Got email support working in my DockerDemos Drupal Docker image. SSMTP is the way to go with these containers. There’s no running daemon to have to manage, or to take up resources.

The image is setup to either do nothing with mail, use a default SSL setup if you pass your own SMTP server as an environmental variable (perfect for $WORK!) or let you use your own custom ssmpt.conf file.

Yeah, EMAIL! Lost password messages galore!


I’m beginning to copy over my technology-related posts from Google+ to this blog, mostly so I have an easy-to-read record of them. This one was originally published on 19 May 2014: Email and Docker Drupal Containers

"Cloud-style" Docker Demo Container

Completed a first pass at a minimal “Cloud-style”#Docker container. It’s sort of like an EC2 instance. You generate an ssh pem file, and pass the public key in as an environmental variable at docker run:

sudo docker run -i -t -d -P \
-e PUBKEY="$(cat ~/.ssh/my.pem.pub)" cloudbase

You end up with a CentOS container, and a user “clouduser” that has sudo w/no password rights.

I think this would be a good way to get some folks interested in Docker – perhaps offering something like this as a playground/sandbox to build interest.

Visit a website, get a Docker CentOS container!

Code: https://github.com/DockerDemos/CloudBase


I’m beginning to copy over my technology-related posts from Google+ to this blog, mostly so I have an easy-to-read record of them. This one was originally published on 19 May 2014: Cloud-style Docker Container

Quick Bash Script to Update Docker

Quick Docker Tip

To run the latest version of Docker each time you start it, it’s as easy as creating and running this script:

#!/bin/bash
OPTS="-d"

if [[ -f ~/dockerbin/docker ]] ; then
rm docker
fi
if [[ ! -d ~/dockerbin/ ]] ; then
 mkdir ~/dockerbin
fi

wget https://get.docker.io/builds/Linux/x86_64/docker-latest \
-O ~/dockerbin/docker

chmod +x ~/dockerbin/docker
sudo ~/dockerbin/docker $OPTS &

If you need to add special options (like -g to change the location of the Docker install directory), you can edit the variable at the top of the file.

I don’t recommend using this for production systems, but while Docker is under heavy development, this is an easy way to stay up-to-date and get the bugfixes.


I’m beginning to copy over my technology-related posts from Google+ to this blog, mostly so I have an easy-to-read record of them. This one was originally published on 13 May 2014: Quick Docker Tip