Three Docker Build Strategies

There are countless ways to use containers and just as many ways to build container images. The creativity of the community never ceases to amaze me – I am always stumbling across a new use case or a new way of doing things.

As our organization at $WORK has adopted containers into our production workflow, I have tried many different permutations of image creation, but most recently I have distilled our process down to three main strategies, all of which coalesced around our use of continuous integration software.

Build In Place

Building in Place is what most people think of when talking about building container images. In the case of Docker, the docker build command takes a Dockerfile and, usually, some supporting files, and uses them to produce an image. This is the basic way to produce an image, and the other two workflows below make use of it at some point, even if only inherited from a parent image.

The main benefit of this process is that it is Simple. This is the process as documented on the Docker website.  Have some files.  Run docker build.  Voila!
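
A minimal sketch of what that might look like (the CentOS base and httpd package here are placeholders, not our actual base image): a directory containing a Dockerfile and its supporting files, and a single command to build it.

FROM centos:centos7
RUN yum install -y httpd && yum clean all
COPY httpd.conf /etc/httpd/conf/httpd.conf
EXPOSE 80
CMD ["httpd", "-DFOREGROUND"]

docker build -t my-base-image .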

It is also Transparent. Everything that happens in the build is documented by the Dockerfile. There are no surprises. There are no outside actors that can change the result of the build process.* You, the human, can see every step of the process laid out in the Dockerfile.

Finally, it is Self-Contained.  Everything needed for the build to succeed is present locally in the directory on your computer.  Give these files to someone else – in a tarball, or a git repo – and they too can build an identical image.

We use the Build in Place method to create our base images. These builds contain all the sysadmin-y tasks that used to go into setting up a server before handing it off to a developer to deploy their code: software updates, webserver installation and generic setup, and so on. The images are all generic and, with very few exceptions, no real service we use is created directly from a Build in Place process.

* Unless you have a RUN command that curls a *.sh file from the web somewhere and pipes it to bash.  But in that case you are really just asking for trouble anyway.  And shame on you.

Inject Code

The Inject Code method of building a container image is the most used in our organization. In this method, a pre-built parent image is created as the result of a Build in Place process. This image has several ONBUILD instructions in its Dockerfile, so when a child image is created, those steps are executed first. This allows our CI system to create a Dockerfile containing nothing but the parent image in a FROM instruction, clone a git repo with the developer's code, and run docker build. The ONBUILD instructions inject the code into the image and run the setup, and we end up with an application-specific container image.

For example, our Ruby on Rails parent image includes instructions such as:

ONBUILD ADD . $APPDIR
ONBUILD RUN bash -x /pick-ruby-version.sh
ONBUILD WORKDIR $APPDIR
ONBUILD RUN gem install bundler \
            && rbenv rehash \
            && bundle install --binstubs /bundle/bin \
                              --path /bundle \
                              --without development test \
                              --deployment \
            && RAILS_ENV=production RAILS_GROUPS=assets \
               bundle exec rake assets:precompile

The major benefit of this build workflow is that it Removes System Administration Tasks from Developers. The sysadmins build and maintain the parent image, and developers can just worry about their code.

The workflow is also relatively Simple for both the sysadmins and the developers. Sysadmins effectively use the Build in Place method, and developers don't have to do any builds at all; they just commit their code to a repo, which triggers the CI build process.

The CI process is effectively just the following two lines (plus tests):

echo "FROM $PARENT_IMAGE" > Dockerfile
docker build -t $CHILD_IMAGE .

The simplicity and hands-off nature of this process mean it is effectively Made for Automation. With a bit of automation around deploying a container from the resulting image, a developer can create a new app, push it to a git repo, and tell the orchestration tool about it, and a new service is created and deployed without any other human involvement.

Unlike the Build In Place process (for which I couldn’t come up with a single real negative), Inject Code has a few gotchas.

The process can be somewhat Opaque. Developers don't get a clear view of what exactly is in the parent image or what the build process is going to do with their code when the ONBUILD instructions run. That requires either meticulous documentation by the sysadmins (ha!) (Edit: I was rightly called out for this statement – see below*), tracking down and examining the Dockerfiles for all the upstream images, or inspecting them with the docker history and docker inspect commands.
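
For example, to see exactly which ONBUILD triggers a parent image carries, a developer can ask Docker directly (the image name here is a placeholder):

docker history $PARENT_IMAGE
docker inspect --format '{{ .Config.OnBuild }}' $PARENT_IMAGE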

The build process itself ends up being opaque in practice. Because it is simple and one-step, developers tend never to look at it, and when the build fails they turn to the sysadmins to figure out what went wrong. This is really a cultural byproduct of the process, so it might not be an issue everywhere, but it's what has happened for us.

The Inject Code process also makes it a bit tougher to customize an image for an application. We have to ship the parent image with multiple copies of Ruby, and allow developers to specify which one is used with an environment file in the root of their code. Extra OS packages (think: non-standard libraries) are handled the same way. These customizations end up being handled during the ONBUILD steps, but it's not ideal. At some point, if an application needs too much specialization, it's just easier to go back to the Build in Place method.
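
As a rough illustration of that environment-file approach (the .app-env file name and variable names here are made up for the example, not our exact implementation), the script run by the parent image's ONBUILD step reads a file from the root of the developer's repo and acts on it:

#!/bin/bash
# pick-ruby-version.sh - illustrative sketch only
# Read the app's environment file from the repo root, if present.
if [ -f "${APPDIR}/.app-env" ]; then
    source "${APPDIR}/.app-env"
fi

# Switch rbenv to the requested Ruby, if one was requested.
if [ -n "${RUBY_VERSION}" ]; then
    rbenv global "${RUBY_VERSION}"
fi

# Install any extra OS packages the application asked for.
if [ -n "${EXTRA_PACKAGES}" ]; then
    yum install -y ${EXTRA_PACKAGES} && yum clean all
fi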

* A friend of mine read this after I posted it and called me out on the statement here. I was being a poor team member by not working with the sysadmins to help solve the problem, explaining the necessity, or at the very least trying to understand where their frustrations lie. I appreciate the comment, and am glad my attention was called to it. It's too easy to be frustrated and nurture a grudge when the right thing to do is to work together toward a solution that satisfies both parties. The former just serves as "wall building" and reinforces silos and a poor work culture.

Asset Generation Pipeline

Our final method of generating container images is the Asset Generation Pipeline. This is a more complicated build process that uses builder containers to process code or other input and generate the assets that go into building a final image. This can be as simple as building an RPM package from source and dropping it onto the filesystem to be included in the docker build, or as complicated as a multi-container process that compiles code, ingests and manipulates data, and prepares it for the final image (mostly used by our researchers).

Some of our developers are using this method to manage Drupal sites: checking out their code from a git repo, running a builder container on it to compile Sass and run Composer tasks that prepare the site for production, and then including only the public-facing code in a child image.

The biggest benefit of this process (to me at least) is Minimal Image Size. We can use this process to create final images without having to include any of the build tools that went into creating them.

For example, I use this process to create RPM packages that can then be added to the child image and installed without a) having to do the RPM build or compile from source during the child image build, or b) having to include any of the dev tools, libraries, compilers, etc. that are needed to create the RPMs. Our Drupal developers, as mentioned above, can include only the codebase for the production site itself, and none of the meta information or tools needed to produce it.
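
A stripped-down sketch of that RPM flow (the builder image, paths, and package names are placeholders): a builder container produces the package, and the final image build only has to install it.

# Run the builder container; it writes finished RPMs into the local rpmbuild tree
docker run --rm -v "$PWD/rpmbuild:/root/rpmbuild" \
    rpm-builder rpmbuild -bb /root/rpmbuild/SPECS/myapp.spec

# The child Dockerfile then needs nothing more than something like:
#   FROM centos:centos7
#   COPY rpmbuild/RPMS/x86_64/myapp-1.0-1.el7.x86_64.rpm /tmp/
#   RUN yum install -y /tmp/myapp-*.rpm && rm -f /tmp/myapp-*.rpm
docker build -t myapp .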

This process also Reduces Container Startup Time by removing the need to do initialization or asset compilation at run time. By pre-compiling and adding the results to the child image, the containers can start immediately on docker run. Given the time some of these processes require, this is a big plus for us. Fast startup means less disruption for end users, quicker auto-scaling under load, and less time spent in a degraded state.

Finally, a big benefit of this process is that it can Create Complex Images from Basic Parent Images. Stringing multiple builder containers along the pipeline allows each container to be created from a simple, single-task parent image. Each image is minimal and each container has a single, simple job to do, but the end result can be a very complex final image.

Drawbacks of the Asset Generation Pipeline process are fairly obvious. First off, it's fairly Complicated. The CI jobs that produce the final images are long and usually time-consuming. They require a lot of images and create a lot of containers. We have to be careful to do efficient garbage collection – nothing is worse than being paged in the middle of the night because a build host ran out of disk space.
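
On Docker hosts of this vintage, that garbage collection is little more than a couple of scheduled cleanup commands, something along these lines (the exact filters vary by Docker version):

# Remove exited containers and the dangling images left behind by builds
docker rm $(docker ps -aq --filter status=exited)
docker rmi $(docker images -q --filter dangling=true)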

They are also More Prone to Failure. As any good engineer knows, more parts means more points of failure. The longer the chain, the more things that can go wrong and spoil a build. This also necessitates better (and more) tests. Having a half dozen containers prepare your code base means it could be wrong in a half dozen different ways if your tests aren't good.

Finally, from a technical perspective, using a pipeline that generates output makes it Difficult to Build Remotely. Our CI system relies on a Jenkins or GitLab CI host that connects to remote Red Hat Atomic servers to run the docker build command. This works by cloning repositories locally to the CI host and sending the build context to the Atomic host. Unfortunately, generated assets are left on the Atomic host, not in the build context that lives on the CI server. This necessitates some workarounds to get the assets back into the build context or, in some cases, different build processes that skip the centralized CI servers in favor of custom local builds.
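
One possible workaround, sketched here with hypothetical names rather than our exact process: because the docker client writes docker cp output to the machine it runs on, the CI host can pull generated assets out of a builder container on the remote Atomic host and into its local build context before kicking off the final build.

# DOCKER_HOST points the local client at the remote Atomic host
export DOCKER_HOST=tcp://atomic-host.example.com:2376

# Run the builder remotely, then copy its output back into the local
# checkout so it becomes part of the final build context
docker run --name asset-builder builder-image make assets
docker cp asset-builder:/build/output ./assets
docker rm asset-builder

docker build -t "$CHILD_IMAGE" .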

So those are the three primary ways we are building images in production at $WORK. There are tons of different and creative ways to create images, but these have proven to work for the use cases we have. That's not to say there aren't other legitimate approaches, but this is what we need at the moment, and it works well. I'd be interested to hear how others are doing their builds. Do your builds fit one of these patterns, or are you doing something more unique and cool? There's always so much to learn!

 

Statement of GPG Key Transition

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA512

Fri Dec 9 11:49:22 EST 2016

Statement of GPG Key Transition
-------------------------------

In order to replace my older DSA-1024 key, I have set up a new OpenPGP key, and will be transitioning away from my old key.

The old key will continue to be valid until 2017-06-01, but future correspondence should come to the new key. I would like the new key to be integrated into the web of trust.

This message is signed by both keys to certify the transition.

The old key was:

pub dsa1024/B5EE841627F7BF37 2008-08-26 Christopher Collins
Primary key fingerprint: 69E6 0653 A1A3 0600 ADB2 B3AD B5EE 8416 27F7 BF37

And the new key is:

pub rsa2048/F5752BA146234FD4 2016-12-09 Christopher L. Collins
Primary key fingerprint: 923E 0218 77DB 3F70 F614 6F62 F575 2BA1 4623 4FD4

To fetch my key from a public key server, you can do:

gpg --keyserver pgp.mit.edu --recv-key F5752BA146234FD4

If you have my old key, you can verify the new key is signed by the old one:

gpg --check-sigs F5752BA146234FD4

To double-check the fingerprint against the one above:

gpg --fingerprint F5752BA146234FD4

Finally, once you are satisfied this key represents me and the UIDs match what you expect, please sign my key, if you don’t mind:

gpg --sign-key F5752BA146234FD4

Thank you, and sorry for any inconvenience.

-Chris

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAlhK+2UACgkQte6EFif3vzcybACg+FO1UuIK3hKA/IUIoR1CsqiM
MvsAoJ4zmeh7JjKyhlfyFDFD95G5U1pDiQEcBAEBCgAGBQJYSvtpAAoJEPV1K6FG
I0/UjM4IAKqifcolct4klHutTD3fcBy3sMoseR7cvA9mpG/TvSUUhBGEK1R+ssKI
/lGjnR2vnJVUltnS6lAUHy0GafloPEdkQhlRFimtBW+3pBKGbqVHzDwYevEqt5Qv
dOvr4UgbOvjIdt2FTl24ht8Sf14LU+znlTF77PTP9CW6hbIcAZatLrSKcWbse4cu
kQRhQQystBHLohGkCYW52IrOz1Vyy5K0NtbQm1sAkbqZqOuAV98z0EkpnMeiP0Vf
A5bJjA+Nu4XIN+OLSxYsg32KpyfFPqPfQbf3zv5i9gr6hl/gdEl2QYRK+A89kAzf
qmT97XAQmNTczFuP/OLbjc0dMALl+zM=
=YUAx
-----END PGP SIGNATURE-----

When Systemd, Docker and Golang Butt Heads

[Image: goats butting heads in Germany, by Marius Kallhardt, Creative Commons Attribution-Share Alike 2.0 Generic License]

We ran into a fun little bug this week at work that took a good while to track down to its source. Imagine this scenario:

We start to receive sporadic reports from across our development and testing environments that Docker has restarted, causing the containers running on those hosts to restart. After some investigation, we tie this to puppet runs on the servers. It's not immediately apparent why, and successive puppet runs don't cause the same behavior.

Eventually we realize that it is related to a change to our systemd-journald configuration that is being pushed out during the problematic puppet runs.

Our RHEL7 servers have not been configured to maintain a persistent journal. That is, by default, the journal is written under /run/log/journal and is refreshed (lost) with every reboot. We decided to configure the journal to keep logs across several boots, and did so by setting it up in puppet and pushing out the change, complete with a notify to systemd-journald to restart the service. This was pushed to dev and, shortly after, to test.
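
For reference, the change itself is tiny. A minimal shell equivalent of what the puppet code does (the paths are the stock RHEL7 ones): create the on-disk journal directory, make persistence explicit, and restart journald.

mkdir -p /var/log/journal
echo 'Storage=persistent' >> /etc/systemd/journald.conf
systemctl restart systemd-journald    # this restart is what upset Docker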

However, despite the fact that we knew it was related to the journald change, we could not reliably cause it to happen.  Converting a box to persistent journals and restarting journald wouldn’t immediately cause Docker to fall over – it would take a few minutes before the service died.

Then it got even weirder.  We realized that no changes actually had to happen – we only needed to restart systemd-journald to cause the issues with Docker.  And interestingly, we could get Docker to crash by sending any three Docker commands.  One `docker ps`?  Everything is fine.  Two?  No problem.  Three? KA-BOOM!

After this behavior was finally identified (and it took a while – it's hard to troubleshoot something when it only fails the *third time* you try it), some Googling led us to a bug report already filed with Docker, where Dan Walsh (@rhatdan) explained:

…when you run a unit file systemd hooks up stdout/stderr to the journal, if the journal goes away.<sic> These sockets will get closed, and when you write to them you will get a sigpipe…

…Golang [less than v 1.6] has some wacky code that ignores SIGPIPE 10 times and then dies, no matter what you do in your code.

There's the three-times-ish. STDOUT and STDERR are written to by Docker when you issue a Docker command, and three commands cause Docker to crash. And yes, I know my math adds up to nine, not ten. From what I can tell, our automation was also calling the Docker API while we were testing, which is why we were seeing three as the limit.

The good news is there appear to be a plethora of patches making their way into the world. A fix/workaround was added to Systemd by Lennart Poettering back in January, Golang 1.6 will not suffer from the issue, and Red Hat has apparently patched Docker 1.9 and will be pushing that out, hopefully, early in April.

 

Quick Tip – Docker ENV variables

It took me a little while to notice what was happening here, so I’m writing it down in case someone else needs it.

Consider this example Dockerfile:

FROM centos:centos7
MAINTAINER Chris Collins

ENV VAR1="foo"
ENV VAR2="bar"

It’s common practice to collapse the ENV lines into a single line, to save a layer:

FROM centos:centos7
MAINTAINER Chris Collins

ENV VAR1="foo" \
    VAR2="bar"

And after building an image from either of these Dockerfiles, the variables are available inside the container:

[user@host envtest]$ docker run -it envtest bash
[root@container /]# echo $VAR1
foo
[root@container /]# echo $VAR2
bar

I’ve also tried to use ENV vars to create other variables, like you can do with bash:

FROM centos:centos7
MAINTAINER Chris Collins

ENV VAR1="foo" \
    VAR2="Var 1 was set to: ${VAR1}"

This doesn't work, though. $VAR1 is not yet part of the environment when Docker builds that layer, so it cannot be used to set $VAR2 in the same instruction.

[user@host envtest]$ docker run -it envtest bash
[root@container /]# echo $VAR1
foo
[root@container /]# echo $VAR2
Var 1 was set to:

Using a single line for each ENV does work, though, as the previous layer has been parsed and added to the environment.

FROM centos:centos7
MAINTAINER Chris Collins
ENV VAR1="foo" 
ENV VAR2="Var 1 was set to: ${VAR1}"

[user@host envtest]$ docker run -it envtest bash
[root@container /]# echo $VAR1
foo
[root@container /]# echo $VAR2
Var 1 was set to: foo

So, while it makes sense to try to collapse ENV lines, to save layers**, there are definitely cases where you’d want to separate them.  I am using this in a Ruby-on-Rails image:

[...]
ENV RUBYPKGS='ruby2.1 mod_passenger rubygem-passenger ruby-devel mysql-devel libxml2-devel libxslt-devel gcc gcc-c++' \
    PATH="/opt/ruby-2.1/bin:$PATH" \
    NOKOGIRI_USE_SYSTEM_LIBRARIES='1' \
    HTTPDMPM='prefork'

ENV APPENV='test' \
    APPDIR='/var/www/current' \
    LOGDIR='/var/log/rails'

ENV RAILS_ENV="${APPENV}" \
    RACK_ENV="${APPENV}"
[...]

A logical separation of sections is helpful here – the first ENV is for system stuff, the second for generic application setup on the host, and the third to set the application environments themselves.

**I have heard rumblings that in future versions of Docker, the ENV stuff will not be a layer – more like metadata, I think.  If that is the case, the need to collapse the lines will be obsoleted.