PODMAN: ‘overlay’ is not supported over xfs

I *love* Podman.

While Podman purports to be a way to test and troubleshoot Pods – “the smallest deployable units of computing that can be created and managed in Kubernetes” – where its real value lies for me and my coworkers is as a non-root, daemonless, drop-in replacement for Docker. We can run containers on our laptops and our servers without needing root access or the overhead of the Docker daemon.

It works so well, you could probably replace Docker with Podman and people wouldn't even notice.

This morning, I saw a tweet by Dan Walsh of Red Hat, linking to an article he wrote on the details of containers/storage:

I did deep dive into the details of containers/storage, Content you did not know you wanted to know. 👍 You probably want to bookmark this blog for future reference on where your container images are stored. @kubernetesio @redhat @openshift @coreos https://t.co/4yLNe8LNQW — Daniel Walsh (@rhatdan) January 24, 2019

https://twitter.com/rhatdan/status/1088499288865423360

This was something I’d been looking for! Buildah, Podman, Skopeo – these are all great tools for working with containers sans Docker, but it was unclear to me how they all worked together with regards to the container images they each had access to. The article cleared all that up, and it re-primed my interest in playing around with Podman again.

(I’ve been so focused on OKD (formerly OpenShift Origin) at $WORK that I’d not built or run a container by hand in a while.)

Apparently, though, Podman had different ideas:

$ podman ps
ERRO[0000] 'overlay' is not supported over xfs at "/home/chris/.local/share/containers/storage/overlay"
error creating libpod runtime: kernel does not support overlay fs: 'overlay' is not supported over xfs at "/home/chris/.local/share/containers/storage/overlay": backing file system is unsupported for this graph driver

I called shenanigans on that… I’ve been using overlay2 with Podman and Docker for – years? – now, with the latest version of Fedora. The kernel dang-well *does* support it!

Weirdly, if I wiped out /home/chris/.local/share/containers/storage, I could pull an image exactly once: the error would still appear, but the pull would work. Every subsequent command would fail though, even just podman ps:

$ podman pull centos:latest
ERRO[0000] 'overlay' is not supported over xfs at "/home/chris/.local/share/containers/storage/overlay"
Getting image source signatures
Copying blob sha256:a02a4930cb5d36f3290eb84f4bfa30668ef2e9fe3a1fb73ec015fc58b9958b17 71.68 MiB / 71.68 MiB [====================================================] 6s
Copying config sha256:1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb 2.13 KiB / 2.13 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb

$ podman images
ERRO[0000] 'overlay' is not supported over xfs at "/home/chris/.local/share/containers/storage/overlay"
Could not get runtime: kernel does not support overlay fs: 'overlay' is not supported over xfs at "/home/chris/.local/share/containers/storage/overlay": backing file system is unsupported for this graph driver

Knowing that I’d had Podman working before, I double-checked all the things I could think of that might have been an issue.

I had recently partitioned some free space and mounted it at containers/storage so it wouldn't fill up the rest of my home directory. Since I had *just* set up the filesystem (xfs), I checked that ftype=1 was set. Older versions of CentOS and RHEL did not default to that, and the setting is required for overlay to work. Perhaps I forgot to do that?

No, it’s definitely set:

[[email protected] ~]$ xfs_info /home/chris/.local/share/containers/storage/ | grep ftype
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
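
(Had it not been set, the only fix would have been recreating the filesystem, since ftype is baked in at mkfs time. That would look something like the following, with a made-up device name; recent Fedora's mkfs.xfs defaults to ftype=1 anyway.)

# Hypothetical device name; ftype=1 is the mkfs.xfs default on current Fedora
mkfs.xfs -n ftype=1 /dev/mapper/fedora-containers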

Then I checked the SELinux permissions.

No, not because “it’s always SELinux”. Come on, now…

I checked the context because I'd recently mounted the new partition at containers/storage and wanted to be sure the context was correct. This was an issue I'd run into at $WORK: when we mounted large partitions to /var/lib/docker, the Docker daemon failed to work due to an incorrect context.

In this case, they appeared correct, but I checked just to be sure:

[[email protected] ~]$ ls -ldZ ~/.local/share/containers/storage/
drwxr-xr-x. 9 chris chris system_u:object_r:data_home_t:s0 150 Jan 25 10:10 /home/chris/.local/share/containers/storage/

[[email protected] ~]$ matchpathcon ~/.local/share/containers/storage/
/home/chris/.local/share/containers/storage unconfined_u:object_r:data_home_t:s0

[[email protected] ~]$ matchpathcon -V ~/.local/share/containers/storage/
/home/chris/.local/share/containers/storage verified.
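
(Had the context been wrong, my assumption is that a quick restorecon over the tree would have sorted it out:)

# Reset the directory tree to the contexts matchpathcon expects (not needed in my case)
restorecon -R -v /home/chris/.local/share/containers/storage/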

After pulling out all the hair I don’t, as a bald man, have, I tried dnf reinstall podman … with no luck.

Finally, I decided this was past my ability to fix on my own, and that it was time to open an issue in the Podman GitHub repo. While double-checking for existing issues, I found this:

Yes, and I think that’s a duplicate of containers/libpod#2158

If your ~/.config/containers/storage.conf is using the camel case format, then try switching to the lower case format, remove ~/.local/share/containers, and retry fedora-toolbox. See the above libpod issue about the formats.

https://github.com/debarshiray/fedora-toolbox/issues/42#issuecomment-457269578

Well, dang. The storage.conf in my homedir was all camel-case-y:

RunRoot = "/run/user/1000"
GraphRoot = "/home/chris/.local/share/containers/storage"
GraphDriverName = "overlay"
GraphDriverOptions = ["overlay.mount_program=/usr/bin/fuse-overlayfs"]

And that’s not at ALL what Dan’s looked like in the article this morning. For one thing, his had sections…

[storage]
# Default Storage Driver
driver = "overlay"

...

So, it looks like containers/libpod#2158 was the culprit. I was using an old config file, and because it's the right thing to do (if unhelpful in this case), dnf update did not replace the config file when I upgraded packages recently.

So, time to get a new one. First, though: since so many tools use containers/storage, the config file probably isn't owned by the podman package itself. dnf didn't seem to know what the $HOME/.config/containers/storage.conf file in my home directory belonged to (or, more likely, I don't know how to ask it properly…), but it did tell me that the global /etc/containers/storage.conf belongs to the containers-common package:

[[email protected] ~]$ dnf whatprovides /home/chris/.config/containers/storage.conf
Error: No Matches found

[[email protected] ~]$ dnf whatprovides $HOME/.config/containers/storage.conf
Error: No Matches found

[[email protected] ~]$ dnf whatprovides /etc/containers/storage.conf
containers-common-1:0.1.34-1.dev.gite96a9b0.fc29.x86_64 : Configuration files for working
: with image signatures
Repo : @System
Matched from:
Filename : /etc/containers/storage.conf

Since I hadn't really done any special customizations, I just went ahead and removed the storage.conf file in my home directory, reinstalled containers-common, and was provided with a shiny, new, package-fresh configuration file:

[storage]
driver = "overlay"
runroot = "/run/user/1000"
graphroot = "/home/chris/.local/share/containers/storage"
[storage.options]
mount_program = "/usr/bin/fuse-overlayfs"
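
For the record, the whole recovery boiled down to roughly this (assuming, as in my case, there are no local customizations worth keeping):

# Drop the stale, camel-case config and reinstall the package that owns the system default
rm ~/.config/containers/storage.conf
sudo dnf reinstall containers-common

# Sanity check
podman ps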

And with the correct config file, Podman was much happier, and I could get back to building and running container images without root!

[[email protected] ~]$ podman pull docker.io/centos:latest && podman images
Trying to pull docker.io/centos:latest…
Getting image source signatures
Copying blob a02a4930cb5d: 71.68 MiB / 71.68 MiB [==========================] 6s
Copying config 1e1148e4cc2c: 2.13 KiB / 2.13 KiB [==========================] 0s
Writing manifest to image destination
Storing signatures
1e1148e4cc2c148c6890a18e3b2d2dde41a6745ceb4e5fe94a923d811bf82ddb
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/centos latest 1e1148e4cc2c 7 weeks ago 210 MB

#NoBigFatDaemon

How I found Linux

I was gradually exposed to Linux over time. I came into contact with it over and over before it finally stuck with me. My first experience was with Knoppix, a Live-CD based distro, when I was working at the computer lab in college. A co-worker was also just learning about Linux, and shared it with me. We were both totally impressed that we could get a working operating system booted from a CD and use it without changing the Windows installs we had on our laptops. I had little use for it at the time, but it stuck in my memory because “hey, this is neat”.

Because of my experience playing with Knoppix, when I got my first job out of college working at an IT help desk for an all-Windows company, I volunteered when one of the sysadmins asked if any of us would be willing to look into Linux. His request led to me playing around with both Mepis Linux – another Live-CD distro – and FreeBSD (for no particular reason). I installed them both onto a pair of computers and tried to evaluate them. FreeBSD booted me to a command prompt after installation, and I was completely lost. DIR didn’t work (I was such a n00b…). I tried to learn some commands, but eventually decided it wasn’t useful enough to me without a desktop (SUCH a n00b). Mepis was more useful, and I shared some things I learned about it with the sysadmin who’d asked. Now-me laughs at what I thought might have been useful to him at the time.

In 2005, about a year later, I’d left that job and moved to a new city so my wife could attend law school. I eventually found a job at a local computer repair place working with several other young and extremely technology-oriented guys. Part of the job was wiping and re-installing Windows on customer computers, and we probably did five or six installs a day, each. Unrelated to the job, a pair of co-workers had discovered Ubuntu (waaaay back in the Breezy Badger days) and were playing around with it for multimedia stuff. We all started to poke at it and share our discoveries. The first time I installed Ubuntu and it booted right to the desktop without prompting me for a license key (which I had been entering five or six times a day with Windows) I said out loud, “Woah!”, and smiled.

I vividly remember that moment.

Ubuntu stuck with me. It was the most polished Linux I’d used to that point, and the Ubuntu forums were filled with helpful – and friendly – support, compared to the gruff “RTFM” attitudes I had found elsewhere in my early Linux exploration. I installed Ubuntu at home, on all my PCs. It became my daily driver.

When I left that job and started at a service desk for the university I currently work for, I took Ubuntu with me, and installed it at work. I shared Linux (via Ubuntu) with co-workers and friends, and when I found out the infrastructure department at work ran Linux on their servers, I started to volunteer with them to learn more. After a few years, the manager of that department became the manager of the sysadmin group for the university’s central IT department, and he hired me as a junior sysadmin.

That was it. A year into the sysadmin job, I installed Linux on my last Windows machine – one I’d been holding on to solely for gaming. I was exposed to CentOS and RHEL professionally. I used Linux at home and at work. That was 12 years ago, and I’ve never used another OS since then. Currently, I use Fedora on my laptop, run RHEL and RHEL Atomic servers and several distro varieties in containers at work, and Raspbian and Fedora on Raspberry Pis at home. Linux is part of my personal and professional life, and has directly shaped my career and what I do today.

No “Flash” in Firefox 60, Fedora 28

This morning, I upgraded my laptop from Fedora 27 to Fedora 28 (woo! sweet new versions of … stuff!). It sure is a pretty smooth process these days. A couple dnf commands and 30 minutes later, and I was rocking the new hotness that is F28.
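
(For anyone curious, those couple of dnf commands were, give or take, the standard dnf system-upgrade steps, something like:)

sudo dnf upgrade --refresh
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=28
sudo dnf system-upgrade reboot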

Pretty shortly after that, I put on my headphones and tried to get down to work.  There was a problem, though.  After opening Google Play Music to listen to some Weezer, I was greeted by an unwelcome sight:

missing_flash_player

Flash Player?!  Horse Hockey!

What decade is this? No, thank you, Google. I am not installing any of Adobe’s products.

YouTube was no better, warning me:

Your browser does not currently recognize any of the video formats available.
Click here to visit our frequently asked questions about HTML5 video.

Luckily, YouTube linked me to its HTML5 FAQ, which jogged loose some memories.

Missing codecs – codecs that I knew were there before – confirmed by the YouTube HTML5 FAQ:

firefox_codecs

H.264 codecs.

Crap.

That was totally my fault. During the Fedora upgrade, there were a number of packages that didn’t have upgrade candidates. I scanned the list quickly, determined I didn’t need them anymore, and re-ran the package download with the --allowerasing flag.

I’d disabled the RPMFusion repositories a while back, preferring to default to more trusted upstream repos, and of course FFmpeg and other video packages were installed from there. That’s where the broken upgrade candidates came from. I (mistakenly) decided they were part of the lolcommits animated gif option that I no longer used.

Once I realized what the issue was, it was short work to find the right packages to reinstall, and of course I found the solution on the Fedora Forums:

dnf install https://download1.rpmfusion.org/{free/fedora/rpmfusion-free,nonfree/fedora/rpmfusion-nonfree}-release-$(rpm -E %fedora).noarch.rpm
dnf install gstreamer1-libav gstreamer1-plugins-ugly unrar compat-ffmpeg28 ffmpeg-libs

Bingo, Weezer while I work.

OpenShift Error – Failed to build: MatchNodeSelector (6)

Brought to you by the Home Office1 department of jolly old $WORK:

I recently upgraded our OpenShift Origin dev cluster from version 3.7 to 3.9, and for the most part things went smoothly.  Version 3.9 brings with it a new version of Kubernetes, a whole host of bugfixes, and other goodies.  A little while after the upgrade, however, one of the engineers doing testing for us noticed that he was unable to deploy any applications, noting:

Nothing is running. They all have events that say:

12:38:58 PM Warning Failed Scheduling 0/6 nodes are available: 6 MatchNodeSelector.

I’m unsure if this was a consequence of the actual upgrade or something else, but I was able to confirm it with my own test application. Each build failed with the same error.

After diving in a bit and doing some research, I found that there was a defaultNodeSelector set in the projectConfig section of the master-config.yml:

projectConfig:
     defaultNodeSelector: node-role.kubernetes.io/compute=true

The defaultNodeSelector does exactly what it says: it sets the default node selector (think: label) for each project. This means all pods without a nodeSelector will be deployed onto an OpenShift node with labels that match the defaultNodeSelector.

Unfortunately for us, none of our nodes had a label that matched node-role.kubernetes.io/compute=true. During our initial install of OpenShift Origin 3.7, we used the suggested example labels from the Configuring Node Host Labels section of the Advanced Install documentation, e.g. region=primary for our standard nodes and region=infra for infrastructure nodes, with the intention that we’d change the region for nodes in other datacenters when we deployed them, or add extra labels to define special nodes (for compliance, etc.).

I was able to verify the labels we did have applied to our nodes with the oc get nodes --show-labels command.

[[email protected] Projects] $ oc get nodes --show-labels
NAME STATUS AGE VERSION LABELS
master-dev-01    Ready    54d    v1.9.1+a0ce1bc657     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datacenter=lab,kubernetes.io/hostname=master-dev-01,node-role.kubernetes.io/master=true,openshift-infra=apiserver,region=primary
node-dev-01      Ready    11d    v1.9.1+a0ce1bc657     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datacenter=lab,kubernetes.io/hostname=node-dev-01,region=infra
node-dev-02      Ready    11d    v1.9.1+a0ce1bc657     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datacenter=lab,kubernetes.io/hostname=node-dev-02,region=infra
node-dev-03      Ready    11d    v1.9.1+a0ce1bc657     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datacenter=lab,kubernetes.io/hostname=node-dev-03,region=primary
node-dev-04      Ready    11d    v1.9.1+a0ce1bc657     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datacenter=lab,kubernetes.io/hostname=node-dev-04,region=primary
node-dev-05      Ready    11d    v1.9.1+a0ce1bc657     beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,datacenter=lab,kubernetes.io/hostname=node-dev-05,region=primary

Some posts I found regarding the defaultNodeSelector suggested setting it to a blank string, but I decided I’d rather go with the region=primary label so we don’t accidentally get pods deployed onto new nodes that we want to spin up in the future. (Disclaimer: I am not 100% sure that’s how the empty string works – I need to do further research.)
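
(Two quick sanity checks here, for what it's worth: listing the nodes that actually carry the chosen label, and noting that the alternative fix would have been labeling the nodes to match the default selector instead:)

# Which nodes match the selector we're standardizing on?
oc get nodes -l region=primary

# The alternative fix we didn't take: label nodes to match the default selector
oc label node node-dev-03 node-role.kubernetes.io/compute=true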

After changing the master-config.yml file to use our chosen value, it was just a matter of restarting the Origin master service:

systemctl restart origin-master-controllers origin-master-api

With that done, I was able to kick off a new deploy, and watch as pods were scheduled onto the nodes with the region=primary label.

I am a little uneasy following all of this. I never was able to find out what caused the change in the master-config.yml. That value was unlikely to have been set already (though I cannot say for sure), and nothing in the OpenShift-Ansible playbooks references it. One possibility is that the master-config.yml was replaced during the upgrade with a default from the Origin master container image2, and then updated by the Ansible playbooks.

One of the drawbacks to the otherwise excellent OpenShift-Ansible advanced install process is that it’s not conducive to configuration management for the config files, as it generates new ones. I suppose one should use the actual Ansible playbooks as your configuration management – that’s what would be done with a standard Ansible-managed host – but it feels different somehow. Maybe that’s just me.

Finally, a semi-related point that was brought up by all of this is the need to have some better, more descriptive labels. region=primary works for now, but we’ll be better off in the long run with labels that reflect more about the hosts themselves. Chalk that up to just getting a dev cluster up and running. Now we know what we need for production.


1. See what I did there? …yeah, ok, it’s a stretch.

2. We’re running a fully containerized install of Origin on RHEL Atomic hosts. The install process copies files from inside the container to the host filesystem, and then mounts those files into the container so they can be managed like a traditional host.

ERROR: could not find an available, non-overlapping IPv4 address pool

Creating network “” with the default driver
ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

This appears to be related to the number of IP addresses reserved for Docker networks. On my laptop, it looks like 31 is the magic number (a /27 subnet, perhaps?).

docker-compose run --rm test
Creating network "test_default" with the default driver
ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

# Check the network list (-q prints only IDs, so there's no header to skip)
docker network ls -q | wc -l
31

# Remove an unneeded network
docker network rm 173fff3fa69b
173fff3fa69b

# retry
docker-compose run --rm test
Creating network "test_default" with the default driver
"It Worked!"

I definitely need to dig in more and figure out what is actually happening under the covers, but for now, this is a quick fix.
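
(Two related commands that can help in the meantime: docker network prune clears out every unused network in one shot, and inspecting what's left shows which subnets are already claimed.)

# Remove every network not currently attached to a container
docker network prune

# See which subnets the remaining networks occupy
docker network ls -q | xargs docker network inspect | grep -i subnet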

Buildah: A new way to build container images

Project Atomic’s new tool, Buildah, facilitates new ways to build container images

A previous post covered a few different strategies for building container images. The first, building container images in place, is what everyone is familiar with from a traditional Docker build. The second strategy, injecting code into a pre-built image, allows developers to add their code to a pre-built environment without really messing with the setup itself. And finally, Asset Generation Pipelines use containers to compile assets that are then included during a subsequent image build, eventually implemented natively by Docker as Multi-Stage Builds. With the introduction of Project Atomic’s new Buildah tool for creating container images, it has become easier to implement a new build strategy that exists as a hybrid of the other three: using development tools installed elsewhere to build or compile code directly into an image.

Segregating build dependencies from production images

Buildah makes it easy to “expose” a working container to the build system, allowing tools on the build system to modify the container’s filesystem directly. The container can then be committed to a container image suitable for use with Docker, Runc, etc. This keeps the build tools from being installed in the image, resulting in a smaller, leaner image.

Using the ever-helpful GNU Hello as an example, consider the following Dockerfile:

FROM fedora:25
LABEL maintainer Chris Collins <[email protected]>
RUN dnf install -y tar gzip gcc make
RUN curl http://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xvz -C /opt
WORKDIR /opt/hello-2.10
RUN ./configure
RUN make
RUN make install
ENTRYPOINT "/usr/local/bin/hello"

This is a relatively straightforward Dockerfile. Hello needs gcc and make to compile, and the container needs tar and gzip to extract the source tarball containing the code. None of these packages are required for Hello to work once it has been built, though. Nor does Hello need any of the dependency packages installed alongside these four – binutils, cpp, gc, glibc-devel, glibc-headers, guile, isl, kernel-headers, libatomic_ops, libgomp, libmpc, libstdc++, or libtool-ltdl – or updates to glibc, glibc-common, glibc-langpack-en, libcrypt-nss, or libgcc. These packages add an extra 48M of data to the resulting image that isn’t needed to run GNU Hello. The extracted source files for Hello itself are another 3.7M.

With Buildah, an image can be built without any extra packages or source files making it into the final image.

#!/usr/bin/env bash
set -o errexit

# Create a container
container=$(buildah from fedora:25)

# Mount the container filesystem
mountpoint=$(buildah mount $container)

# A Buildah-native command to set the maintainer label
buildah config --label maintainer="Chris Collins <[email protected]>" $container

# Download & extract the source files to the host machine
curl http://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz | tar xvz -C /tmp
pushd /tmp/hello-2.10

# Compile the code using make, gcc and their
# dependencies installed on the host machine
./configure
make

# Install Hello into the filesystem of the container
make install DESTDIR=${mountpoint}

popd

# Test that Hello works from inside the container filesystem
chroot $mountpoint bash -c "/usr/local/bin/hello -v"

# Set the entrypoint
buildah config --entrypoint "/usr/local/bin/hello" $container

# Save the container to an image
buildah commit --format docker $container hello

# Cleanup
buildah unmount $container
buildah rm $container

After using Buildah to create a container and mount its filesystem, the source files are extracted to the host. Hello is compiled using development packages from the host, and then make install DESTDIR=${mountpoint} installs the resulting compiled software to the container’s filesystem. Hello can be run to validate that it works from within the container by using chroot to change to the root of the container before running.

In addition to basic shell commands, a couple of Buildah commands are used to add container-specific information to the working container: buildah config --label is used to add the “maintainer” label, and buildah config --entrypoint sets the entrypoint.

Finally, buildah commit --format docker saves the container to a Docker-compatible container image.

This is a simple example, but it gets the general idea across. Of course some software has not only build dependencies, but runtime dependencies, as well. For those use cases, packages can be installed directly into the container’s filesystem with the host’s package manager. For example: dnf install -y --installroot=${mountpoint}.
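
As a rough sketch (the package here is purely illustrative; GNU Hello doesn't actually need it), that might look like:

# Install a runtime-only dependency directly into the mounted container
# filesystem, using the host's package manager
dnf install -y --installroot=${mountpoint} --releasever=25 sqlite-libs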

Drawbacks to this method

Building images this way has some drawbacks, though. By removing the development tools from the image, the compilation of the software is no longer entirely contained in the image itself. The constant refrain of the container evangelists – “Build and Run Anywhere!” – is no longer true.1 When the devel tools are moved to the host, obviously, they must exist on the host. A stock Atomic host has no *-devel packages, so using the method above to build images that require these packages is not practical.2 The container images are no longer reliably reproducible.

A whole new world … er … container

These problems can be solved by using another container to build the image. Rather than installing development tools – or even Buildah – on the host, they can be built into a “builder” image that’s tailored to the type of image being created. For example, a builder image with make, gcc, and any other dependencies can be created to compile GNU Hello. Another image could include php and composer to compile assets for a PHP-based project. A Ruby builder image can be used for Ruby-on-Rails projects. This makes the build environment both portable and reproducible. Any project can contain not only its source code, but also code to create its build environment and production image.

Continuing with the GNU Hello example, a container image with Buildah, make, gcc, gzip, and tar pre-installed can be run, mounting the host’s /var/lib/containers directory and the buildah script from above:

docker run --volume /var/lib/containers:/var/lib/containers:z \
--volume $(pwd)/buildah-hello.sh:/build/buildah-hello.sh:z \
--rm \
--interactive \
--privileged \
--tty buildah-hello:latest /build/buildah-hello.sh

But there’s a catch, at least for now. As of August 2017, using Buildah in the container but not on the host creates an image that is difficult to interact with. The image is not available to the Docker daemon by default, because it’s in /var/lib/containers. Additionally, Buildah itself doesn’t yet support pushing to private registries that require authentication, so it’s challenging to get the image out of the container.

Skopeo, Buildah’s sister tool for moving images around, would be ideal for this. After all, that’s the Skopeo project’s …ahem… scope. Unfortunately, Buildah has a known issue that prevents Skopeo from pushing Buildah images to other locations, despite the fact that Skopeo can read and inspect the images.

There are some possible workarounds for now, though. First, if Buildah is installed on the host system, it will be able to read from /var/lib/containers (mounted into the container in the example above, allowing the resulting image to persist on the host), and the buildah push command from the host can copy the image to a local Docker daemon’s storage:

buildah push IMAGE:TAG docker-daemon:IMAGE:TAG

Optionally, if Docker is installed on the host system and in the build container, the host’s Docker socket can be mounted into the container, allowing Buildah to push to the host’s Docker daemon storage.
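
That variant might look something like the following, reusing the hypothetical buildah-hello image from above, with a buildah push to the docker-daemon transport tacked onto the end of the build script:

docker run --volume /var/lib/containers:/var/lib/containers:z \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume $(pwd)/buildah-hello.sh:/build/buildah-hello.sh:z \
--rm \
--privileged \
--tty buildah-hello:latest /build/buildah-hello.sh

# ...and inside the container, at the end of buildah-hello.sh:
buildah push hello docker-daemon:hello:latest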

Buildah builds three ways

So, Buildah can be used to interact directly with the container using tools on the host system, but Buildah also supports other ways of building images. Using buildah bud, or “build-using-dockerfile”, an image can be created as simply as with docker build. This method does not have the benefit of segregating development tools from the resulting production image; it’s doing the same exact things Docker would do. On the other hand, Buildah does not create and save intermediate images for each step, so builds are slightly to significantly faster using buildah bud over docker build (depending on the number of external blockers, i.e. checking yum mirrors, waiting for code to compile, etc).
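
Using the Dockerfile from earlier, a buildah bud build looks about like you'd expect:

# Build from the Dockerfile in the current directory and tag the result "hello"
buildah bud -t hello -f Dockerfile .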

Buildah also has its own native commands for interacting with a container, such as buildah run, buildah add, and buildah copy, each generally equivalent to their Docker counterparts. In the examples above, buildah config has been used to set container settings such as labels and the entrypoint. These native commands make it easy to build containers without a Dockerfile, using whatever tool works best for the job – bash, make, etc – but without the full complexity of modifying the container filesystem directly as in the examples above.
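
Here's a minimal sketch of that kind of Dockerfile-less workflow using the native commands (the file being copied is just a placeholder):

container=$(buildah from fedora:25)

# Run a command inside the working container, no Dockerfile required
buildah run $container -- dnf install -y tar gzip

# Copy a local file into the container filesystem
buildah copy $container ./hello-2.10.tar.gz /opt/hello-2.10.tar.gz

buildah config --label maintainer="Chris Collins" $container
buildah commit --format docker $container hello-native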

Buildah FTW

Buildah is a solid alternative to Docker for building container images, and, as shown, makes it easy to create a container image that includes only the code and packages needed for production. The resulting images are smaller, builds are quicker, and there is less surface area for attack should the container be compromised.3

Using Buildah inside a container with development tools installed adds another layer of portability, allowing images to be built on any host with Runc, and optionally Docker, installed. With this model, the “build anywhere” model of the Dockerfile is maintained while still segregating all the build tools from the resulting image.

Overall, Buildah is a great new way to build container images, and makes it easy to build images faster and leaner. With its build-using-dockerfile support, Buildah can act as a drop-in replacement for the Docker daemon in build pipelines, and makes gradual migration to more sophisticated build practices less painful.


1: It’s not entirely true anyway, but by removing the build itself from inside the image, now it’s REALLY not true.

2: For the Atomic host example, you could take advantage of package layering to install the tools you need.

3: For whatever that buys you. It’s arguable that not including tools like make or gcc, etc, just adds a hurdle for an attacker but doesn’t actively make it any safer per se.

Header Image: By Pelf at en.wikipedia, originally from en.wikipedia, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2747463

Ansible Role for RHEL Atomic Host

This morning I was asked by a friend if I could share any Ansible roles we use at $WORK for our Red Hat Atomic Host servers. It was a relatively easy task to review and sanitize our configs – Atomic Hosts are so minimal, there’s almost nothing we have to do to configure them.

When our Atomic hosts are initially created, they’re minimally configured via cloud-init to set up networking and add a root user ssh key. (We have a VMware environment, so we use the RHEL Atomic .ova provided by Red Hat, and mount an ISO with the cloud-init ‘user-data’ and ‘meta-data’ files to be read by cloud-init.) Once that’s done, we run Ansible tasks from a central server to set up the rest of the Atomic host.
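
(For the curious, building that cloud-init seed ISO is essentially a one-liner. This is a sketch based on the standard NoCloud approach, where the 'cidata' volume label and the user-data/meta-data file names are what cloud-init expects, rather than our exact tooling:)

# Build the seed ISO that gets attached to the VM at first boot
genisoimage -output cloud-init.iso -volid cidata -joliet -rock user-data meta-data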

Below is a snippet of most of the playbook.

I think the variables are self-explanatory. Some notes are added to explain why we’re doing a particular thing. The disk partitioning is explained in more detail in a previous post of mine.

---
  # Set w/Ansible because cloud-init is plain text
  - name: Access | set root password
    user: 
      name: root
      password: "{{ root_password }}"

  - name: Access | add ssh user keys
    authorized_key:
      user: "{{ item.name }}"
      key: "{{ item.key }}"
    with_items: "{{ ssh_users }}"

  - name: Access | root access to cron
    lineinfile:
      dest: /etc/security/access.conf
      line: "+:root:cron crond"

  - name: Access | fail closed
    lineinfile:
      dest: /etc/security/access.conf
      line: "-:ALL:ALL"

  # docker-storage-setup service re-configures LVM
  # EVERY TIME Docker service starts, eventually
  # filling up disk with millions of tiny files
  - name: Disks | disable lvm archives
    copy:
      src: lvm.conf
      dest: /etc/lvm/lvm.conf
    notify:
      - restart lvm2-lvmetad

  - name: Disks | expand vg with extra disks
    lvg:
      vg: '{{ volume_group }}'
      pvs: '{{ default_pvs }}'

  - name: Disks | expand the lvm
    lvol:
      vg: '{{ volume_group }}'
      lv: '{{ root_lv }}'
      size: 15g

  - name: Disks | grow fs for root
    filesystem:
      fstype: xfs
      dev: '{{ root_device }}'
      resizefs: yes

  - name: Disks | create srv lvm
    lvol:
      vg: '{{ volume_group }}'
      lv: '{{ srv_lv }}'
      size: 15g

  - name: Disks | format fs for srv
    filesystem:
      fstype: xfs
      dev: '{{ srv_device }}'
      resizefs: no

  - name: Disks | mount srv
    mount:
      name: '{{ srv_partition }}'
      src: '{{ srv_device }}'
      fstype: xfs
      state: mounted
      opts: 'defaults'

  ## This is a workaround for XFS bug (only grows if mounted)
  - name: Disks | grow fs for srv
    filesystem:
      fstype: xfs
      dev: '{{ srv_device }}'
      resizefs: yes

  ## Always check this, or it will try to do it each time
  - name: Disks | check if swap exists
    stat:
      path: '{{ swapfile }}'
      get_checksum: no
      get_md5: no
    register: swap

  - debug: var=swap.stat.exists

  - name: Disks | create swap lvm
    ## Shrink not supported until Ansible 2.2
    #lvol: vg=atomicos lv=swap size=2g shrink=no
    lvol:
      vg: atomicos
      lv: swap
      size: 2g

  - name: Disks | make swap file
    command: mkswap '{{ swapfile }}'
    when:
      - swap.stat.exists == false

  - name: Disks | add swap to fstab
    lineinfile:
      dest: /etc/fstab
      regexp: "^{{ swapfile }}"
      line: "{{ swapfile }}  none    swap    sw    0   0"

  - name: Disks | swapon
    command: swapon '{{ swapfile}}'
    when: ansible_swaptotal_mb < 1   

  - name: Docker | setup docker-storage-setup
    lineinfile:
      dest: /etc/sysconfig/docker-storage-setup
      regexp: ^ROOT_SIZE=
      line: "ROOT_SIZE=15G"
    register: docker_storage_setup

  - name: Docker | setup docker-network
    lineinfile:
      dest: /etc/sysconfig/docker-network
      regexp: ^DOCKER_NETWORK_OPTIONS=
      line: >-
        DOCKER_NETWORK_OPTIONS='-H unix:///var/run/docker.sock
        -H tcp://0.0.0.0:2376
        --tlsverify
        --tlscacert=/etc/pki/tls/certs/ca.crt
        --tlscert=/etc/pki/tls/certs/host.crt
        --tlskey=/etc/pki/tls/private/host.key'

  - name: add CA certificate
    copy: 
      src: ca.crt 
      dest: /etc/pki/tls/certs/ca.crt 
      owner: root 
      group: root 
      mode: 0644

  - name: Admin Helpers | thinpool wiper script
    copy:
      src: wipe_docker_thinpool.sh
      dest: /usr/local/bin/wipe_docker_thinpool.sh
      mode: 0755

  - name: Journalctl | set journal sizes
    copy:
      src: journald.conf
      dest: /etc/systemd/journald.conf
      mode: 0644
    notify:
      - restart systemd-journald

  - name: Random Atomic Bugfixes | add lastlog
    file:
      path: /var/log/lastlog
      state: touch

  - name: Random Atomic Bugfixes | add root bashrc for prompt
    copy:
      src: root-bashrc
      dest: /root/.bashrc
      mode: 0644

  - name: Random Atomic Bugfixes | add root bash_profile for .bashrc
    copy:
      src: root-bash_profile
      dest: /root/.bash_profile
      mode: 0644

  ### Disable Cloud Init ###
  ## These are in Ansible 2.2, which we don't have yet
  - name: stop cloud-config
    systemd: name=cloud-config state=stopped enabled=no masked=yes
    ignore_errors: yes

  - name: stop cloud-init
    systemd: name=cloud-init state=stopped enabled=no masked=yes
    ignore_errors: yes

  - name: stop cloud-init-local
    systemd: name=cloud-init-local state=stopped enabled=no masked=yes
    ignore_errors: yes

  - name: stop cloud-final
    systemd: name=cloud-final state=stopped enabled=no masked=yes
    ignore_errors: yes

  - name: Remove old cloud-init files if they exist
    shell: rm -f /etc/init/cloud-*
    ignore_errors: yes

The only other tasks we run are related to $WORK specific stuff (security office scanning, patching user account for automated updates, etc).

One of the beneficial side-effects of mixing cloud-init and Ansible is that the cloud-init is only used for the initial setup (networking and root access), so it ends up being under the size limit imposed by Amazon Web Services on their user-data files.  This allows us to create and maintain RHEL Atomic hosts in AWS using the exact same cloud-init user-data file and Ansible roles.