Kallistec

January 23, 2010

The Chef Way

Filed under: Chef — Daniel DeLeo @ 11:22 am

In the beginning

A few years ago, I discovered the ruby language. After just a few hours using it, I became so enamored that I decided to use it for all of the little systems administration automation tasks that have traditionally been the domain of perl and shell scripts. As my journey into ruby continued, I discovered that many rubyists are also big proponents of the programming methodologies collectively termed “agile.” Reading the agile manifesto, I had two reactions: firstly, it made a lot of sense; secondly, I wondered how, if embracing change is the norm, does one reconcile this with the inherent difficulties in adapting operations to a changing environment? Much to my dismay, googling for “agile systems administration” gave me a single result, a blog post whose author was asking those same questions without providing the answers. That has all changed: now we have pervasive virtualization, infrastructure as a service, and many high profile proponents of an agile approach to operations, some retaining the term agile, others labeling the movement “devops” (I think “opsdev” sounds better, but you can’t win ‘em all). The changes in this approach to operations are far-reaching, with organizational and even philosophical aspects, accompanied by new (or just newly discovered) tools to match the new understanding.

Arguably the most important shift in tooling is the embrace of configuration management and automation tools. Enabled by the greatly reduced lead times required to provision new systems, configuration management is key to building the kind of dynamic, change-responsive infrastructures I began dreaming about while reading the agile manifesto all those years back. Although the history of configuration management may stretch back to the original Bourne or even Thompson shells, and hit a major milestone in 1993 with the first release of Cfengine, both the variety and the use of these tools have risen dramatically in recent years.

The Chef Way

The Master observes the world
but trusts his inner vision.

Given the goal of “Operations Zen,” not fighting change, but using it to our advantage, embracing a duality of operations and development, and, yes, being productive and doing more with less, how should we achieve it? I contend that the tool to use is Chef, and the path to follow is the Chef way. This is the beginning of a series of posts covering Chef’s design decisions, its terminology, and how to use Chef. Once we get to using Chef, I’ll highlight how doing things the Chef way helps us build agile, fully automated infrastructures, but in this post I’ll describe what the Chef way means in terms of Chef’s design.

So, what is the Chef way? For starters, flexibility. Opinionated software is great, but if it’s so opinionated that it makes your job harder… that’s too opinionated. So Chef tries to be opinionated where it simplifies, and lets you call the shots where it matters.

An Internal Ruby DSL.

The more prohibitions you have,
the less virtuous people will be

Chef’s configuration language is implemented within Ruby, instead of outside of it. This is probably the most defining characteristic of Chef, and maybe also the most controversial. Because it uses ruby as its configuration language, Chef is not an application that the user merely configures; it is both a tool and a framework. Some have described Chef as “half of an application” for this reason. But what does this actually mean in practice? Can you use Chef if you’re not a “1337” Merb hacker? Well, yes and no, or “mu!”

Chef provides all of the basic building blocks one would expect from a configuration management application: one can install packages, write templated configuration files, start and stop services, manage users, and generally automate all of the common administration tasks completely within the pre-defined DSL. If you’re learning Chef with no prior knowledge of ruby, this may be a bit more difficult than learning a simpler language specific to a configuration management tool, but not by much. However, in the process of learning the Chef DSL, something else has happened: the user has learned basic ruby syntax, and taken a step into a world much wider than configuration management. An important benefit of this is that it makes it easy to modify the workings of Chef, from small tweaks and additions, to much more drastic changes, even from within Chef recipes. There’s no cognitive barrier of using an external DSL and then needing to learn or use a different language to extend Chef.

This leads directly to the great advantage Chef gains by using a pure ruby DSL: as Bryan McLellan wrote recently, Chef embraces a dual nature of being both a tool and a mere library, and fills the continuum in between. Some of the advantages are very simple: we can use ruby’s array literal syntax to conveniently express many similar actions. Others are more complex: we can use a SQL library to configure systems based on information in a database. Using ruby as the configuration language allows us the expressiveness to describe our systems as they actually exist within the entire infrastructure and gives us access to a large collection of libraries to integrate with that infrastructure. Using Chef, we are not limited to seeing our systems as isolated pieces; we may also consider the whole.
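
To make the array literal point concrete, here is a minimal sketch in plain ruby. The ToyRecipe class and the base_packages_recipe method are hypothetical stand-ins invented for this example so that it runs without Chef installed; they only record what was declared and are not Chef's implementation.

```ruby
# Hypothetical stand-in for a recipe context (an assumption for this sketch):
# it only records declarations so the example runs in plain Ruby.
class ToyRecipe
  attr_reader :resources

  def initialize
    @resources = []
  end

  # Record that a package should be installed; nothing is really installed.
  def package(name)
    @resources << "package[#{name}]"
  end
end

def base_packages_recipe
  recipe = ToyRecipe.new
  # One array literal stands in for four near-identical declarations.
  %w[git curl vim ntp].each { |name| recipe.package(name) }
  recipe
end

puts base_packages_recipe.resources
```

Adding a fifth package means adding one word to the array, not copying another declaration.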

Determinism

Though Chef provides great flexibility in what we may ask it to do, it enforces a deterministic order in which operations occur. People who still remember manual systems administration (what?) will find this a familiar concept: when you write or follow documentation, a how-to for example, you never see (or write) anything like, “do steps 1a, 1b, and 1c in some random order, then do steps 2a, 2b, 2c in a random order, then…” The idea that order matters is not original to Chef, and is also a matter of some controversy among the infrastructure management community. Expressed simply, order matters because each step our configuration management tool takes to deliver a fully configured system is taken within the context of the state of the entire system, and there can be complex, non-obvious dependencies between these actions. Steve Traugott and Lance Brown explored the benefits of deterministic order in depth in their 2002 paper, “Why Order Matters: Turing Equivalence in Automated Systems Administration” and concluded, “it appears that no tool, written in any language, can predictably administer an enterprise infrastructure without maintaining a deterministic, repeatable order of changes on each host.” Among many salient points, Traugott and Brown refer to a 2002 study showing that installing RPMs with no declared dependencies, and with installation scripts disabled, in different orders can lead to different outcomes. Given the overwhelming complexity of mapping these hidden dependencies, it is not surprising that the map rarely matches the territory, or that Chef’s answer is not to build a better map.

Chef’s method for ordering dependent operations is simple and elegant: it runs actions in the order written. To install an application and then configure it, you simply declare that the package should be installed at the top of the file, and declare its configuration at the bottom. Of course, a system’s configuration can be split up between many files, with dependencies across files; in Chef, these dependencies are evaluated according to simple, deterministic rules so that it’s easy to know exactly what will happen and in what order. And once you’ve built one system and tested it, you can be confident that every other system you build will be built in exactly the same way.
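
The "order written is order run" rule can be sketched with a toy model. Everything here (OrderedRun, declare, converge, ordered_run_demo) is a hypothetical name chosen for illustration, not Chef's internals; the point is only that declaring a step records it, and convergence walks the list top to bottom.

```ruby
# Toy model (not Chef's implementation): steps are collected in the order
# they are declared, then executed in exactly that order.
class OrderedRun
  attr_reader :log

  def initialize
    @steps = []
    @log = []
  end

  # Declaring a step only records it; nothing executes yet.
  def declare(name, &action)
    @steps << [name, action]
    self
  end

  # Execute every step in declaration order and return the names that ran.
  def converge
    @steps.each do |name, action|
      action.call
      @log << name
    end
    @log
  end
end

def ordered_run_demo
  state = {}
  run = OrderedRun.new
  run.declare("package[ntp]") { state[:installed] = true }
  run.declare("template[/etc/ntp.conf]") do
    # Implicit dependency: this only works because the package came first.
    raise "ntp is not installed yet" unless state[:installed]
    state[:configured] = true
  end
  run.declare("service[ntp]") do
    raise "ntp is not configured yet" unless state[:configured]
    state[:running] = true
  end
  run.converge
end

puts ordered_run_demo
```

Nothing in the sketch declares that the template depends on the package; the dependency is satisfied purely by position, which is the behavior the paragraph above describes.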

Consistency

If you don’t realize the source,
you stumble in confusion and sorrow.

This one is related to determinism: Chef tries to manage systems so that they can always be rebuilt from scratch. To illustrate this point, consider the way Chef manages dynamic, templated files. Chef always writes the complete file instead of inserting text willy-nilly in some arbitrary spot. If you make a change to the template or data used to generate the file, Chef rewrites the entire file. This encourages consistency because there is no way the file can have content that was inserted by Chef at one time but later removed from Chef’s management. As a result, you get the same configuration files no matter the history of the server being managed.
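
As a rough illustration of whole-file rendering, the sketch below uses ruby's standard ERB library. The template text, the host data, and the render_hosts and converge_file helpers are all assumptions made up for this example; Chef's own template handling is not shown.

```ruby
require "erb"

# Hypothetical template for a hosts-style file (illustrative content only).
HOSTS_TEMPLATE = <<~TEMPLATE
  # Generated file: local edits will be overwritten.
  <% hosts.each do |name, ip| -%>
  <%= ip %> <%= name %>
  <% end -%>
TEMPLATE

# Render the complete file contents from the template and the data.
def render_hosts(hosts)
  ERB.new(HOSTS_TEMPLATE, trim_mode: "-").result_with_hash(hosts: hosts)
end

# "Converging" ignores whatever is currently on disk and replaces it with
# a fresh render, so the server's history cannot leak into the result.
def converge_file(_current_contents, hosts)
  render_hosts(hosts)
end

hosts = { "web1" => "10.0.0.1", "db1" => "10.0.0.2" }
puts converge_file("10.0.0.9 retired-host\n", hosts)
```

A freshly built server and one that once listed retired-host end up with byte-identical files, because the old contents never feed into the new ones.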

Declarative and Idempotent

Chef is declarative: when working with Chef, you specify the end result, and let Chef worry about how it’s achieved. In English, you can think of every action in Chef having the word “shall” attached to it. “The system shall have this package installed and shall have this configuration file.” Chef is also idempotent: no matter how many times you run it, it leaves the system in the same state. This has a variety of benefits. For one, you now have executable documentation of how your boxes are configured. Want to know how a system was built? Want to build a new one? You can get the answer to both in the same place. When you need to make a change, upgrading a certain package, for example, you make that change in one place. There’s no drift between what’s actually on the system and what the documentation says; they’re one and the same.

Another important aspect of automating infrastructures over time is staying automated. With shell scripts designed to be run once, changing a few variables and re-running the scripts on a live system is probably not going to end well. Back to our upgrading-a-package example, when we make the change with Chef, Chef will see that we have a different version of the package on the system than specified and install the new version, but take no action where the system matches what’s been specified. This means that systems are built and maintained in exactly the same way. As a result, systems that are supposed to be identical are identical, whether they were built last week or last year, and building a new one to match follows the exact same process. And what happens when things go wrong? With run-once-only scripts, you might hunt through the script to figure out which line it was executing when it failed, comment out everything that came before it in the script and try again, or maybe finish what the script was trying to do manually. Yuk. With Chef, assuming the error was transient, you can just run it again. Everything that it finished before the Chef run failed will match the specification and be skipped. Much better.
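
The package-upgrade example can be sketched as a comparison between desired and current state. The converge_packages method and the version numbers are hypothetical, and a real tool would query the package manager instead of a hash; this is a toy model of the idea, not Chef's code.

```ruby
# Toy model of idempotent convergence (assumed names, not Chef's API):
# compare desired state with current state and act only on differences.
def converge_packages(installed, desired)
  actions = []
  desired.each do |name, version|
    next if installed[name] == version # already matches: take no action
    actions << "install #{name} #{version}"
    installed[name] = version          # pretend the install succeeded
  end
  actions
end

system_state = { "nginx" => "1.0" }
desired      = { "nginx" => "1.2", "ntp" => "4.2" }

p converge_packages(system_state, desired) # first run: upgrades nginx, adds ntp
p converge_packages(system_state, desired) # second run: nothing left to do
```

Running the same specification a second time produces an empty action list, which is what makes re-running on a live system safe.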

Fail Fast

If you want to be reborn,
let yourself die.

When something goes wrong configuring a host, Chef fails immediately. From one point of view, there’s nothing else Chef could do, since many dependencies are only expressed implicitly. In the example where we install a package and then configure it, the configuration step clearly depends on the installation step, but Chef doesn’t track this dependency at all (unless we specifically ask it to), so it wouldn’t know to cancel configuration if the package install failed. If we step back a bit and look at the whole view of what Chef is trying to accomplish, however, we see that this isn’t really a problem. After all, we’re asking Chef to deliver us a fully configured webserver, load balancer, coffee machine, whatever. If it works fine with half of its configuration missing, we probably didn’t need that half of the configuration in the first place. So it turns out that the best thing Chef can do in case of failure is to fail. If the failure was transient, say, a network issue, you can run Chef again and you’ll get the fully configured server. If the failure was caused by something more significant, you’d have to fix it before you could get the fully configured server you wanted, whether or not Chef ignored the failure. By failing fast, Chef alerts us to problems immediately, without losing any important functionality.
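
Here is a small sketch of failing fast and then re-running after a transient error. ToyRun, fail_fast_demo, and the step names are invented for illustration. Remembering completed steps in a list is a simplification: a real tool would inspect the system itself to decide what is already in the desired state.

```ruby
# Toy run loop (not Chef's implementation): stop at the first failure, and
# skip steps that already succeeded when the run is repeated.
class ToyRun
  attr_reader :done

  def initialize(steps)
    @steps = steps # ordered list of [name, callable]
    @done = []
  end

  def converge
    @steps.each do |name, action|
      next if @done.include?(name) # already in the desired state
      action.call                  # any exception aborts the whole run
      @done << name
    end
    true
  end
end

def fail_fast_demo
  mirror_up = false
  steps = [
    ["package[ntp]", -> {}],
    ["package[nginx]", -> { raise "network timeout" unless mirror_up }],
    ["template[nginx.conf]", -> {}]
  ]
  run = ToyRun.new(steps)

  error = begin
    run.converge
    nil
  rescue RuntimeError => e
    e.message
  end
  done_after_failure = run.done.dup

  mirror_up = true # the transient problem clears up
  run.converge     # just run it again
  { error: error, done_after_failure: done_after_failure, done_after_retry: run.done }
end

p fail_fast_demo
```

After the failure, the template step has not been attempted at all, so nginx is never configured on a host where it failed to install.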

Configuration In Context

Another unique feature of Chef is that it allows individual hosts to query for information about the rest of your infrastructure—Chef supports this out of the box. This enables truly dynamic infrastructures and keeps our configurations DRY. The advantages of this feature are most apparent when configuring applications such as load balancers and monitoring systems, where the configuration depends on what other infrastructure is available. A load balancer, for example, needs to know what backends are available to balance the load across. If Chef didn’t have the ability to query for information, then for each backend we add or remove, we’d have to also update the load balancer’s configuration. Instead, all we have to do is ask for a list of the backends, and add each one to the load balancer’s configuration programmatically. When a new backend is added or removed, the next time Chef runs on the load balancer host, it will automatically generate a new configuration file, keeping the load balancer’s view of the available backends up to date.
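
The load balancer case might look roughly like this. The node list is a hypothetical in-memory stand-in for what the server knows, and search_nodes, load_balancer_config, the "app" role, and port 8080 are all assumptions made for the sketch; Chef's actual query interface is not shown.

```ruby
# Hypothetical stand-in for querying the infrastructure: select the nodes
# that carry a given role from an in-memory list.
def search_nodes(nodes, role)
  nodes.select { |node| node[:roles].include?(role) }
end

# Build the backend section of a load balancer config from whatever
# backends exist right now (format and port are illustrative only).
def load_balancer_config(nodes)
  lines = ["backend app"]
  search_nodes(nodes, "app").sort_by { |node| node[:name] }.each do |node|
    lines << "  server #{node[:name]} #{node[:ip]}:8080"
  end
  lines.join("\n") + "\n"
end

nodes = [
  { name: "app1", ip: "10.0.0.11", roles: ["app"] },
  { name: "db1",  ip: "10.0.0.21", roles: ["db"] }
]
puts load_balancer_config(nodes)

# A new backend appears; the next run picks it up with no config edits.
nodes << { name: "app2", ip: "10.0.0.12", roles: ["app"] }
puts load_balancer_config(nodes)
```

The code that generates the configuration never changes; only the answer to the query does.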

Fat Client, Skinny Server

If you’ve ever used monitoring system software, you’re familiar with this debate: does the client or the server do all of the work? With Chef, the clients do the work. It’s a bit inaccurate to call the server “skinny”: especially with the upcoming 0.8.0 release, the server has a fair amount of complexity, mostly in support of making its searchable infrastructure features rock-solid and efficient. Where “fat client, skinny server” comes into play is that Chef runs all of the code and template rendering required to configure a system on the client. This design allows Chef to scale down: Chef has a lightweight, serverless, “solo” version of the client, suitable for running on a single host (as an aside, chef-solo is a great way to get started with Chef, because you can tackle Chef’s features in chunks instead of having to learn all of the parts at once). With the server doing less, Chef can also scale up horizontally like a modern web application, and handle more clients-per-server than it could if the server shouldered more of the burden.

“Skinny server” also means that Chef never runs outside code on the server. Running a centralized configuration management system does entail placing trust in the central server not to hand your entire infrastructure over to the baddies. Running external code safely is one of the most complicated problems in all of computing, and the benefits of doing so in Chef are negligible compared to the risks. Pushing computation to the clients buys us scalability and gives us some extra security for free.

APIs: you build the other half of this.

If you want to become whole,
let yourself be partial.

We touched on this point briefly while discussing Chef’s pure ruby DSL, but throughout its design, Chef strives to make external tools first class citizens. Any information stored by Chef can be accessed via an HTTP API. If you can imagine a new use for Chef’s data or a new way to interact with it, you can build it.

Pragmatic Abstraction

The mark of a moderate man
is freedom from his own ideas.

Chef uses abstractions where they are helpful, but Chef never pretends that abstractions are perfect. Usually, we’d like to simply say “install the MySQL package” and have it work on every platform and package manager. Unfortunately, different distros and OSs disagree on what packages should be named. So Chef lets us break the abstraction and specify a package name for each platform. Another common issue with abstractions is that they force us to use only the lowest common denominator subset of what each underlying implementation is capable of. Again, Chef lets us choose: we can use the cross-platform subset of features when that’s good enough, but we also have access to fully-featured, platform-specific implementations when we need them.
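
Breaking the abstraction per platform can be as simple as a lookup table. This sketch swaps in Apache for MySQL, because Apache's naming split ("apache2" on Debian-style systems, "httpd" on Red Hat-style systems) is widely known; the constant and method names are hypothetical, and the platform list is deliberately incomplete.

```ruby
# Illustrative mapping from platform to package name (not exhaustive).
APACHE_PACKAGE_NAMES = {
  "debian" => "apache2",
  "ubuntu" => "apache2",
  "centos" => "httpd",
  "redhat" => "httpd"
}.freeze

# Look up the package name for a platform, failing loudly on platforms we
# have not mapped instead of guessing.
def apache_package_for(platform)
  APACHE_PACKAGE_NAMES.fetch(platform) do
    raise ArgumentError, "no Apache package name known for #{platform}"
  end
end

puts apache_package_for("ubuntu")
puts apache_package_for("centos")
```

The calling code still says "install Apache"; only the lookup knows that the platforms disagree about the name.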

Chef also embraces the idea that when an abstraction is useful, we can often gain even more productivity by layering another abstraction on top of it. Taking an example from outside Chef, TCP and IP abstract away all of the vagaries of wires and wire-level protocols, allowing us to focus on sending data to a remote system. So what do we do with it? Layer other abstractions such as SMTP and HTTP on top, so we can forget about sending data and think about sending mail or getting text documents. Likewise, with Chef, we can build our own abstraction layers on top of the ones Chef provides, and we can programmatically build up abstractions using data from anywhere. A simple example of this is taking a list of package names and building a list of package objects from it. This way, we can minimize repetition (stay DRY!), express our infrastructures succinctly, integrate Chef with any data source we need, and maximize automation.

Wrapping Up

In this post, we’ve covered the Chef way, seeing that Chef’s design focuses on flexibility, repeatability, and pragmatism. In later posts in the series, we’ll learn to speak like a Chef, and how to use Chef to build infrastructures that are both agile and robust.
