Moved

It's been a long while since I've posted here. For a long time I didn't feel like I had anything interesting to say, but recently I've had the itch to write up a few things I've been working on and link to some interesting projects I've come across. But I couldn't bear to keep using WordPress any longer. WordPress (and wordpress.com) are actually pretty great at what they do, but I'm the kind of person who writes blog posts in markdown in a text editor. In addition, I was already running a machine in the Rackspace cloud for ZNC, and I didn't feel like paying WordPress to customize my theme or to remove ads. So I switched to Jekyll, which just feels like a much better blogging platform for someone like me.

About the Design

I'm not a designer by trade, so I opted for a minimal approach with the design. Luckily, minimal designs are pretty good for reading text, which is presumably what you're here to do. There's a whole crop of software these days designed to remove the busy features (or malfeatures) of a page and let you focus on the text, and traditionally reading-focused apps, such as RSS readers, have been getting the minimal treatment as well. Reeder, Instapaper, Readability, and Safari's "reader" feature are good examples of the "just the text" aesthetic that inspired the design.

To actually put the site together, I started with the HTML5 Boilerplate and removed the stuff I didn't need. A nice benefit of doing it this way is that the site looks pretty good in Mobile Safari automatically. Unless you do front-end HTML work all the time, I'd recommend using the commented version of the boilerplate: there are a ton of links to interesting articles about web page optimization, cross-browser compatibility, and the like in the comments.

Under Construction

Of course, the downside of switching from a relatively heavyweight blogging platform to something minimal like Jekyll is that you lose a lot of features you used to get for free. Most of them (tags, categories, etc.) I can live without, but there are a few basics, such as having a list of all of the articles, that I really should have. So I'll keep evolving the design as I go along. The great thing about having a blogging platform I can iterate on is that any mediocre aspect of the site is just an afternoon's worth of work away from being great.


The Chef Way Episode 2: Chef Speak

Chef Speak

Express yourself completely, then keep quiet.

Perhaps the biggest hurdle to overcome when learning Chef, especially for those new to configuration management software, is the terminology. Learning the language is an absolute necessity for understanding Chef and being able to explain problems when they arise. The difficulty in learning Chef's language, I think, has two root causes: firstly, Chef's abstractions of system configuration concepts appear novel to users unfamiliar with them—users might be learning the abstractions with little frame of reference; and secondly, it's hard to find authoritative definitions of what the terms mean in Chef. This makes it more difficult to learn Chef by reading the wiki, since you won't know what to look at until you know what the words mean. Luckily, Chef's concepts are pretty intuitive, so learning to speak like a chef won't be difficult. Here we go:

Resource

A resource is usually a cross-platform abstraction of the thing you're configuring on the host. For example, packages may be installed via apt, yum, or the BSD ports and packages systems, but the package resource abstracts these differences away so you can specify that a package should be installed in a cross-platform way. Chef's resources are mostly just containers for data, with some basic validation functionality.

Attribute (On a Resource)

As I just noted, resources are mostly containers for data. Attributes are the pieces of data that resources contain. In the case of managing a package, this might be the name of the package you want to install, the version you want to install, or options to pass to the package manager.

NOTE: The Chef developers have discussed renaming these "properties" or something similar, to avoid overloading the term "attribute," which is also used to describe data associated with nodes and roles, described below. For now, though, the documentation says "attribute," so that's what I'm using here.

Action

The action is what you want Chef to do with the resource: should the package be installed? Upgraded to the latest version? Removed? Actions are usually specific to the resource, but all resources support the nothing action, which does what its name suggests.

Provider

The provider is the specific implementation of what the resource abstracts. On Red Hat or CentOS, a package resource will use the yum package provider to get the package installed, but on Debian and Ubuntu, the apt package provider will be used. Providers contain most of the intelligence: they're responsible for making Chef idempotent by checking if an action needs to be taken and issuing the commands to the system to take that action. In the case of package providers, they first check if the desired version of a package is installed and then run the yum, apt-get, or other package manager commands to install or upgrade as needed. When working with Chef, you normally don't need to worry about providers. For the occasions when you do, Chef provides "shortcut" resources which will always use the desired provider. For example, the dpkg_package and rpm_package resources allow you to install packages directly from the filesystem, using providers specific to these package managers.
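To make the split between resource and provider concrete, here's a toy model in plain ruby. This is not how Chef is implemented: ToyPackage, ToyProvider, and the in-memory "installed" hash are names I made up to stand in for a resource, a provider, and the host's package database.

```ruby
# Toy model of the resource/provider split. Not Chef's real classes:
# ToyPackage, ToyProvider and the `installed` hash are hypothetical.

# The resource is just a container for data (its "attributes").
ToyPackage = Struct.new(:name, :version)

class ToyProvider
  attr_reader :commands

  # `installed` stands in for the host's package database: name => version.
  def initialize(installed)
    @installed = installed
    @commands = []
  end

  # Install action: check the current state first, act only if needed.
  def install(resource)
    return :up_to_date if @installed[resource.name] == resource.version

    # A real provider would shell out to yum, apt-get, etc. here.
    @commands << "install #{resource.name}-#{resource.version}"
    @installed[resource.name] = resource.version
    :installed
  end
end

provider = ToyProvider.new({ "nginx" => "1.0" })
nginx = ToyPackage.new("nginx", "1.2")
puts provider.install(nginx) # first run changes the system
puts provider.install(nginx) # second run finds nothing to do
```

The point of the sketch is the early return: the check against current state is what lets you run the same action over and over without side effects.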

Templates

Templates, as one would expect, are files with all of the important data substituted with code so we can fill them in later. You can use templates to create any kind of file you like, though the most common use is to create configuration files with host-specific parameters filled in dynamically. Using templates involves a template resource, which Chef backs with a template provider, and the template file itself.

Chef templates use ERb (Embedded Ruby) syntax, though Chef uses the erubis implementation of ERb for speed.
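If you haven't seen ERb before, here's a small sketch using the erb library that ships with ruby (not erubis, and not Chef's template resource). The template text, the render_config helper, and the hostname and workers variables are all made up for illustration.

```ruby
require "erb"

# Hypothetical template text; in a cookbook this would live in its own file.
TEMPLATE = <<~CONF
  server_name <%= hostname %>;
  worker_processes <%= workers %>;
CONF

# Render the template with host-specific values filled in.
def render_config(hostname, workers)
  ERB.new(TEMPLATE).result_with_hash(hostname: hostname, workers: workers)
end

puts render_config("web1.example.com", 4)
```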

Recipe

Recipes are the files where you write your resources. Recipes can also contain arbitrary ruby code, but you need to understand a little bit about how Chef runs to make productive use of this. Each Chef run is a two stage process: in the first step, usually called the compilation step, Chef evaluates the recipe files, building up a list of the resources. In the next step, Chef executes the desired action for each resource on the provider for that resource. Any arbitrary code in a recipe will be run during the compilation step, not the execution step. To defer execution until the execution phase of the Chef run, use the ruby_block resource.
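The two-stage run is easier to see in code than in prose, so here's a toy version. ToyRun, resource, and deferred are hypothetical names of my own, not Chef's API; deferred plays roughly the part that ruby_block plays in a real recipe.

```ruby
# Toy two-phase run. ToyRun, `resource` and `deferred` are hypothetical names,
# not Chef's API; `deferred` plays the role that ruby_block plays in Chef.
class ToyRun
  attr_reader :log

  def initialize
    @log = []
    @resources = []
  end

  # Declaring a resource only records it; nothing is executed yet.
  def resource(name)
    @resources << -> { @log << "converge: #{name}" }
  end

  # Wrap arbitrary code so that it runs in the execution phase.
  def deferred(&block)
    @resources << block
  end

  # Phase 1: evaluate the recipe. Plain ruby in the recipe runs right here.
  def compile(&recipe)
    instance_eval(&recipe)
  end

  # Phase 2: execute each recorded resource in the order it was declared.
  def converge
    @resources.each(&:call)
  end
end

run = ToyRun.new
run.compile do
  resource "package[nginx]"
  log << "compile: plain ruby"             # runs during compilation
  deferred { log << "converge: deferred" } # runs during execution
  resource "service[nginx]"
end
run.converge
puts run.log
```

Notice that the plain ruby line lands in the log before either resource does, even though it was written between them.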

Node

A node is a host that runs Chef. The primary features of a node, from Chef's point of view, are its attributes and its run list. In the distant future, Chef may support a "virtual node" concept, where the Chef client runs on one host but configures another, such as a router or switch that can't run ruby. For now, though, the Chef client is assumed to be running on the node it configures. This means that every host you want to manage with Chef will need ruby and the Chef client installed.

Role

A role provides a means of grouping similar features of similar nodes. At web scale, you almost never have just one of something, so you can use roles to express the parts of the configuration that are shared by a group of nodes. Roles consist of the same parts as a node: attributes and a run list. When the Chef client runs, it merges its own attributes and run list with those of any roles it has been assigned.

Attribute (On a Node or Role)

Nodes and roles have associated attributes, which are a structure of nested key–value pairs. Node and role attributes are commonly used as inputs for resource attributes. For example, your production boxes might be using one version of nginx, but you'd like to install a newer version on your staging servers for testing. By using a node or role attribute to specify the version, you can use the same recipe in both environments.

Chef allows attributes to be set in attribute files (among myriad other ways). Code in these files accesses the node Chef is running on and manages attributes on that node directly. In ruby terms, the value of self within attribute files is the node. Using attribute files, you can rely on a node having a sane value for an attribute when writing a recipe without having to worry that the node may not have defined that attribute.
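Here's a rough sketch of the "sane default" idea using nothing but nested hashes. It is not Chef's attribute system, which has more ways to set values than this; apply_defaults and DEFAULTS are hypothetical, and the nginx version numbers are placeholders.

```ruby
# Toy attribute defaults. The nested hash stands in for node attributes;
# apply_defaults is a hypothetical helper, not part of Chef.
def apply_defaults(node, defaults)
  defaults.merge(node) do |_key, default, existing|
    # Recurse into nested hashes; otherwise the node's own value wins.
    if default.is_a?(Hash) && existing.is_a?(Hash)
      apply_defaults(existing, default)
    else
      existing
    end
  end
end

DEFAULTS = { "nginx" => { "version" => "1.0", "port" => 80 } }.freeze

staging = apply_defaults({ "nginx" => { "version" => "1.2" } }, DEFAULTS)
production = apply_defaults({}, DEFAULTS)
puts staging["nginx"]["version"]    # the node's own value
puts production["nginx"]["version"] # falls back to the default
```

A recipe written against these attributes can read the version without checking whether anyone bothered to set it.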

Advanced Chef users often make heavy use of attributes defined in roles to manage attributes on many nodes at once.

Cookbook

A cookbook is a collection of the various related files, such as recipes, templates, and attributes files, that Chef uses to configure a system, plus metadata. Cookbooks are typically grouped around configuring a single package or service. The MySQL cookbook, for instance, contains recipes for both client and server, plus an attributes file to ensure the attributes used in the recipes are available if they haven't been defined on the node in some other way.

Metadata

Cookbooks often rely on other cookbooks for pre-requisite functionality. In order for the server to know which cookbooks to ship to a client, a cookbook that depends on another one needs to express that dependency somewhere. That "somewhere" is in its metadata. Dependency tracking is the most visible part of metadata, but metadata also can contain information about authorship, licensing, a description of the cookbook, what platforms the cookbook is expected to work on, and whether or not a cookbook plays nice with other cookbooks. At the moment, Chef supports many more fields for metadata than it actually uses, but maintaining accurate dependency information is absolutely essential, since nodes may not get all of the cookbooks they need if this information is incomplete.

Run List

In the simplest case, the run list is just an ordered list of the recipes that a node should run. Assuming the cookbook metadata is correct, you can put just the recipes you want to run in the run list, and dependent recipes will be run automatically as needed.

In the more complicated case, the run list will include roles that a node has been assigned along with any recipes set on the node explicitly. In this case, when the Chef client runs, the run list is "expanded" into a list of recipes by replacing the role's entry in the run list with the list of recipes the role specifies in its run list.
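Run list expansion can be sketched in a few lines of ruby. This is a toy, not Chef's expansion code: ROLES and expand_run_list are hypothetical, the role and recipe names are invented, and dropping duplicate entries is an assumption of the sketch.

```ruby
# Toy run list expansion. ROLES and expand_run_list are hypothetical; the
# "role[...]" and "recipe[...]" strings mirror how run list items are written.
ROLES = {
  "base" => ["recipe[ntp]", "recipe[users]"],
  "webserver" => ["role[base]", "recipe[nginx]"]
}.freeze

def expand_run_list(run_list, roles = ROLES)
  expanded = run_list.flat_map do |item|
    if (match = item.match(/\Arole\[(.+)\]\z/))
      # Replace the role entry with the recipes from its own run list.
      expand_run_list(roles.fetch(match[1]), roles)
    else
      [item]
    end
  end
  expanded.uniq # assumption: a recipe listed twice only needs to run once
end

puts expand_run_list(["role[webserver]", "recipe[monit]"])
```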


The Chef Way

In the beginning

A few years ago, I discovered the ruby language. After just a few hours using it, I became so enamored that I decided to use it for all of the little systems administration automation tasks that have traditionally been the domain of perl and shell scripts. As my journey into ruby continued, I discovered that many rubyists are also big proponents of the programming methodologies collectively termed "agile." Reading the agile manifesto, I had two reactions: firstly, it made a lot of sense; secondly, I wondered how, if embracing change is the norm, does one reconcile this with the inherent difficulties in adapting operations to a changing environment? Much to my dismay, googling for "agile systems administration" gave me a single result, a blog post whose author was asking those same questions without providing the answers. That has all changed: now we have pervasive virtualization, infrastructure as a service, and many high profile proponents of an agile approach to operations, some retaining the term agile, others labeling the movement "devops" (I think "opsdev" sounds better, but you can't win 'em all). The changes in this approach to operations are far-reaching, with organizational and even philosophical aspects, accompanied by new (or just newly discovered) tools to match the new understanding.

Arguably the most important shift in tooling is the embrace of configuration management and automation tools. Enabled by the greatly reduced lead times required to provision new systems, configuration management is key to building the kind of dynamic, change-responsive infrastructures I began dreaming about while reading the agile manifesto all those years back. Although the history of configuration management may stretch back to the original Bourne or even Thompson shells, and hit a major milestone in 1993 with the first release of Cfengine, both the variety and use of these tools have risen dramatically in recent years.

The Chef Way

The Master observes the world but trusts his inner vision.

Given the goal of "Operations Zen," not fighting change, but using it to our advantage, embracing a duality of operations and development, and, yes, being productive and doing more with less, how should we achieve it? I contend that the tool to use is Chef, and the path to follow is the Chef way. This is the beginning of a series of posts covering Chef's design decisions, its terminology, and how to use Chef. Once we get to using Chef, I'll highlight how doing things the Chef way helps us build agile, fully automated infrastructures, but in this post I'll describe what the Chef way means in terms of Chef's design.

So, what is the Chef way? For starters, flexibility. Opinionated software is great, but if it's so opinionated that it makes your job harder... that's too opinionated. So Chef tries to be opinionated where it simplifies, and lets you call the shots where it matters.

An Internal Ruby DSL.

The more prohibitions you have, the less virtuous people will be

Chef's configuration language is implemented within Ruby, instead of outside of it. This is probably the most defining characteristic of Chef, and maybe also the most controversial. By using ruby as the configuration language, Chef becomes both a tool and a framework, rather than an application that the user merely configures. Some have described Chef as "half of an application" for this reason. But what does this actually mean in practice? Can you use Chef if you're not a "1337" Merb hacker? Well, yes and no, or "mu!"

Chef provides all of the basic building blocks one would expect from a configuration management application: one can install packages, write templated configuration files, start and stop services, manage users, and generally automate all of the common administration tasks completely within the pre-defined DSL. If you're learning Chef with no prior knowledge of ruby, this may be a bit more difficult than learning a simpler language specific to a configuration management tool, but not by much. However, in the process of learning the Chef DSL, something else has happened: the user has learned basic ruby syntax, and taken a step into a world much wider than configuration management. An important benefit of this is that it makes it easy to modify the workings of Chef, from small tweaks and additions, to much more drastic changes, even from within Chef recipes. There's no cognitive barrier of using an external DSL and then needing to learn or use a different language to extend Chef.

This leads directly to the great advantage Chef gains by using a pure ruby DSL: as Bryan McLellan wrote recently, Chef embraces a dual nature of being both a tool and a mere library, and fills the continuum in between. Some of the advantages are very simple: we can use ruby's array literal syntax to conveniently express many similar actions. Others are more complex: we can use a SQL library to configure systems based on information in a database. Using ruby as the configuration language allows us the expressiveness to describe our systems as they actually exist within the entire infrastructure and gives us access to a large collection of libraries to integrate with that infrastructure. Using Chef, we are not limited to seeing our systems as isolated pieces, we may also consider the whole.

Determinism

Though Chef provides great flexibility in what we may ask it to do, it enforces a deterministic order in which operations occur. People who still remember manual systems administration (what?) will find this a familiar concept: when you write or follow documentation, a how-to for example, you never see (or write) anything like, "do steps 1a, 1b, and 1c in some random order, then do steps 2a, 2b, 2c in a random order, then..." The idea that order matters is not original to Chef, and is also a matter of some controversy among the infrastructure management community. Expressed simply, order matters because each step our configuration management tool takes to deliver a fully configured system is taken within the context of the state of the entire system, and there can be complex, non-obvious dependencies between these actions. Steve Traugott and Lance Brown explored the benefits of deterministic order in depth in their 2002 paper, "Why Order Matters: Turing Equivalence in Automated Systems Administration" and concluded, "it appears that no tool, written in any language, can predictably administer an enterprise infrastructure without maintaining a deterministic, repeatable order of changes on each host." Among many salient points, Traugott and Brown refer to a 2002 study showing that installing RPMs with no declared dependencies, and with installation scripts disabled, in different orders can lead to different outcomes. Given the overwhelming complexity of mapping these hidden dependencies, it is not surprising that the map rarely matches the territory, or that Chef's answer is not to build a better map.

Chef's method for ordering dependent operations is simple and elegant: it runs actions in the order written. To install an application and then configure it, you simply declare that the package should be installed at the top of the file, and declare its configuration at the bottom. Of course, a system's configuration can be split up between many files, with dependencies across files; in Chef, these dependencies are evaluated according to simple, deterministic rules so that it's easy to know exactly what will happen and in what order. And once you've built one system and tested it, you can be confident that every other system you build will be built in exactly the same way.

Consistency

If you don't realize the source, you stumble in confusion and sorrow.

This one is related to determinism: Chef tries to manage systems so that they can always be rebuilt from scratch. To illustrate this point, consider the way Chef manages dynamic, templated files. Chef always writes the complete file instead of inserting text willy-nilly in some arbitrary spot. If you make a change to the template or data used to generate the file, Chef rewrites the entire file. This encourages consistency because there is no way the file can have content that was inserted by Chef at one time but later removed from Chef's management. As a result, you get the same configuration files no matter the history of the server being managed.
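The whole-file approach is simple enough to sketch with the standard library. write_managed_file is a hypothetical helper of mine, not Chef's template provider, and the file name and port setting are invented.

```ruby
require "erb"
require "tmpdir"

# Toy whole-file management; write_managed_file is a hypothetical helper.
# It renders the complete content and replaces the file only if it differs.
def write_managed_file(path, template, vars)
  desired = ERB.new(template).result_with_hash(vars)
  return :unchanged if File.exist?(path) && File.read(path) == desired

  File.write(path, desired)
  :updated
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "app.conf")
  template = "port <%= port %>\n"

  puts write_managed_file(path, template, { port: 8080 }) # creates the file
  File.write(path, "port 8080\n# hand edit\n")            # someone edits by hand
  puts write_managed_file(path, template, { port: 8080 }) # hand edit is replaced
  puts write_managed_file(path, template, { port: 8080 }) # nothing left to do
end
```

Because the comparison is against the full rendered content, leftovers from an earlier version of the template or a stray hand edit can't survive a run.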

Declarative and Idempotent

Chef is declarative: when working with Chef, you specify the end result, and let Chef worry about how it's achieved. In English, you can think of every action in Chef having the word "shall" attached to it. "The system shall have this package installed and shall have this configuration file." Chef is also idempotent: no matter how many times you run it, it leaves the system in the same state. This has a variety of benefits. For one, you now have executable documentation of how your boxes are configured. Want to know how a system was built? Want to build a new one? You can get the answer to both in the same place. When you need to make a change, upgrading a certain package, for example, you make that change in one place. There's no drift between what's actually on the system and what the documentation says; they're one and the same.

Another important aspect of automating infrastructures over time is staying automated. With shell scripts designed to be run once, changing a few variables and re-running the scripts on a live system is probably not going to end well. Back to our upgrading-a-package example, when we make the change with Chef, Chef will see that we have a different version of the package on the system than specified and install the new version, but take no action where the system matches what's been specified. This means that systems are built and maintained in exactly the same way. As a result, systems that are supposed to be identical are identical, whether they were built last week or last year, and building a new one to match follows the exact same process. And what happens when things go wrong? With run-once-only scripts, you might hunt through the script to figure out which line it was executing when it failed, comment out everything that came before it in the script and try again, or maybe finish what the script was trying to do manually. Yuk. With Chef, assuming the error was transient, you can just run it again. Everything that it finished before the Chef run failed will match the specification and be skipped. Much better.

Fail Fast

If you want to be reborn, let yourself die.

When something goes wrong configuring a host, Chef fails immediately. From one point of view, there's nothing else Chef could do, since many dependencies are only expressed implicitly. In the example where we install a package and then configure it, the configuration step clearly depends on the installation step, but Chef doesn't track this dependency at all (unless we specifically ask it to), so it wouldn't know to cancel configuration if the package install failed. If we step back a bit and look at the whole view of what Chef is trying to accomplish, however, we see that this isn't really a problem. After all, we're asking Chef to deliver us a fully configured webserver, load balancer, coffee machine, whatever. If it works fine with half of its configuration missing, we probably didn't need that half of the configuration in the first place. So it turns out that the best thing Chef can do in case of failure is to fail. If the failure was transient, say, a network issue, you can run Chef again and you'll get the fully configured server. If the failure was caused by something more significant, you'd have to fix it before you could get the fully configured server you wanted, whether or not Chef ignored the failure. By failing fast, Chef alerts us to problems immediately, without losing any important functionality.
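Here's a toy illustration of failing fast and then simply running again. converge_all, the step hashes, and the state hash are all hypothetical; the "mirror unreachable" error stands in for any transient failure.

```ruby
# Toy fail-fast run. converge_all and the step hashes are hypothetical.
# Each step has a check (:done) and an action (:run); the first error aborts.
def converge_all(steps, log)
  steps.each do |step|
    next if step[:done].call # already matches the spec, so skip it

    step[:run].call          # an exception here aborts the whole run
    log << step[:name]
  end
end

state = { network: false, installed: false, configured: false }

install = lambda do
  raise "mirror unreachable" unless state[:network]

  state[:installed] = true
end

steps = [
  { name: "install package", done: -> { state[:installed] }, run: install },
  { name: "write config", done: -> { state[:configured] }, run: -> { state[:configured] = true } }
]

log = []
begin
  converge_all(steps, log)
rescue RuntimeError => e
  puts "run failed: #{e.message}"
end
puts "configured after failure? #{state[:configured]}"

state[:network] = true   # the transient problem clears up
converge_all(steps, log) # a plain re-run finishes the job
puts log
```

The configuration step never runs against a half-installed package, and nothing special is needed to recover: the second run just picks up where the first one stopped.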

Configuration In Context

Another unique feature of Chef is that it allows individual hosts to query for information about the rest of your infrastructure—Chef supports this out of the box. This enables truly dynamic infrastructures and keeps our configurations DRY. The advantages of this feature are most apparent when configuring applications such as load balancers and monitoring systems, where the configuration depends on what other infrastructure is available. A load balancer, for example, needs to know what backends are available to balance the load across. If Chef didn't have the ability to query for information, then for each backend we add or remove, we'd have to also update the load balancer's configuration. Instead, all we have to do is ask for a list of the backends, and add each one to the load balancer's configuration programmatically. When a new backend is added or removed, the next time Chef runs on the load balancer host, it will automatically generate a new configuration file, keeping the load balancer's view of the available backends up to date.
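As a sketch of the load balancer case, here's the same idea with an in-memory list instead of a Chef server. NODES, search_nodes, and load_balancer_config are hypothetical, the addresses are invented, and the output only loosely resembles a real load balancer configuration.

```ruby
# Toy infrastructure query. NODES, search_nodes and load_balancer_config are
# hypothetical stand-ins for data that a Chef server would hold.
NODES = [
  { name: "web1", roles: ["webserver"], ip: "10.0.0.11" },
  { name: "web2", roles: ["webserver"], ip: "10.0.0.12" },
  { name: "db1", roles: ["database"], ip: "10.0.0.21" }
].freeze

# Ask the "infrastructure" for every node that has the given role.
def search_nodes(role, nodes = NODES)
  nodes.select { |node| node[:roles].include?(role) }
end

# Build the backend list from the query result instead of hard-coding it.
def load_balancer_config(nodes = NODES)
  servers = search_nodes("webserver", nodes).map { |node| "  server #{node[:ip]}:80;" }
  (["upstream app {"] + servers + ["}"]).join("\n")
end

puts load_balancer_config

# Adding a backend changes the generated config with no edit to the code above.
puts load_balancer_config(NODES + [{ name: "web3", roles: ["webserver"], ip: "10.0.0.13" }])
```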

Fat Client, Skinny Server

If you've ever used monitoring system software, you're familiar with this debate: does the client or the server do all of the work? With Chef, the clients do the work. It's a bit inaccurate to call the server "skinny": especially with the upcoming 0.8.0 release, the server has a fair amount of complexity, mostly in support of making its searchable infrastructure features rock-solid and efficient. Where "fat client, skinny server" comes into play is that Chef runs all of the code and template rendering required to configure a system on the client. This design allows Chef to scale down: Chef has a lightweight, serverless, "solo" version of the client, suitable for running on a single host (as an aside, chef-solo is a great way to get started with Chef, since you can tackle Chef's features in chunks instead of having to learn all of the parts at once). With the server doing less, Chef can also scale up diagonally like a modern web application, and handle more clients-per-server than it could if the server shouldered more of the burden.

"Skinny server" also means that Chef never runs outside code on the server. Running a centralized configuration management system does entail placing trust in the central server not to hand your entire infrastructure over to the baddies. Running external code safely is one of the most complicated problems in all of computing, and the benefits of doing so in Chef are negligible compared to the risks. Pushing computation to the clients buys us scalability and gives us some extra security for free.

APIs: you build the other half of this.

If you want to become whole, let yourself be partial.

We touched on this point briefly while discussing Chef's pure ruby DSL, but throughout its design, Chef strives to make external tools first class citizens. Any information stored by Chef can be accessed via an HTTP API. If you can imagine a new use for Chef's data or a new way to interact with it, you can build it.

Pragmatic Abstraction

The mark of a moderate man is freedom from his own ideas.

Chef uses abstractions where they are helpful, but Chef never pretends that abstractions are perfect. Usually, we'd like to simply say "install the MySQL package" and have it work on every platform and package manager. Unfortunately, different distros and OSs disagree on what packages should be named. So Chef lets us break the abstraction and specify a package name for each platform. Another common issue with abstractions is that they commonly force us to use only the lowest common denominator subset of what each underlying implementation is capable of. Again, Chef lets us choose: we can use the cross-platform subset of features when that's good enough, but we also have access to fully-featured, platform-specific implementations when we need them.
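Breaking the abstraction for package names can be as simple as a lookup table. The sketch below is plain ruby, not a Chef feature: PACKAGE_NAMES and package_name_for are hypothetical, and I've used Apache rather than MySQL because its package is well known for being named differently across distro families.

```ruby
# Toy per-platform package names. PACKAGE_NAMES and package_name_for are
# hypothetical helpers, not part of Chef.
PACKAGE_NAMES = {
  "apache" => {
    "debian" => "apache2", "ubuntu" => "apache2",
    "redhat" => "httpd", "centos" => "httpd"
  }
}.freeze

# Use the platform-specific name if one is known, else the generic name.
def package_name_for(generic, platform)
  PACKAGE_NAMES.fetch(generic, {}).fetch(platform, generic)
end

puts package_name_for("apache", "ubuntu")
puts package_name_for("apache", "centos")
puts package_name_for("git", "debian") # no override, generic name is used
```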

Chef also embraces the idea that when abstractions are useful, we can often gain even more productivity by layering another abstraction on top of them. Taking an example from outside Chef, TCP and IP abstract away all of the vagaries of wires and wire-level protocols, allowing us to focus on sending data to a remote system. So what do we do with it? Layer other abstractions such as SMTP and HTTP on top, so we can forget about sending data and think about sending mail or getting text documents. Likewise, with Chef, we can build our own abstraction layers on top of the ones Chef provides, and we can programmatically build up abstractions using data from anywhere. A simple example of this is taking a list of package names and building a list of package objects from it. This way, we can minimize repetition (stay DRY!), express our infrastructures succinctly, integrate Chef with any data source we need, and maximize automation.
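The list-of-names example looks something like this in plain ruby. ToyPkg and packages_from are hypothetical; in a real recipe each object would be a package resource rather than a Struct.

```ruby
# Toy "package object"; in a recipe each of these would be a package resource.
# ToyPkg and packages_from are hypothetical names, not part of Chef.
ToyPkg = Struct.new(:name, :action)

# Turn a plain list of names into one object per package.
def packages_from(names, action = :install)
  names.map { |name| ToyPkg.new(name, action) }
end

# The array literal keeps the list short; the names could just as well come
# from a file, a database, or an HTTP call.
packages = packages_from(%w[git curl vim])
packages.each { |pkg| puts "#{pkg.name}: #{pkg.action}" }
```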

Wrapping Up

In this post, we've covered the Chef why, seeing that Chef's design focuses on flexibility, repeatability, and pragmatism. In later posts in the series, we'll learn to speak like a Chef, and how to use Chef to build infrastructures that are both agile and robust.
