My pet peeve with YAML

YAMLYAML is a popular data encoding language. Its data model is similar to JSON‘s data model where data is described in terms of strings, numbers, arrays and hash-maps. In fact, YAML is a super-set of JSON that includes indention-based syntax to make it easier for humans to read and write it.

YAML is very popular. Libraries for reading and writing it had been written for most, if not all, programming languages. It is used as the basis for the configuration DSL of many tools including Ansible, Puppet’s Hiera, and Jenkins Job Builder.

The use YAML as the basis for DSLs in which complex and extensive configuration is written by hand, eventually exposes one to the language’s shortcomings.

Consider the following two examples:

Example 1:

---
lorem:
  - ipsum:
    sit:   amet
    facer: "est at, primis"
  - salutatus ullamcorper
    eu        eum

Example 2:

---
lorem:
    - ipsum:
        sit:   amet
        facer: "est at, primis"
    - salutatus ullamcorper
        eu      eum

The two examples are almost identical. In fact, they have been typed with the exact same set of keystrokes. The only difference is that the text editor had been configured to a different text indention size. In example 1 the indention was of 2 spaces – common to Ruby programs, while in example 2 the indention was of 4 spaces which is common to Python programs.

Despite the very subtle textual difference, the data encoded by the two example is different in a way that may be hard to grasp. Both examples define a hash with a single key – ‘lorem’. That key points to an array in both examples. The contents of the array, however, are different.

In example 1 the first element of the array is a hash with 3 keys: ‘ipsum’, ‘sit’, and ‘facer’, which point to a null value, ‘amet’ and, ‘est at, primis’ receptively.

In example 2, the first array element is also a hash, but this time it only has a single key: ‘ipsum’. That key points to another hash that has the ‘sit’ and ‘facer’ keys.

The last two lines of both example encode the second member of the ‘lorem’ array. Superficially it looks very similar to the first member, but the lack colones creates a situation where the 2nd member is actually just one big string.

The examples are contrived. The issues are easy to spot in them. But when such issues occur within a large and complex YAML file, they can cause quite a headache. Especially since many tools use a generic YAML parser that doesn’t give much by ways of helpful error messages.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s