Strings in YAML - To Quote or not to Quote

+++ This article has been refurbished and moved to +++

www.yaml.info/learn/quote.html

(June 2020)


Old version:

This article covers scalar styles in YAML 1.1 and 1.2. It mostly works the same in both versions.

YAML is a data serialization language, and one design goal was that it's human friendly. It should be easy to read and edit, even if that makes parsing it harder.

Let's look at strings, specifically.

If you look at JSON, you have only one style to encode strings, and that's the double quoted style which doesn't allow literal linebreaks.

YAML files are used for many different purposes, and there are many types of strings, especially multiline strings. For each use case, you can choose the type of quoting (or no quoting) that makes the string readable and easy to edit.

This gives you lots of freedom, but you also have to learn how to use it to avoid mistakes.

The good news is, the YAML double quoted string works the same as in JSON, so if you know that already, you will be able to write correct YAML. (I should note, though, that it also depends on the processor you use, since not all are fully JSON compatible. The incompatible cases should be rare, though.)
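As a small illustration of that compatibility, the following line is a JSON object and, at the same time, a YAML flow mapping with double quoted keys and values. The key names are made up for this sketch, and it assumes a processor that is JSON compatible in this respect:

```yaml
{ "text": "line 1\nline 2", "quote": "a \"quoted\" word", "letter": "\u0041" }
```

The escapes \n, \" and \u0041 mean the same in both formats.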

You basically have five ways to express a string: as a plain, single quoted or double quoted scalar, or as a literal or folded block scalar.

Table of Contents:

  • Quick comparison
  • Flow Scalars
    • Plain Scalar
    • When not to use Plain Scalars
    • Single quoted Scalar
    • Double quoted Scalar
  • Block Scalars
    • Block Scalar Types
    • Literal Block Scalar
    • Folded Block Scalar
    • Comments
    • Empty Lines at the beginning
    • Special Block Scalar Indicators
    • Block Scalar Chomping
    • Block Scalar Indenting
  • Summary

Quick Comparison

Ok, you're in a hurry and just want to get an overview:

plain scalars:
- a string
- a string with a \ backslash that doesn't need to be escaped
- can also use " quotes ' and $ a % lot /&?+ of other {} [] stuff

single quoted:
- '& starts with a special character, needs quotes'
- 'this \ backslash also does not need to be escaped'
- 'just like the " double quote'
- 'to express one single quote, use '' two of them'

double quoted:
- "here we can use predefined escape sequences like \t \n \b"
- "or generic escape sequences \x0b \u0041 \U00000041"
- "the double quote \" needs to be escaped"
- "just like the \\ backslash"
- "the single quote ' and other characters must not be escaped"

literal block scalar: |
  a multiline text
  line 2
  line 3

folded block scalar: >
  a long line split into
  several short
  lines for readability

Flow Scalars

Plain Scalar

In YAML, you can write a string without quotes, if it doesn't have a special meaning. See the next section for cases where you have to quote a string.

a string: no quotes needed
another string: with single ' and double " quotes
a url: http://example.org/

You can use literal tabs, backslashes and unicode characters:

a string: with a real<TAB> tab character and a \ backslash

But note that literal tabs are discouraged, as there are edge cases, and they are usually not easy to see.

You cannot use escape sequences like \n or \t here; they will be returned literally as "Backslash + n" / "Backslash + t".
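A minimal sketch of what that means, with illustrative key names: both values below are the same string, which contains a backslash followed by the letter n and no line break.

```yaml
plain: first line\nstill the first line
# the same value written as a double quoted string, with the backslash escaped:
quoted: "first line\\nstill the first line"
```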

The plain scalar can also span multiple lines, and the newlines will be folded into spaces:

multi:
  a
  b
  c
  d
  e
single: a b c d e

I think it's called "flow" scalar because the first line can actually start at the same line as its parent node and flow around it:

multi: a
  b
  c

However, the three lines are not aligned the same, which might make it less readable. I guess it's a matter of taste.

If lines are indented more than others, this is ignored:

multi:
  a
  b
  c
    d
  e

Any trailing spaces or spaces at the beginning of the line will be removed.

This can be very useful if you have a long string and want to limit the length of the lines in your YAML file.
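For example, a long description could be wrapped like this (the key names and the text are only an illustration):

```yaml
description:
  This sentence is too long for one line,
  so it is split into three short lines
  that are joined with single spaces.
# loads as the same string as:
single: This sentence is too long for one line, so it is split into three short lines that are joined with single spaces.
```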

There's also a way to enforce newlines. A blank line is not folded into a space; it results in a newline:

multi:
  a
  b

  c
  d
single: "a b\nc d" # a double quoted string, see below

Every following empty line after the first will be kept as a newline, too:

multi:
  a
  b


  c
  d
single: "a b\n\nc d"

A comment will end such a plain scalar, so the following example is invalid:

multi:
  first    # a comment
  second   # this is invalid

You can only use a comment at the end:

multi:
  first
  second   # a comment

It should be noted that, while a plain scalar cannot start with a -<space>, for example, the following lines can, although this might look like a badly indented sequence:

- a multiline
  - plain string

# same as
- "a multiline - plain string"

So you should avoid this.

When not to use Plain Scalars

Because a plain scalar without quotes can conflict with YAML syntax elements, there are some cases where you cannot use it.

Character sequences that can't be used inside a plain scalar:

  • :<space> Block mapping entry. A colon is allowed, but only if it's not followed by whitespace
  • <space># This starts a comment
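The whitespace is what matters in both cases. A short sketch with made-up keys and values:

```yaml
# a colon that is not followed by a space is part of the string
no mapping: key:value
# a '#' that is not preceded by a space is part of the string
no comment: value#more
# with the space, the string has to be quoted
quoted colon: 'key: value'
quoted hash: 'value #more'
```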

Characters that cannot be used at the start of an unquoted string:

  • ! Tag like !!null
  • & Anchor like &mapping_for_later_use
  • * Alias like *mapping_for_later_use
  • -<space> Block sequence entry
  • :<space> Block mapping entry
  • ?<space> Explicit mapping key
  • {, }, [, ] Flow mapping or sequence
  • , Flow Collection entry separator
  • # Comment
  • % Directive
  • |, > Block Scalar
  • @, '`' (backtick) Reserved characters
  • ", ' Double and single quote

There are some additional exceptions for scalars in Flow Style Collections:

flow style sequence: [ string one, string two ]
flow style mapping: { key: value }

As you can see, a comma or a square bracket will end a plain scalar. Therefore, to avoid confusion, the following characters are not allowed in plain scalars inside flow style collections:

  • [, ]
  • {, }
  • ,

The following example with a colon without a space is also valid in flow style collections, but some processors don't allow it (currently):

request: { url: http://example.org/ }
urls: [http://example.org/, http://yaml.org/]

Additionally, inside flow style collections a colon is an indicator for a mapping key if it is followed by one of the characters [, ], {, } or a comma:

flow mapping: {key:[sequence]}

Some processors don't implement this correctly. To be sure, you should always add a space.
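Written with a space after each colon, the same kind of data looks like this (a sketch; the second line is an additional made-up example):

```yaml
# with a space after every colon, processors should agree on the result
flow mapping: { key: [sequence] }
nested: { outer: { inner: value }, list: [a, b] }
```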

Finally, to be compatible with JSON, you can also omit the space if the key is quoted:

flow mapping: { "quoted":23 }

Another use case for quotes is when you have a string that would be resolved as a special type. This highly depends on the YAML version and on the Schema in use. Here are some examples where you need quotes:

  • true, false
  • 23
  • 1e3
  • 3.14159
  • null
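As a sketch of the difference, assuming the YAML 1.2 Core Schema (other versions and schemas resolve more or fewer values, and the key names are only illustrative):

```yaml
boolean: true        # resolved as a boolean
string: "true"       # a string
integer: 23          # resolved as an integer
also a string: '23'  # a string
float: 1e3           # resolved as a float in the Core Schema
no value: null       # resolved as null
text: 'null'         # the four letter string
```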

To learn which scalars are special in which version, you might read my article Introduction to YAML Schemas and Tags.

Single quoted Scalar

In the last section you learned when you have to quote a scalar. Single quoted scalars mostly work like plain scalars, except that the special character sequences are allowed:

a string:         '&enclosed in single quotes'
colon plus space: 'this colon : would be forbidden without quotes'
another colon plus space:
  'this colon : would create a mapping without quotes'
no comment:       'this would be # a comment without quotes'
curly brace:      '{ this would be: a flow style mapping }'
square bracket:   '[ this would be a flow style sequence ]'
backslash:        'this \n is a backslash and "n", not a linebreak'

Any character except ' will be returned literally. You cannot use escape sequences here.

The single quote itself is escaped by doubling it:

a string: 'with one single '' quote'

The following demonstrates that a backslash is not an escape character:

a string: 'this is \' # the end of the string'

In JSON, this would be:

{ "a string": "this is \\" }

So the # the end of the string' is really a comment.

Single quoted scalars can also span multiple lines. Their folding rules are similar to those of plain scalars, and the continuation lines also have to be indented.

multi:
  'a
  b
  c
  d'
single: 'a b c d'


multi:
  'a
  b


  c
  d'
single: "a b\n\nc d"

Like with folding for plain scalars, trailing spaces or spaces at the beginning will be removed.

Double quoted Scalar

A double quoted scalar has the same rules as a single quoted scalar, plus some extra rules and escape sequences. This is the only scalar style where you can use escape sequences.

a string: "here's a \t tab and a \n newline, followed by a \\ backslash"
another string: "with an escaped \" double quote"

It's important to note that only a limited set of characters can be escaped. Other escapes are invalid:

- "invalid \. escape"
- "invalid \' escape"
- "invalid \- escape"

There are special escape sequences which let you express any character:

- "a \x20 space"
- "a vertical \v tab can also be written as \x0B or \x0b"
- "an 'A' in 8-bit unicode: \x41"
- "an 'A' in 16-bit unicode: \u0041"
- "an 'A' in 32-bit unicode: \U00000041"

The list of allowed escapes can be found in the YAML specification, in the section on escaped characters.

In YAML 1.1, escaping a slash is not allowed. YAML 1.2 added this escape as one of the changes made for JSON compatibility:

string: "escaped \/ slash"

The backslash also has an additional meaning. If you add it to the end of a line, the next line will be folded without a space.

This is useful when you want to break a long string into several lines, but it doesn't have spaces anywhere:

a long string without spaces:
  "word1\
  -word2\
  -word3"
single: "word1-word2-word3"

You can also use it to preserve spaces at the end:

multi:
  "the first line ends with 5 spaces     \
  second line"
single: "the first line ends with 5 spaces     second line"

In that case the five spaces are preserved, and the escaped line break itself is removed.

You can use a Backslash plus Space at the beginning of the line to get a similar effect:

multi:
  "first
  \     5 spaces
  third"
single: "first      5 spaces third"

Note that you will actually get six spaces in this case: one from folding the line break after "first", plus the five spaces after the backslash!

Block Scalars

When your string is longer, it can be a good idea to use a block scalar to make it more readable.

Literal Block Scalar

A Literal Block Scalar is introduced with the | pipe. The content starts on the next line and has to be indented:

literal: |
  line 1
  line 2
  # not a comment
  end

The indentation is detected from the first (non-empty) line of the block scalar.

The newlines will be preserved, so this is equivalent to:

quoted: "line 1\nline 2\n# not a comment\nend\n"

This way you can add all kinds of text to your YAML, for example a shell script:

bash: |
  #!/usr/bin/env bash
  echo "Help, I'm trapped in a YAML document!"
  exit 1

You could even embed a YAML document in YAML! If you ever had to do this in JSON, you know how ugly this can get.
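A minimal sketch of such an embedded document, with made-up key names. Because every line of the inner document is indented, it is just text to the outer document:

```yaml
outer: |
  ---
  inner key: inner value
  inner list:
    - a
    - b
# "outer" is one string: "---\ninner key: inner value\ninner list:\n  - a\n  - b\n"
```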

Trailing spaces will be preserved, too.

You cannot use escape sequences like \t here.
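In other words, a backslash in a literal block scalar is just a backslash. A small sketch:

```yaml
literal: |
  a \t b
# same value as a double quoted string, with the backslash escaped:
quoted: "a \\t b\n"
```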

Folded Block Scalar

The Folded Block Scalar, as the name suggests, will fold its lines with spaces. It is introduced with the > sign, which can be seen as a folded |.

a long command: >
  apt-get update
  && apt-get install -y
  git tig vim jq tmux tmate
quoted: "apt-get update && apt-get install -y git tig vim jq tmux tmate\n"

The folding rules are actually almost the same as for quoted scalars. You can enforce a newline with an empty line:

a text with long lines: >
  this is the first
  long line

  and this is the
  second

quoted: "this is the first long line\nand this is the second\n"

There's an additional way to enforce newlines, which is probably not very well known. Lines that are indented more than the others are not folded:

a long text with enforced newlines: >
  line
  one
    line two
    line three
  line
  four
quoted: "line one\n  line two\n  line three\nline four\n"

Another difference to quoted folding is that trailing spaces are kept. In this example, the underscores represent three trailing spaces:

a text with long lines: >
  trailing spaces___
  continued

quoted: "trailing spaces    continued\n"

Like in Literal Block Scalars, you cannot use escape sequences here.

Comments

If a line starting with # is indented correctly, it will not be interpreted as a comment:

literal: |
  a
  # no comment
  b
quoted: "a\n# no comment\nb\n"

Also note that even the first line can start with a #:

folded: >
  # no comment
  a
  b
quoted: "# no comment a b\n"

A less indented line starting with a # will be interpreted as a comment and will also end the block scalar:

folded: >
    a
    b
  # a comment, end of block scalar

You can add comments to a block scalar directly after the header:

literal: | # a block scalar
  abc
  def
folded: > # a block scalar
  abc
  def

Empty Lines at the beginning

Unlike trailing empty lines, empty lines at the beginning of a block scalar are preserved. Note that lines containing only spaces count as empty lines here. An underscore "_" is used to represent the spaces:

folded: >
__
____
    a
    b

quoted: "\n\na b\n"

Special Block Scalar Indicators

You might have noticed that the Block Scalars shown so far always end with a single newline. This is the default behaviour: the final line break is kept, and any further trailing newlines are stripped:

literal: |
  a
  b


quoted: "a\nb\n"



folded: >
  a
  b


quoted: "a b\n"

Block Scalar Chomping

If you don't want to end your scalar with a newline, you can use the - chomping indicator:

literal: |-
  a
  b

quoted: "a\nb"


folded: >-
  a
  b

quoted: "a b"

If you want to keep all trailing newlines, use the + indicator:

literal: |+
  a
  b


quoted: "a\nb\n\n\n"

folded: >+
  a
  b


quoted: "a b\n\n\n"

Block Scalar Indenting

Sometimes, your block scalar might start with one or multiple spaces that you want to preserve. Simply indenting the first line more than the rest does not work:

literal: | # invalid!
    This Is A Header
  The body starts here

All continuation lines in a block scalar have to be indented at least as much as the first line.

So how can you preserve the spaces in the first line? By specifying the number of indentation spaces in the block scalar header:

implicit: | # indentation is 1
 line

explicit: |2
    This Is A Header
  The body starts here

This tells the YAML processor that the indentation is 2. Note that the number must be greater than zero.

You can also combine the indicators, and the order does not matter:

literal: |-2
    header
  body

quoted: "  header\nbody"
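To illustrate that the order does not matter, here is the same scalar with both orders of the indicators (the key names are only for illustration):

```yaml
chomping first: |-2
    header
  body
indentation first: |2-
    header
  body
# both load as "  header\nbody"
```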

Document Header and Footer

A special note about the Document Headers and Footers.

In YAML, ---<space> or ---<linebreak> at the beginning of a line explicitly starts the document.

...<space> or ...<linebreak> ends a document.

Even inside of Block Scalars or Quoted Scalars they still have their special meaning.

If your YAML document consists of only one string, that string can have an indentation of zero. Because the markers are still recognized at the beginning of a line, the following examples are invalid:

---
"invalid
---
scalar"

---
"invalid
...
scalar"

On the other hand, the following example actually consists of two YAML documents:

--- >
block
scalar
---
plain
scalar
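By contrast, markers that are indented are ordinary content. A minimal sketch of a document that consists of a single literal block scalar, assuming a processor that follows the specification here:

```yaml
--- |
  ---
  these markers are indented,
  so they are ordinary text
  ...
# loads as "---\nthese markers are indented,\nso they are ordinary text\n...\n"
```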

Summary

  • Only in double quotes can you use escape sequences like \n
  • Plain scalars can be used only if they don't start with a special character (sequence) or contain a colon followed by a space (or end with a colon)
  • Single quoted scalars allow you to include special characters. Only ' needs to be escaped, by doubling it.
  • Double quoted scalars allow escape sequences like \t, \x0a. The double quote itself is escaped with \"
  • Literal block scalars preserve newlines and trailing spaces
  • Folded block scalars fold their lines with spaces
  • | or > will strip any trailing empty lines, but keep the last newline
  • |+ or >+ will keep trailing empty lines
  • |- or >- will strip all trailing empty lines and the last newline
  • You can specify the indentation of block scalars with |<num> or ><num>

Have fun YAMLing!

Changes

  • Add example about trailing whitespaces in > Folded Block Scalars 2018-04-02

1 Comment

This is a great article. I will keep it for future reference.
I'm struggling with a YAML issue in an AWS CloudFormation template. My desired result is literally the following expression (single line):

sed -i.bak '/PRE_CLASSPATH=/c\PRE_CLASSPATH="${MW_HOME}/bi/bifoundation/jdbc/jdk18/bijdbc.jar${CLASSPATHSEP}${WL_HOME}/modules/net.shibboleth.utilities.java-support.jar${CLASSPATHSEP}${WL_HOME}/modules/org.slf4j.slf4j-api.jar${CLASSPATHSEP}${PRE_CLASSPATH}"' /opt/oracle/config/domains/bi/bin/setDomainEnv.sh

I'm having issues with the backslash '\' and with the double quotes '"'. I just can't get it right...
Any suggestion?

About tinita

just another perl punk,