YAML Ain't Markup Language (YAML™) 1.0

Final Draft 2004-JAN-29

Oren Ben-Kiki

<oren@ben-kiki.org>

Clark Evans

<cce@clarkevans.com>

Brian Ingerson

<ingy@ttul.org>

This version: www.yaml.org/spec/history/2004-01-29/
Latest version: www.yaml.org/spec/

Copyright 2001-2004 Oren Ben-Kiki, Clark Evans, Brian Ingerson

This document may be freely copied provided it is not modified.

Status of this Document

This is an intermediate working draft and is being actively revised. Hopefully the next draft will be a release candidate.

We wish to thank implementers who have tirelessly tracked earlier versions of this specification, and our fabulous user community whose feedback has both validated and clarified our direction.

Abstract

YAML™ (rhymes with “camel”) is a human-friendly, cross-language, Unicode-based data serialization language designed around the common native structures of agile programming languages. It is broadly useful for programming needs ranging from configuration files to Internet messaging to object persistence to data auditing. Together with the Unicode standard for characters, this specification provides all the information necessary to understand YAML Version 1.0 and to construct programs that process YAML information.


Table of Contents

1. Introduction
1.1. Goals
1.2. Prior Art
1.3. Relation to XML
1.4. Terminology
2. Preview
2.1. Collections
2.2. Structures
2.3. Scalars
2.4. Tags
2.5. Full Length Example
3. Processing YAML Information
3.1. Processes
3.1.1. Represent
3.1.2. Serialize
3.1.3. Present
3.1.4. Parse
3.1.5. Compose
3.1.6. Construct
3.2. Information Models
3.2.1. Node Graph Representation
3.2.2. Event / Tree Serialization
3.2.3. Character Stream Presentation
3.3. Completeness
3.3.1. Well-Formed
3.3.2. Resolved
3.3.3. Recognized and Valid
3.3.4. Available
4. Syntax
4.1. Characters
4.1.1. Character Set
4.1.2. Encoding
4.1.3. Indicators
4.1.4. Line Breaks
4.1.5. Miscellaneous
4.2. Space Processing
4.2.1. Indentation
4.2.2. Throwaway comments
4.3. YAML Stream
4.3.1. Document
4.3.2. Directive
4.3.3. Presentation Node
4.3.4. Node Property
4.3.5. Tag
4.3.6. Anchor
4.4. Alias
4.5. Collection
4.5.1. Sequence
4.5.2. Mapping
4.6. Scalar
4.6.1. End Of line Normalization
4.6.2. Block Modifiers
4.6.3. Explicit Indentation
4.6.4. Chomping
4.6.5. Literal
4.6.6. Folding
4.6.7. Folded
4.6.8. Single Quoted
4.6.9. Escaping
4.6.10. Double Quoted
4.6.11. Plain
A. Tag Repository
A.1. Sequence
A.2. Mapping
A.3. String
B. YAML Terms

Chapter 1. Introduction

"YAML Ain't Markup Language" (abbreviated YAML) is a data serialization language designed to be human friendly and work well with modern programming languages for common everyday tasks. This specification is both an introduction to the YAML language and the concepts supporting it; and also a complete reference of the information needed to develop applications for processing YAML.

Open, interoperable and readily understandable tools have advanced computing immensely. YAML was designed from the start to be useful and friendly to the people working with data. It uses printable Unicode characters, some of which provide structural information while the rest represent the data itself. YAML achieves a unique cleanness by minimizing the number of structural characters, and allowing the data to show itself in a natural and meaningful way. For example, indentation is used for structure, colons separate pairs, and dashes are used for bulleted lists.

There are myriad flavors of data structures, but they can all be adequately represented with three basic primitives: mappings (hashes/dictionaries), sequences (arrays/lists) and scalars (strings/numbers). YAML leverages these primitives and adds a simple typing system and aliasing mechanism to form a complete language for encoding any data structure. While most programming languages can use YAML for data serialization, YAML excels in those languages that are fundamentally built around the three basic primitives. These include the new wave of agile languages such as Perl, Python, PHP, Ruby and JavaScript.

There are hundreds of different languages for programming, but only a handful of languages for storing and transferring data. Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as: configuration files, log files, interprocess messaging, cross-language data sharing, object persistence and debugging of complex data structures. When data is well organized and easy to understand, programming becomes a simpler task.

1.1. Goals

The design goals for YAML are:

  1. YAML documents are easily readable by humans.
  2. YAML uses the native data structures of agile languages.
  3. YAML data is portable between programming languages.
  4. YAML has a consistent model to support generic tools.
  5. YAML enables stream-based processing.
  6. YAML is expressive and extensible.
  7. YAML is easy to implement and use.

1.2. Prior Art

YAML's initial direction was set by the data serialization and markup language discussions among SML-DEV members. Later on it directly incorporated experience from Brian Ingerson's Perl module Data::Denter. Since then YAML has matured through ideas and support from its user community.

YAML integrates and builds upon concepts described by C, Java, Perl, Python, Ruby, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), XML, SAX and SOAP.

The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with that standard. Further, YAML borrows the idea of having multiple documents from MIME (RFC2045). YAML's top-level production is a stream of independent documents; ideal for message-based distributed processing systems.

YAML's indentation based block scoping is similar to Python's (without the ambiguities caused by tabs). Indented blocks facilitate easy inspection of a document's structure. YAML's literal scalar leverages this by enabling formatted text to be cleanly mixed within an indented structure without troublesome escaping.

YAML's double quoted scalar uses familiar C-style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters such as “\x3B”. 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as “\u003B” and “\U0000003B”.
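
Python string literals happen to accept the same three escape forms, so the equivalence of the sample escapes above can be checked directly. This is only an illustration written in Python, not a YAML parser, and the variable names are ours.

```python
# The three escape widths from the paragraph above, written as Python
# string literals (Python shares this C-style escape syntax).
eight_bit = "\x3B"             # 8-bit escape
sixteen_bit = "\u003B"         # 16-bit Unicode escape
thirty_two_bit = "\U0000003B"  # 32-bit escape

# All three name the same character, U+003B (semicolon).
print(eight_bit == sixteen_bit == thirty_two_bit)  # True
print(eight_bit)                                    # ;
```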

Motivated by HTML's end-of-line normalization, YAML's folded scalar employs an intuitive method of handling white space. In YAML, single line breaks may be folded into a single space, while empty lines represent line break characters. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.

YAML's core type system is based on the requirements of Perl, Python and Ruby. YAML directly supports both collection (hash, array) values and scalar (string) values. Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).

Like XML's SOAP, YAML supports serializing native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to encode rich data structures required for modern distributed computing. YAML provides unique global type names using a namespace mechanism inspired by Java's DNS based package naming convention and XML's URI based namespaces.

YAML was designed to have an incremental interface that includes both pull-style input stream and push-style (SAX-like) output stream interfaces. Together these enable YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.

1.3. Relation to XML

Newcomers to YAML often search for its correlation to the eXtensible Markup Language (XML). While the two languages may actually compete in several application domains, there is no direct correlation between them.

YAML is primarily a data serialization language. XML was designed to be backwards compatible with the Standard Generalized Markup Language (SGML) and thus had many design constraints placed on it that YAML does not share. Inheriting SGML's legacy, XML is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML is the result of lessons learned from XML and other technologies.

It should be mentioned that there are ongoing efforts to define standard XML/YAML mappings. This generally requires that a subset of each language be used. For more information on using both XML and YAML, please visit yaml.org/xml/.

1.4. Terminology

This specification uses key words in accordance with RFC2119 to indicate requirement level. In particular, the following words are used to describe the actions of a YAML processor:

may
This word, or the adjective “optional”, means that conformant YAML processors are permitted to, but need not, behave as described.
should
This word, or the adjective “recommended”, means that there could be reasons for a YAML processor to deviate from the behavior described, but that such deviation could hurt interoperability and should therefore be advertised with appropriate notice.
must
This word, or the term “required” or “shall”, means that the behavior described is an absolute requirement of the specification.

Chapter 2. Preview

This section provides a quick glimpse into the expressive power of YAML. It is not expected that the first-time reader grok all of the examples. Rather, these selections are used as motivation for the remainder of the specification.

2.1. Collections

YAML's block collections use indentation for scope and begin each member on its own line. Block sequences indicate each member with a dash (“-”). Block mappings use a colon to mark each (key: value) pair.

Example 2.1. Sequence of scalars
(ball players)

- Mark McGwire
- Sammy Sosa
- Ken Griffey

Example 2.2. Mapping of scalars to scalars
(player statistics)

hr:  65
avg: 0.278
rbi: 147

Example 2.3. Mapping of scalars to sequences
(ball clubs in each league)

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

Example 2.4. Sequence of mappings
(players' statistics)

-
  name: Mark McGwire
  hr:   65
  avg:  0.278
-
  name: Sammy Sosa
  hr:   63
  avg:  0.288

YAML also has in-line flow styles for compact notation. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curly braces. In YAML, the space after the “-” and “:” is mandatory.

Example 2.5. Sequence of sequences

- [name        , hr, avg  ]
- [Mark McGwire, 65, 0.278]
- [Sammy Sosa  , 63, 0.288]


Example 2.6. Mapping of mappings

Mark McGwire: {hr: 65, avg: 0.278}
Sammy Sosa: {
    hr: 63,
    avg: 0.288
  }

2.2. Structures

YAML uses three dashes (“---”) to separate documents within a stream. Comment lines begin with the pound sign (“#”). Three dots (“...”) indicate the end of a document without starting a new one, for use in communication channels.

Repeated nodes are first marked with the ampersand (“&”) and then referenced with an asterisk (“*”) thereafter.

Example 2.7. Two documents in a stream
each with a leading comment

# Ranking of 1998 home runs
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

# Team ranking
---
- Chicago Cubs
- St Louis Cardinals

Example 2.8. Play by play feed
from a game

---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
...

Example 2.9. Single document with two comments

---
hr: # 1998 hr ranking
  - Mark McGwire
  - Sammy Sosa
rbi:
  # 1998 rbi ranking
  - Sammy Sosa
  - Ken Griffey

Example 2.10. Node for “Sammy Sosa”
appears twice in this document

---
hr:
  - Mark McGwire
  # Following node labeled SS
  - &SS Sammy Sosa
rbi:
  - *SS # Subsequent occurrence
  - Ken Griffey
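
On the application side, a repeated node corresponds to one native object that is reachable by more than one path. The Python sketch below shows how a tool might find such objects before deciding where anchors are needed. The helper find_repeated_nodes is hypothetical, and the players are written as small mappings rather than the plain strings of the example above, because object identity for Python strings is not a reliable signal.

```python
def find_repeated_nodes(root):
    """Return the dicts and lists reachable from root by more than one path."""
    seen = {}      # id(node) -> node, recorded on first visit
    repeated = []  # nodes met a second time, in discovery order

    def visit(node):
        if not isinstance(node, (dict, list)):
            return  # scalars are ignored in this sketch
        if id(node) in seen:
            if all(node is not other for other in repeated):
                repeated.append(node)
            return  # do not descend again; this also stops on cycles
        seen[id(node)] = node
        children = node.values() if isinstance(node, dict) else node
        for child in children:
            visit(child)

    visit(root)
    return repeated


sosa = {"name": "Sammy Sosa"}
document = {
    "hr": [{"name": "Mark McGwire"}, sosa],
    "rbi": [sosa, {"name": "Ken Griffey"}],
}
repeated = find_repeated_nodes(document)
print(len(repeated), repeated[0] is sosa)  # 1 True
```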

The question mark indicates a complex key. Within a block sequence, mapping pairs can start immediately following the dash.

Example 2.11. Mapping between sequences

? # PLAY SCHEDULE
  - Detroit Tigers
  - Chicago Cubs
:
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]

Example 2.12. Sequence key shortcut

---
# products purchased
- item    : Super Hoop
  quantity: 1
- item    : Basketball
  quantity: 4
- item    : Big Shoes
  quantity: 1


2.3. Scalars

Scalar values can be written in block form using a literal style (“|”) where all new lines count. Or they can be written with the folded style (“>”) for content that can be word wrapped. In the folded style, newlines are treated as a space unless they are part of a blank or indented line.

Example 2.13. In literals,
newlines are preserved

# ASCII Art
--- |
  \//||\/||
  // ||  ||__

Example 2.14. In the plain scalar,
newlines are treated as a space

---
  Mark McGwire's
  year was crippled
  by a knee injury.

Example 2.15. Folded newlines preserved
for indented and blank lines

--- >
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!
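
The folding behaviour described for the folded style can be approximated in a few lines of Python. This is a simplified sketch, not the normative rule (section 4.6.6 defines folding precisely): it assumes the block indentation has already been removed, it drops leading and trailing empty lines, and the function name fold is ours.

```python
def fold(text):
    """Fold line breaks: a single break between two text lines becomes a
    space, empty lines stand for line breaks, and breaks next to an
    indented line are kept as they are."""
    pieces = []
    previous_kind = None  # kind of the last non-empty line seen
    blanks = 0            # empty lines seen since that line
    for line in text.split("\n"):
        if line == "":
            blanks += 1
            continue
        kind = "indented" if line[0] in " \t" else "text"
        if previous_kind is not None:
            if previous_kind == "text" and kind == "text":
                # fold one break into a space; N empty lines give N breaks
                pieces.append(" " if blanks == 0 else "\n" * blanks)
            else:
                # keep every break around indented lines
                pieces.append("\n" * (blanks + 1))
        pieces.append(line)
        previous_kind, blanks = kind, 0
    return "".join(pieces)


SOSA = "\n".join([
    "Sammy Sosa completed another",
    "fine season with great stats.",
    "",
    "  63 Home Runs",
    "  0.288 Batting Average",
    "",
    "What a year!",
])
print(fold(SOSA))
```

With the text of the example above, the first two lines are joined into one sentence while the two indented statistics lines stay on lines of their own.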

Example 2.16. Indentation determines scope

name: Mark McGwire
accomplishment: >
  Mark set a major league
  home run record in 1998.
stats: |
  65 Home Runs
  0.278 Batting Average

YAML's flow scalars include the plain style (most examples thus far) and quoted styles. The double quoted style provides escape sequences. Single quoted style is useful when escaping is not needed. All flow scalars can span multiple lines; intermediate whitespace is trimmed to a single space.

Example 2.17. Quoted scalars

unicode: "Sosa did fine.\u263A"
control: "\b1998\t1999\t2000\n"
hexesc:  "\x0d\x0a is \r\n"

single: '"Howdy!" he cried.'
quoted: ' # not a ''comment''.'
tie-fighter: '|\-*-/|'

Example 2.18. Multiline flow scalars

plain:
  This unquoted scalar
  spans many lines.

quoted: "So does this
  quoted scalar.\n"

2.4. Tags

In YAML, plain (unquoted) scalars are given an implicit type depending on the application. The examples in this specification use types from YAML's tag repository, which includes types like integers, floating point values, timestamps, null, boolean, and string values.

Example 2.19. Integers

canonical: 12345
decimal: +12,345
sexagesimal: 3:25:45
octal: 014
hexadecimal: 0xC
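
The five spellings above can be read with a few lines of Python. This sketch is inferred from the example alone, not from the formal type definitions in the tag repository, and parse_int is a hypothetical name.

```python
def parse_int(text):
    """Read the integer spellings shown in the example above."""
    digits = text.replace(",", "")  # commas only group digits
    sign = 1
    if digits[0] in "+-":
        sign = -1 if digits[0] == "-" else 1
        digits = digits[1:]
    if ":" in digits:
        # base 60: each colon-separated part is one "digit"
        value = 0
        for part in digits.split(":"):
            value = value * 60 + int(part)
    elif digits.lower().startswith("0x"):
        value = int(digits, 16)
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits, 8)  # a leading zero marks octal
    else:
        value = int(digits)
    return sign * value


for spelling in ["12345", "+12,345", "3:25:45", "014", "0xC"]:
    print(spelling, "->", parse_int(spelling))
```

The first three spellings all give 12345 (3 × 3600 + 25 × 60 + 45), and the last two both give 12.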

Example 2.20. Floating point

canonical: 1.23015e+3
exponential: 12.3015e+02
sexagesimal: 20:30.15
fixed: 1,230.15
negative infinity: (-inf)
not a number: (NaN)
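
A similar sketch covers the floating point spellings above. Again this is inferred from the example only; parse_float is a hypothetical name, and only the two special values that appear in the example are handled.

```python
import math


def parse_float(text):
    """Read the floating point spellings shown in the example above."""
    digits = text.replace(",", "")  # commas only group digits
    if digits == "(NaN)":
        return math.nan
    if digits == "(-inf)":
        return -math.inf
    if ":" in digits:
        # base 60: the last part may carry a fraction
        sign = -1.0 if digits.startswith("-") else 1.0
        value = 0.0
        for part in digits.lstrip("+-").split(":"):
            value = value * 60 + float(part)
        return sign * value
    return float(digits)


for spelling in ["1.23015e+3", "12.3015e+02", "20:30.15", "1,230.15",
                 "(-inf)", "(NaN)"]:
    print(spelling, "->", parse_float(spelling))
```

The first four spellings denote the same quantity, 1230.15; the base 60 form is computed as 20 × 60 + 30.15 and may differ from the others in the last binary digit.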

Example 2.21. Miscellaneous

null: ~
true: y
false: n
string: '12345'

Example 2.22. Timestamps

canonical: 2001-12-15T02:59:43.1Z
iso8601:  2001-12-14t21:59:43.10-05:00
spaced:  2001-12-14 21:59:43.10 -05:00
date:   2002-12-14
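
The first three entries above denote the same instant written relative to different time zones. The check below builds the values by hand with Python's datetime module rather than parsing the YAML text, so it illustrates the arithmetic only.

```python
from datetime import datetime, timedelta, timezone

# canonical: 2001-12-15T02:59:43.1Z (UTC, one tenth of a second)
canonical = datetime(2001, 12, 15, 2, 59, 43, 100000, tzinfo=timezone.utc)

# iso8601 and spaced: 2001-12-14 21:59:43.10 -05:00
minus_five = timezone(timedelta(hours=-5))
spaced = datetime(2001, 12, 14, 21, 59, 43, 100000, tzinfo=minus_five)

# Aware datetimes compare by instant, not by wall-clock fields.
print(canonical == spaced)  # True
```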

Explicit typing is denoted with a tag using the bang (“!”) symbol. Application tags should include a domain name and may use the caret (“^”) to abbreviate subsequent tags.

Example 2.23. Various explicit tags

---
not-date: !str 2002-04-28

picture: !binary |
 R0lGODlhDAAMAIQAAP//9/X
 17unp5WZmZgAAAOfn515eXv
 Pz7Y6OjuDg4J+fn5OTk6enp
 56enmleECcgggoBADs=

application specific tag: !!something |
 The semantics of the tag
 above may be different for
 different documents.
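
As an illustration of what the picture entry above carries, the following Python lines decode its content. This assumes that the !binary tag denotes base64 text, which the example itself does not state; the variable names are ours.

```python
import base64

# The four content lines of the picture entry, joined without breaks.
picture = (
    "R0lGODlhDAAMAIQAAP//9/X"
    "17unp5WZmZgAAAOfn515eXv"
    "Pz7Y6OjuDg4J+fn5OTk6enp"
    "56enmleECcgggoBADs="
)

data = base64.b64decode(picture)
print(data[:6])  # b'GIF89a'
```

The decoded bytes begin with the GIF89a signature, so an application that recognizes the tag could pass them to an image library, while the !str tag on the first entry keeps a date-like value as plain text.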

Example 2.24. Application specific tag

# Establish a tag prefix
--- !clarkevans.com,2002/graph/^shape
  # Use the prefix: shorthand for
  # !clarkevans.com,2002/graph/circle
- !^circle
  center: &ORIGIN {x: 73, y: 129}
  radius: 7
- !^line
  start: *ORIGIN
  finish: { x: 89, y: 102 }
- !^label
  start: *ORIGIN
  color: 0xFFEEBB
  value: Pretty vector drawing.
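
The caret abbreviation can be pictured as simple string handling. The Python sketch below follows only what the comments in the example above say (that !^circle stands for !clarkevans.com,2002/graph/circle); resolve_tag is a hypothetical helper, tags are passed without the leading “!”, and the complete tag rules are given later in the specification.

```python
def resolve_tag(tag, prefix):
    """Return (full_tag, prefix_in_effect) for a tag written without '!'."""
    if tag.startswith("^"):
        # "^circle": reuse the prefix established earlier
        return prefix + tag[1:], prefix
    if "^" in tag:
        # "domain/path/^name": the text before the caret is the new prefix
        head, _, tail = tag.partition("^")
        return head + tail, head
    # no caret: the tag is complete and the prefix is unchanged
    return tag, prefix


full, prefix = resolve_tag("clarkevans.com,2002/graph/^shape", "")
print(full)    # clarkevans.com,2002/graph/shape
circle, prefix = resolve_tag("^circle", prefix)
print(circle)  # clarkevans.com,2002/graph/circle
```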

Example 2.25. Unordered set

# sets are represented as a
# mapping where each key is
# associated with the empty string
--- !set
? Mark McGwire
? Sammy Sosa
? Ken Griffey

Example 2.26. Ordered mappings

# ordered maps are represented as
# a sequence of mappings, with
# each mapping having one key
--- !omap
- Mark McGwire: 65
- Sammy Sosa: 63
- Ken Griffey: 58
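
In native terms, the ordered map above is a list of one-key mappings. The Python sketch below turns that shape into a list of (key, value) pairs while checking the one-key rule stated in the comment; omap_to_pairs is a hypothetical helper, not part of any YAML library.

```python
def omap_to_pairs(sequence):
    """Convert a sequence of single-key mappings into ordered pairs."""
    pairs = []
    for entry in sequence:
        if len(entry) != 1:
            raise ValueError("each entry of an ordered map needs exactly one key")
        (key, value), = entry.items()
        pairs.append((key, value))
    return pairs


ranking = [{"Mark McGwire": 65}, {"Sammy Sosa": 63}, {"Ken Griffey": 58}]
print(omap_to_pairs(ranking))
# [('Mark McGwire', 65), ('Sammy Sosa', 63), ('Ken Griffey', 58)]
```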

2.5. Full Length Example

Below are two full-length examples of YAML. The first is a sample invoice; the second is a sample log file.

Example 2.27. Invoice

--- !clarkevans.com,2002/^invoice
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
ship-to: *id001
product:
    - sku         : BL394D
      quantity    : 4
      description : Basketball
      price       : 450.00
    - sku         : BL4438H
      quantity    : 1
      description : Super Hoop
      price       : 2392.00
tax  : 251.42
total: 4443.52
comments:
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

Example 2.28. Log file

---
Time: 2001-11-23 15:01:42 -05:00
User: ed
Warning:
  This is an error message
  for the log file
---
Time: 2001-11-23 15:02:31 -05:00
User: ed
Warning:
  A slightly different error
  message.
---
Date: 2001-11-23 15:03:17 -05:00
User: ed
Fatal:
  Unknown variable "bar"
Stack:
  - file: TopClass.py
    line: 23
    code: |
      x = MoreObject("345\n")
  - file: MoreClass.py
    line: 58
    code: |-
      foo = bar



Chapter 3. Processing YAML Information

YAML is both a text format and a method for representing native language data structures in this format. This specification defines two concepts: a class of data objects called YAML representations, and a syntax for encoding YAML representations as a series of characters, called a YAML stream. A YAML processor is a tool for converting information between these complementary views. It is assumed that a YAML processor does its work on behalf of another module, called an application. This chapter describes the information structures a processor must provide to or obtain from the application.

YAML information is used in two ways: for machine processing, and for human consumption. The challenge of reconciling these two perspectives is best done in three distinct translation stages: representation, serialization, and presentation. Representation addresses how YAML views native language data structures to achieve portability between programming environments. Serialization concerns
