This version:
www.yaml.org/spec/history/2004-01-29/
Latest version:
www.yaml.org/spec/
Copyright 2001-2004 Oren Ben-Kiki, Clark Evans, Brian Ingerson
Status of this Document
This is an intermediate working draft and is being actively revised. Hopefully the next draft will be a release canidate.
We wish to thank implementers who have tirelessly tracked earlier versions of this specification, and our fabulous user community whose feedback has both validated and clarified our direction.
Abstract
YAML™ (rhymes with “camel”) is a human friendly, cross language, unicode based data serialization language designed around the common native structures of agile programming languages. It is broadly useful for programming needs ranging from configuration files to Internet messaging to object persistence to data auditing. Together with the Unicode standard for characters, this specification provides all the information necessary to understand YAML Version 1.0 and to construct programs that process YAML information.
Table of Contents
"YAML Ain't Markup Language" (abbreviated YAML) is a data serialization language designed to be human friendly and work well with modern programming languages for common everyday tasks. This specification is both an introduction to the YAML language and the concepts supporting it; and also a complete reference of the information needed to develop applications for processing YAML.
Open, interoperable and readily understandable tools have advanced computing immensely. YAML was designed from the start to be useful and friendly to the people working with data. It uses printable unicode characters, some of which provide structural information and the rest representing the data itself. YAML achieves a unique cleanness by minimizing the amount of structural characters, and allowing the data to show itself in a natural and meaningful way. For example, indentation is used for structure, colons separate pairs, and dashes are used for bulleted lists.
There are myriad flavors of data structures, but they can all be adequately represented with three basic primitives: mappings (hashes/dictionaries), sequences (arrays/lists) and scalars (strings/numbers). YAML leverages these primitives and adds a simple typing system and aliasing mechanism to form a complete language for encoding any data structure. While most programming languages can use YAML for data serialization, YAML excels in those languages that are fundamentally built around the three basic primitives. These include the new wave of agile languages such as Perl, Python, PHP, Ruby and Javascript.
There are hundreds of different languages for programming, but only a handful of languages for storing and transferring data. Even though its potential is virtually boundless, YAML was specifically created to work well for common use cases such as: configuration files, log files, interprocess messaging, cross-langauge data sharing, object persistence and debugging of complex data structures. When data is well organized and easy to understand, programming becomes a simpler task.
The design goals for YAML are:
YAML's initial direction was set by the data serialization and markup language discussions among SML-DEV members. Later on it directly incorporated experience from Brian Ingerson's Perl module Data::Denter. Since then YAML has matured through ideas and support from its user community.
YAML integrates and builds upon concepts described by C, Java, Perl, Python, Ruby, RFC0822 (MAIL), RFC1866 (HTML), RFC2045 (MIME), RFC2396 (URI), XML, SAX and SOAP.
The syntax of YAML was motivated by Internet Mail (RFC0822) and remains partially compatible with that standard. Further, YAML borrows the idea of having multiple documents from MIME (RFC2045). YAML's top-level production is a stream of independent documents; ideal for message-based distributed processing systems.
YAML's indentation based block scoping is similar to Python's (without the ambiguities caused by tabs). Indented blocks facilitate easy inspection of a document's structure. YAML's literal scalar leverages this by enabling formatted text to be cleanly mixed within an indented structure without troublesome escaping.
YAML's double quoted scalar uses familar C-style escape sequences. This enables ASCII representation of non-printable or 8-bit (ISO 8859-1) characters such as “\x3B”. 16-bit Unicode and 32-bit (ISO/IEC 10646) characters are supported with escape sequences such as “\u003B” and “\U0000003B”.
Motivated by HTML's end-of-line normalization, YAML's folded scalar employs an intuitive method of handling white space. In YAML, single line breaks may be folded into a single space, while empty lines represent line break characters. This technique allows for paragraphs to be word-wrapped without affecting the canonical form of the content.
YAML's core type system is based on the requirements of Perl, Python and Ruby. YAML directly supports both collection (hash, array) values and scalar (string) values. Support for common types enables programmers to use their language's native data constructs for YAML manipulation, instead of requiring a special document object model (DOM).
Like XML's SOAP, YAML supports serializing native graph structures through a rich alias mechanism. Also like SOAP, YAML provides for application-defined types. This allows YAML to encode rich data structures required for modern distributed computing. YAML provides unique global type names using a namespace mechanism inspired by Java's DNS based package naming convention and XML's URI based namespaces.
YAML was designed to have an incremental interface that includes both a pull-style input stream and a push-style (SAX-like) output stream interfaces. Together this enables YAML to support the processing of large documents, such as a transaction log, or continuous streams, such as a feed from a production machine.
Newcomers to YAML often search for its correlation to the eXtensible Markup Language (XML). While the two languages may actually compete in several application domains, there is no direct correlation between them.
YAML is primarily a data serialization language. XML was designed to be backwards compatible with the Standard Generalized Markup Language (SGML) and thus had many design constraints placed on it that YAML does not share. Inheriting SGML's legacy, XML is designed to support structured documents, where YAML is more closely targeted at messaging and native data structures. Where XML is a pioneer in many domains, YAML is the result of lessons learned from XML and other technologies.
It should be mentioned that there are ongoing efforts to define standard XML/YAML mappings. This generally requires that a subset of each language be used. For more information on using both XML and YAML, please visit yaml.org/xml/.
This specification uses key words in accordance with RFC2119 to indicate requirement level. In particular, the following words are used to describe the actions of a YAML processor:
This section provides a quick glimpse into the expressive power of YAML. It is not expected that the first-time reader grok all of the examples. Rather, these selections are used as motivation for the remainder of the specification.
YAML's block collections use indentation for scope and begin each member on its own line. Block sequences indicate each member with a dash(“-”). Block mappings use a colon to mark each (key:value) pair.
Example2.1.
Sequence of scalars - Mark McGwire - Sammy Sosa - Ken Griffey |
Example2.2.
Mapping of scalars to scalars hr: 65 avg: 0.278 rbi: 147 |
Example2.3.
Mapping of scalars to sequences american: - Boston Red Sox - Detroit Tigers - New York Yankees national: - New York Mets - Chicago Cubs - Atlanta Braves |
Example2.4.
Sequence of mappings - name: Mark McGwire hr: 65 avg: 0.278 - name: Sammy Sosa hr: 63 avg: 0.288 |
YAML also has in-line flow styles for compact notation. The flow sequence is written as a comma separated list within square brackets. In a similar manner, the flow mapping uses curley braces. In YAML, the space after the “-” and “:” and “:” is mandatory.
Example2.5.Sequence of sequences - [name , hr, avg ] - [Mark McGwire, 65, 0.278] - [Sammy Sosa , 63, 0.288] |
Example2.6.Mapping of mappings Mark McGwire: {hr: 65, avg: 0.278} Sammy Sosa: { hr: 63, avg: 0.288 } |
YAML uses three dashes(“---”) to separate documents within a stream. Comment lines begin with the pound sign(“#”). Three dots(“...”) indicate the end of a document without starting a new one, for use in communication channels.
Repeated nodes are first marked with the ampersand(“&”) and then referenced with an asterisk(“*”) thereafter.
Example2.7.
Two documents in a stream # Ranking of 1998 home runs --- - Mark McGwire - Sammy Sosa - Ken Griffey # Team ranking --- - Chicago Cubs - St Louis Cardinals |
Example2.8.
Play by play feed --- time: 20:03:20 player: Sammy Sosa action: strike (miss) ... --- time: 20:03:47 player: Sammy Sosa action: grand slam ... |
Example2.9.
Single document with two comments --- hr: # 1998 hr ranking - Mark McGwire - Sammy Sosa rbi: # 1998 rbi ranking - Sammy Sosa - Ken Griffey |
Example2.10.
Node for “Sammy Sosa” --- hr: - Mark McGwire # Following node labeled SS - &SS Sammy Sosa rbi: - *SS # Subsequent occurance - Ken Griffey |
The question mark indicates a complex key. Within a block sequence, mapping pairs can start immediately following the dash.
Example2.11.Mapping between sequences ? # PLAY SCHEDULE - Detroit Tigers - Chicago Cubs : - 2001-07-23 ? [ New York Yankees, Atlanta Braves ] : [ 2001-07-02, 2001-08-12, 2001-08-14 ] |
Example2.12.Sequence key shortcut --- # products purchased - item : Super Hoop quantity: 1 - item : Basketball quantity: 4 - item : Big Shoes quantity: 1 |
Scalar values can be written in block form using a literal style(“|”) where all new lines count. Or they can be written with the folded style(“>”) for content that can be word wrapped. In the folded style, newlines are treated as a space unless they are part of a blank or indented line.
Example2.13.
In literals, # ASCII Art --- | \//||\/|| // || ||__ |
Example2.14.
In the plain scalar, --- Mark McGwire's year was crippled by a knee injury. |
Example2.15.
Folded newlines preserved --- > Sammy Sosa completed another fine season with great stats. 63 Home Runs 0.288 Batting Average What a year! |
Example2.16.
Indentation determines scope name: Mark McGwire accomplishment: > Mark set a major league home run record in 1998. stats: | 65 Home Runs 0.278 Batting Average |
YAML's flow scalars include the plain style (most examples thus far) and quoted styles. The double quoted style provides escape sequences. Single quoted style is useful when escaping is not needed. All flow scalars can span multiple lines; intermediate whitespace is trimmed to a single space.
Example2.17.Quoted scalars unicode: "Sosa did fine.\u263A" control: "\b1998\t1999\t2000\n" hexesc: "\x13\x10 is \r\n" single: '"Howdy!" he cried.' quoted: ' # not a ''comment''.' tie-fighter: '|\-*-/|' |
Example2.18.Multiline flow scalars plain: This unquoted scalar spans many lines. quoted: "So does this quoted scalar.\n" |
In YAML, plain (unquoted) scalars are given an implicit type depending on the application. The examples in this specification use types from YAML's tag repository, which includes types like integers, floating point values, timestamps, null, boolean, and string values.
Example2.19.Integers canonical: 12345 decimal: +12,345 sexagecimal: 3:25:45 octal: 014 hexadecimal: 0xC |
Example2.20.Floating point canonical: 1.23015e+3 exponential: 12.3015e+02 sexagecimal: 20:30.15 fixed: 1,230.15 negative infinity: (-inf) not a number: (NaN) |
Example2.21.Miscellaneous null: ~ true: y false: n string: '12345' |
Example2.22.Timestamps canonical: 2001-12-15T02:59:43.1Z iso8601: 2001-12-14t21:59:43.10-05:00 spaced: 2001-12-14 21:59:43.10 -05:00 date: 2002-12-14 |
Explicit typing is denoted with a tag using the bang(“!”) symbol. Application tags should include a domain name and may use the caret(“^”) to abbreviate subsequent tags.
Example2.23.Various explicit tags --- not-date: !str 2002-04-28 picture: !binary | R0lGODlhDAAMAIQAAP//9/X 17unp5WZmZgAAAOfn515eXv Pz7Y6OjuDg4J+fn5OTk6enp 56enmleECcgggoBADs= application specific tag: !!something | The semantics of the tag above may be different for different documents. |
Example2.24.Application specific tag # Establish a tag prefix --- !clarkevans.com,2002/graph/^shape # Use the prefix: shorthand for # !clarkevans.com,2002/graph/circle - !^circle center: &ORIGIN {x: 73, y: 129} radius: 7 - !^line start: *ORIGIN finish: { x: 89, y: 102 } - !^label start: *ORIGIN color: 0xFFEEBB value: Pretty vector drawing. |
Example2.25.Unorderd set # sets are represented as a # mapping where each key is # associated with the empty string --- !set ? Mark McGwire ? Sammy Sosa ? Ken Griff |
Example2.26.Ordered mappings # ordered maps are represented as # a sequence of mappings, with # each mapping having one key --- !omap - Mark McGwire: 65 - Sammy Sosa: 63 - Ken Griffy: 58 |
Below are two full-length examples of YAML. On the left is a sample invoice; on the right is a sample log file.
Example2.27.Invoice --- !clarkevans.com,2002/^invoice invoice: 34843 date : 2001-01-23 bill-to: &id001 given : Chris family : Dumars address: lines: | 458 Walkman Dr. Suite #292 city : Royal Oak state : MI postal : 48046 ship-to: *id001 product: - sku : BL394D quantity : 4 description : Basketball price : 450.00 - sku : BL4438H quantity : 1 description : Super Hoop price : 2392.00 tax : 251.42 total: 4443.52 comments: Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338. |
Example2.28.Log file --- Time: 2001-11-23 15:01:42 -05:00 User: ed Warning: This is an error message for the log file --- Time: 2001-11-23 15:02:31 -05:00 User: ed Warning: A slightly different error message. --- Date: 2001-11-23 15:03:17 -05:00 User: ed Fatal: Unknown variable "bar" Stack: - file: TopClass.py line: 23 code: | x = MoreObject("345\n") - file: MoreClass.py line: 58 code: |- foo = bar |
YAML is both a text format and a method for representing native language data structures in this format. This specification defines two concepts: a class of data objects called YAML representations, and a syntax for encoding YAML representations as a series of characters, called a YAML stream. A YAML processor is a tool for converting information between these complementary views. It is assumed that a YAML processor does its work on behalf of another module, called an application. This chapter describes the information structures a processor must provide to or obtain from the application.
YAML information is used in two ways: for machine processing, and for human consumption. The challange of reconciling these two perspectives is best done in three distinct translation stages: representation, serialization, and presentation. Representation addresses how YAML views native language data structures to achieve portability between programming environments. Serialization concerns