M. Rose
 Dover Beach Consulting, Inc.
 February 28, 2008


Writing I-Ds and RFCs using XML (revised)

Abstract

This memo presents a technique for using XML (Extensible Markup Language) as a source format for documents in the Internet-Drafts (I-Ds) and Request for Comments (RFC) series. This memo is an upwards-compatible revision to RFC 2629.



Table of Contents

1.  Introduction
2.  Using the DTD to Write I-Ds and RFCs
    2.1.  XML basics
    2.2.  Front matter
        2.2.1.  The title Element
        2.2.2.  The author Element
        2.2.3.  The date Element
        2.2.4.  Meta Data Elements
        2.2.5.  The abstract Element
        2.2.6.  The note Element
        2.2.7.  Status, Copyright Notice, Table of Contents
        2.2.8.  Everything in the Front
    2.3.  The Middle
        2.3.1.  The section Element
        2.3.2.  The appendix Element
    2.4.  Back matter
        2.4.1.  The references Element
        2.4.2.  Appendices
        2.4.3.  Copyright Status
3.  Processing the XML Source File
    3.1.  Editing
        3.1.1.  Checking
    3.2.  Converting to Text Format
    3.3.  Converting to HTML Format
    3.4.  Searching
Appendix A.  The rfc Element
Appendix B.  The DTD
Appendix C.  Changes from RFC 2629
Appendix D.  Conformance with RFC 2026 or RFC 3667 (Historic)
Appendix E.  Acknowledgements
4.  Security Considerations
5.  References
§  Index
§  Author's Address





1.  Introduction

This memo describes how to write a document for the I-D and RFC series using the Extensible Markup Language (Paoli, J., Maler, E., Bray, T., and C. Sperberg-McQueen, “Extensible Markup Language (XML) 1.0 (Second Edition),” October 2000.) [1] (XML). This memo has three goals:

  1. To describe a simple XML Document Type Definition (DTD) that is powerful enough to handle the simple formatting requirements of RFC-like documents whilst allowing for meaningful markup of descriptive qualities.
  2. To describe software that processes XML source files, including a tool that produces documents conforming to RFC 2223 (Postel, J. and J. Reynolds, “Instructions to RFC Authors,” October 1997.) [2], HTML format, and so on.
  3. To provide the proof-of-concept for the first two goals (this memo was written using this DTD and produced using that software).

It is beyond the scope of this memo to discuss the political ramifications of using XML as a source format for RFC-like documents. Rather, it is simply noted that adding minimal markup to plain text:




2.  Using the DTD to Write I-Ds and RFCs

We do not provide a formal or comprehensive description of XML. Rather, this section discusses just enough XML to use a Document Type Definition (DTD) to write RFC-like documents.

If you're already familiar with XML, skip to Appendix B (The DTD) to look at the DTD.




2.1.  XML basics

There are very few rules when writing in XML, as the syntax is (deceptively) simple. There are five terms you'll need to know:

  1. An "element" usually refers to a start tag, an end tag, and all the characters in between, e.g., <example>text and/or nested elements</example>.
  2. An "empty element" combines the start tag and the end tag, e.g., <empty/>. For readability, I prefer to write this as <empty />; both are legal XML. You don't find this syntax in HTML, where an empty element such as <br> is written without the closing slash.
  3. An "attribute" is part of an element. If present, attributes occur in the start tag, e.g., <example name='value'>. Of course, they can also appear in empty elements, e.g., <empty name='value' />.
  4. An "entity" is a textual macro that starts with & and ends with a semicolon (;). Usually, you'll only use one when you want to put a & or a < in your text.
  5. A "token" is a string of characters. The first character is either a letter or an underscore (_). Any characters that follow are either letters, numbers, an underscore, or a period (.).

First, start your source file with an XML declaration, a reference to the DTD, and the rfc element:

    <?xml version='1.0' ?>
    <!DOCTYPE rfc SYSTEM 'rfcXXXX.dtd'>
    <rfc>
        ...
    </rfc>

Ignore the first two lines — the declaration and the reference — and simply treat them as opaque strings. Nothing else should be present after the </rfc> tag.

Second, make sure that all elements are properly matched and nested. A properly matched element that starts with <example> is eventually followed with </example>. (Empty elements are always matched.) Elements are properly nested when they don't overlap.

For example,

    <outer>
        ...
        <inner>
            ...
        </inner>
        ...
    </outer>

is properly nested.

However,

    <outer>
        ...
        <inner>
            ...
        </outer>
        ...
    </inner>

overlaps, so the elements aren't properly nested.

Third, never use < or & in your text. Instead, use either &lt; or &amp;, respectively.

Fourth, there are two quoting characters in XML, apostrophe (') and quotation ("). Make sure that all attribute values are quoted, e.g., <example name='value'>. If the value contains one of the quoting characters, then use the other to quote the value, e.g., <example name='"'>. If the value contains both quoting characters, then use one of them to quote the value, and replace occurrences of that character in the attribute value with either &apos; (apostrophe) or &quot; (quotation), e.g., <example name='"&apos;"'>.
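
As a rough illustration of the third and fourth rules taken together, the sketch below reuses the placeholder example and empty elements from above. The text and the attribute values are made up for the purpose.

```xml
<!-- text containing "<" and "&" is written with entities -->
<example>if a &lt; b &amp;&amp; b &lt; c, then a &lt; c</example>

<!-- the value contains an apostrophe, so quotation marks delimit it -->
<empty name="the author's note" />

<!-- the value contains a quotation mark, so apostrophes delimit it -->
<empty name='the "source" format' />

<!-- the value contains both, so one of them is written as an entity -->
<empty name='the author&apos;s "source" format' />
```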

If you want to put a comment in your source file, here's the syntax:

        <!-- comments can be multiline,
         if you wish -->

Finally, XML is case sensitive, which means that <foo> is different from <Foo>.




2.2.  Front matter

Immediately following the <rfc> tag is the front element:

    <?xml version='1.0'?>
    <!DOCTYPE rfc SYSTEM 'rfcXXXX.dtd'>
    <rfc>
        <front>
            <title ...>
            <author ...>
            <author ...>
            <date ...>
            <area ...>
            <workgroup ...>
            <keyword ...>
            <keyword ...>
            <abstract ...>
            <note ...>
        </front>
        ...
    </rfc>

(Note that in all examples, indentation is used only for expository purposes.)

The front element consists of a title element, one or more author elements, a date element, zero or more area elements, zero or more workgroup elements, zero or more keyword elements, an optional abstract element, and zero or more note elements.




2.2.1.  The title Element

The title element identifies the title of the document. Because the title will be used in the headers of the document when formatted according to [2] (Postel, J. and J. Reynolds, “Instructions to RFC Authors,” October 1997.), if the title is more than 42 characters, then an abbreviation should also be provided, e.g.,

    <title abbrev='Much Ado about Nothing'>
    The IETF's Discussion on "Source Format of RFC Documents"
    </title>



2.2.2.  The author Element

Each author element identifies a document author. Since a document may have more than one author, more than one author element may be present. If the author is a person, then three attributes must be present in the <author> tag: initials, surname, and fullname, e.g.,

    <author initials='F.J.' surname='Flintstone'
            fullname='Frederick Flintstone'>

There is also an optional role attribute, which, if present, must take the value "editor".
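
For instance, an editor might be marked up roughly as follows. This is only a sketch: the person and organization are the fictional ones used elsewhere in this memo.

```xml
<author initials='F.J.' surname='Flintstone'
        fullname='Frederick Flintstone' role='editor'>
    <organization>Slate Construction, Inc.</organization>
</author>
```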

The author element itself consists of an organization element and an optional address element.

The organization element is similar to the title element, in that an abbreviation may be paired with a long organization name using the abbrev attribute, e.g.,

    <organization abbrev='ISI'>
        USC/Information Sciences Institute
    </organization>

The address element consists of an optional postal element, an optional phone element, an optional facsimile element, an optional email element, and an optional uri element.

The postal element contains one or more street elements, followed by any combination of city, region (state or province), code (zipcode or postal code), and country elements, e.g.,

    <postal>
        <street>660 York Street</street>
        <street>M/S 40</street>
        <city>San Francisco</city> <region>CA</region>
        <code>94110</code>
        <country>US</country>
    </postal>

This flexibility is provided to allow for different national formats for postal addresses. Note, however, that although the order of the city, region, code, and country elements isn't specified, at most one of each may be present. Regardless, these elements must not be re-ordered during processing by an XML application (e.g., display applications must preserve the ordering of the information contained in these elements). Finally, the value of the country element should be a two-letter code from ISO 3166.
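
To illustrate the flexibility in ordering, here is a sketch of an address in a national format that puts the postal code before the city and has no region. The address itself is fictional.

```xml
<postal>
    <street>Musterstrasse 1</street>
    <code>12345</code> <city>Musterstadt</city>
    <country>DE</country>
</postal>
```

An XML application would be expected to keep the code ahead of the city when displaying this address.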

The phone, facsimile, email, and uri elements are simple, e.g.,

    <phone>+1 916 555 1234</phone>
    <email>fred@example.com</email>
    <uri>http://example.com/</uri>



2.2.3.  The date Element

The date element identifies the publication date of the document. It consists of a month and a year, e.g.,

    <date month='February' year='1999' />

The date element also has an optional day attribute. (Actually, due to popular demand, all three attributes are optional.)
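
As a sketch of the two extremes, the first element below supplies all three attributes and the second supplies none. What an XML application substitutes for an omitted attribute is not specified here, so treat the second form as application-dependent.

```xml
<!-- day, month, and year are all given -->
<date day='28' month='February' year='2008' />

<!-- all three attributes are omitted -->
<date />
```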




2.2.4.  Meta Data Elements

The front element may contain meta data — the content of these elements does not appear in printed versions of the document.

A document may have any number of area, workgroup, and keyword elements, e.g.,

    <area>General</area>
    <workgroup>RFC Beautification Working Group</workgroup>
    <keyword>RFC</keyword>
    <keyword>Request for Comments</keyword>
    <keyword>I-D</keyword>
    <keyword>Internet-Draft</keyword>
    <keyword>XML</keyword>
    <keyword>Extensible Markup Language</keyword>

The area elements identify a general category for the document (e.g., one of "Applications", "General", "Internet", "Management", "Operations", "Routing", "Security", "Transport", or "User"), while the workgroup elements identify the IETF working groups that produced the document, and the keyword elements identify useful search terms.




2.2.5.  The abstract Element

A document may have an abstract element, which contains one or more t elements (The t Element). In general, only a single t element is present, e.g.,

    <abstract>
        <t>This memo presents a technique for using XML
        (Extensible Markup Language) as a source format
        for documents in the Internet-Drafts (I-Ds) and
        Request for Comments (RFC) series.</t>
    </abstract>



2.2.6.  The note Element

A document may have one or more note elements, each of which contains one or more t elements (The t Element). There is a mandatory title attribute. In general, the note element contains text from the IESG, e.g.,

    <note title='IESG Note'>
        <t>The IESG has something to say.</t>
    </note>



2.2.7.  Status, Copyright Notice, Table of Contents

Note that text relating to the memo's status, copyright notice, or table of contents is not included in the document's markup — this is automatically inserted by an XML application when it produces either a text or HTML version of the document.




2.2.7.1.  Conformance with RFC 3978

If an Internet-Draft is being produced, then the ipr attribute should be present in the <rfc> tag at the beginning of the file. The value of the attribute should be one of: full3978, noModification3978, or noDerivatives3978. For the latter two options, an additional attribute, iprExtract, will be consulted. If present, its value is an anchor that is used to cross-reference the section of the document that may be extracted as-is for separate use.

Consult [3] (Bradner, S., “IETF Rights in Contributions,” March 2005.) for further details.

If the Internet-Draft is being submitted to an automated process, then the docName attribute should be present in the <rfc> tag at the beginning of the file. The value of this attribute contains the document (not file) name associated with this Internet-Draft, e.g.,

    <rfc ipr='full3978' docName='draft-mrose-writing-rfcs-01'>
        ...
    </rfc>

Finally, an xml:lang attribute may be present to indicate that the document is written in some language other than English (for writing things other than RFCs).
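
The iprExtract attribute described above might be used along the following lines. This is a sketch only: the anchor name, section title, and document name are hypothetical.

```xml
<rfc ipr='noModification3978' iprExtract='extract_me'
     docName='draft-example-extract-00'>
    ...
    <middle>
        ...
        <section anchor='extract_me' title='Text That May Be Extracted'>
            ...
        </section>
    </middle>
    ...
</rfc>
```

Here the value of iprExtract matches the anchor attribute of the section that may be extracted.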




2.2.8.  Everything in the Front

So, putting it all together, we have, e.g.,

    <front>
        <title>Writing I-Ds and RFCs using XML</title>

        <author initials='F.J.' surname='Flintstone'
                fullname='Frederick Flintstone'>
            <organization>Slate Construction, Inc.</organization>

            <address>
                <postal>
                    <street>660 York Street</street>
                    <street>M/S 40</street>
                    <city>San Francisco</city> <region>CA</region>
                    <code>94110</code>
                    <country>US</country>
                </postal>

                <phone>+1 916 555 1234</phone>
                <email>fred@example.com</email>
                <uri>http://example.com/</uri>
            </address>
        </author>

        <date month='February' year='1999' />

        <area>General</area>
        <workgroup>RFC Beautification Working Group</workgroup>
        <keyword>RFC</keyword>
        <keyword>Request for Comments</keyword>
        <keyword>I-D</keyword>
        <keyword>Internet-Draft</keyword>
        <keyword>XML</keyword>
        <keyword>Extensible Markup Language</keyword>
        <abstract>
            <t>This memo presents a technique for using XML
            (Extensible Markup Language) as a source format
            for documents in the Internet-Drafts (I-Ds) and
            Request for Comments (RFC) series.</t>
        </abstract>
    </front>



2.3.  The Middle

Note well:
Although this draft refers to the appendix element, the text referring to that element is entirely speculative (until such time as this advisory is removed).

The middle element contains all the sections of the document except for the bibliography and the boilerplate:

    ...
    </front>
    <middle>
        <section ...>
        <section ...>
        <section ...>
        <appendix ...>
        <appendix ...>
    </middle>
    <back>
    ...

The middle element consists of one or more section elements, optionally followed by one or more appendix elements, optionally followed by one or more section elements.




2.3.1.  The section Element

Each section element contains a section of the document. There is a mandatory attribute, title, that identifies the title of the section. There are also two optional attributes: anchor, which is used for cross-referencing with the xref element (The xref Element), e.g.,

    <section anchor='intro' title='Introduction'>
        ...
    </section>

and toc, which indicates whether the section should appear in the table of contents. (The choices are "exclude", "include", and "default".)
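
For example, a section that should be left out of the table of contents might be written as in this sketch. The anchor and title are made up.

```xml
<section anchor='ack' title='Acknowledgements' toc='exclude'>
    ...
</section>
```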

The section element is recursive — each contains any number and combination of t, figure, texttable, iref, and section elements, e.g.,

    <section title='The Middle'>
        ...
        <section title='The section Element'>
            ...
            <section title='The t Element'>...</section>
            <section title='The list Element'>...</section>
            <section title='The figure Element'>...</section>
            <section title='The texttable Element'>...</section>
            <section title='The xref Element'>...</section>
            <section title='The eref Element'>...</section>
            <section title='The iref Element'>...</section>
            <section title='The cref Element'>...</section>
            <section title='The spanx Element'>...</section>
            <section title='The vspace Element'>...</section>
        </section>
    </section>

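Note that the section element is tail-recursive: nested section elements must come after all of the t, figure, texttable, and iref elements of the enclosing section.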




2.3.1.1.  The t Element

Paragraphs are contained in t elements. A paragraph can consist of text, lists, figures, and other t element-delimited paragraphs, in any number or combination.

If a cross-reference is needed to a section, figure, table, or reference, the xref element (The xref Element) is used; similarly, if an external reference is needed, the eref element (The eref Element) is used. Indexing of text is provided by the iref element (The iref Element).

Note well:
Although RFC 2629 allows the figure element to be nested within the t element, authors are strongly encouraged to avoid this usage; it is always preferable to place the figure element as a direct subordinate of the section element.
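
Putting these pieces together, a paragraph might look something like the following sketch. The anchors, titles, and index terms are hypothetical, and the figure is placed next to the t element rather than inside it, as recommended above.

```xml
<section anchor='overview' title='Overview'>
    <t>The syntax is summarized in <xref target='xml_basics' />,
    and further reading is available from
    <eref target='http://example.com/'>an external site</eref>.
    <iref item='syntax' subitem='summary' /></t>

    <!-- the figure is a child of section, not of the t element -->
    <figure anchor='overview_figure'>
        <artwork>
            ascii artwork goes here...
        </artwork>
    </figure>
</section>
```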



2.3.1.2.  The list Element

The list element contains one or more items. Each item is a t element, allowing for recursion, e.g.,

    <list style='numbers'>
        <t>The first item.</t>
        <t>The second item, which contains two bulleted sub-items:
            <list style='symbols'>
                <t>The first sub-item.</t>
                <t>The second sub-item.</t>
            </list>
        </t>
    </list>

The list element has an optional attribute, style, having the value "numbers" (for numeric lists), "letters" (for alphabetic lists), "symbols" (for bulleted lists), "hanging" (for hanging lists), "format" (for auto-formatted lists), or, "empty" (for indented text). If a list element is nested, the default value is taken from its closest parent; otherwise, the default value is "empty".
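
As a sketch of the defaulting rule just described, the nested list below carries no style attribute, so it would take the "letters" style of its closest parent.

```xml
<list style='letters'>
    <t>The first item.</t>
    <t>The second item, whose nested list has no style attribute:
        <list>
            <t>The first sub-item.</t>
            <t>The second sub-item.</t>
        </list>
    </t>
</list>
```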

When nested within a hanging list element, the t element has an optional attribute, hangText, that specifies the text to be inserted, e.g.,

    <list style='hanging'>
        <t hangText="counter:">the "counting designation" is
        rendered
        (e.g., "2.1" or "A.2");</t>

        <t hangText="title:">the title attribute of the
        corresponding element is rendered
        (e.g., "XML Basics");</t>

        <t hangText="none:">no additional designation is rendered;
        or,</t>

        <t hangText="default:">a suitable designation is rendered,
        e.g., "Section 2.1" or
        "&lt;a href='#xml_basics'>XML Basics&lt;/a>"
        (the default).</t>
    </list>

The style attribute value for an auto-formatted list starts with the seven characters "format " (including the trailing space), and is followed by a string which must contain exactly one instance of either "%d" (replaced by a number) or "%c" (replaced by a letter). Hanging text is automatically generated for each nested t element, e.g.,

    <list style='format R%d:'>
        <t>Text for R1.</t>

        <t>Text for R2.</t>
    </list>
    ...
    <list style='format Directive %c:'>
        <t>Text for Directive A.</t>

        <t>Text for Directive B.</t>
    </list>
    ...
    <list style='format R%d:'>
        <t>Text for R3.</t>
    </list>

If the list is auto-formatted, then the optional counter attribute is consulted, which controls the numbering: lists that share a counter value continue a single numbering sequence. By default, the value of this attribute is the same as the formatting string, e.g.,

    <list style='format R%d:' counter='Requirements'>
        <t>Text for R1.</t>

        <t>Text for R2.</t>
    </list>
    ...
    <list style='format Directive %c:' counter='Directives'>
        <t>Text for Directive A.</t>

        <t>Text for Directive B.</t>
    </list>
    ...
    <list style='format R%d:' counter='Requirements'>
        <t>Text for R3.</t>
    </list>

If the style attribute has the value "hanging" or "format", then a second, optional, attribute called hangIndent is consulted. This overrides the default indentation used for the text of each t element, ensuring that each t element has the same indentation, e.g.,

    <list style='format R%d:' hangIndent='5'>
        <t>Text for R1.</t>

        <t>Text for R2.</t>

        ...

        <t>Text for R12.</t>
    </list>

The final item will read "R12: Text for R12."




2.3.1.3.  The figure Element

The figure element groups an optional preamble element, an artwork element, and an optional postamble element together. The figure element also has an optional anchor attribute that is used for cross-referencing with the xref element (The xref Element). There is also an optional title attribute that identifies the title of the figure.

The preamble and postamble elements, if present, are simply text. If a cross-reference is needed to a section, figure, table, or reference, the xref element (The xref Element) is used; similarly, if an external reference is needed, the eref element (The eref Element) is used. Indexing of text is provided by the iref element (The iref Element).

The artwork element, which must be present, contains "ASCII artwork". Unlike text contained in the t, preamble, or postamble elements, both horizontal and vertical whitespace are significant in the artwork element.

So, putting it all together, we have, e.g.,

    <figure anchor='figure_example'>
        <preamble>So,
        putting it all together, we have, e.g.,</preamble>
        <artwork>
            ascii artwork goes here...

            be sure to use "&lt;" or "&amp;" instead of "<" and "&",
            respectively!
        </artwork>
        <postamble>which is a very simple example.</postamble>
    </figure>

which is a very simple example.

If you have artwork with a lot of "<" characters, then there's an XML trick you can use:

    <figure>
        <preamble>If you have artwork with a lot of "&lt;"
        characters, then there's an XML trick you can
        use:</preamble>
        <artwork><![CDATA[
            ascii artwork goes here...

            just don't use "]]" in your artwork!
        ]]></artwork>
        <postamble>The "&lt;![CDATA[ ... ]]>" construct is called
        a CDATA block -- everything between the innermost brackets
        is left alone by the XML application.</postamble>
    </figure>

The <![CDATA[ ... ]]> construct is called a CDATA block — everything between the innermost brackets is left alone by the XML application.

Because the figure element represents a logical grouping of text and artwork, an XML application producing a text version of the document should attempt to keep these elements on the same page. Because RFC 2223 (Postel, J. and J. Reynolds, “Instructions to RFC Authors,” October 1997.) [2] allows no more than 69 characters by 49 lines of content on each page, XML applications should be prepared to prematurely introduce page breaks to allow for better visual grouping.

Finally, the artwork element has two optional attributes: name and type. The former is used to suggest a filename to use when storing the content of the artwork element, whilst the latter contains a suggestive data-typing for the content.
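
A sketch of these two attributes in use follows. The filename and the type value are hypothetical; this section does not define a vocabulary for type, so treat the value as a free-form hint.

```xml
<figure anchor='greeting_example' title='A Small Text File'>
    <artwork name='greeting.txt' type='ascii-text'><![CDATA[
hello, world
a < b && b < c
]]></artwork>
</figure>
```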




2.3.1.4.  The texttable Element

The texttable element groups an optional preamble element, one or more ttcol elements, zero or more c elements, and an optional postamble element together. The texttable element also has an optional anchor attribute that is used for cross-referencing with the xref element (The xref Element). There is also an optional title attribute that identifies the title of the table.

The preamble and postamble elements have already been described in Section 2.3.1.3 (The figure Element).

The ttcol element, of which at least one must be present, defines a column header for the table, along with the desired width and alignment for the column.

The c element is present for each cell in the table and contains text along with the usual cross-reference and indexing elements.
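
As a sketch, two column headers with explicit widths and alignments might look like this. The attribute name width and its percentage form are assumptions here, as is the value "right" for align; consult Appendix B (The DTD) for the authoritative definitions.

```xml
<ttcol width='75%' align='left'>Description</ttcol>
<ttcol width='25%' align='right'>Count</ttcol>
```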

So, putting it all together, we have, e.g.,

    <texttable anchor='table_example'>
        <preamble>So,
        putting it all together, we have, e.g.,</preamble>
        <ttcol align='center'>ttcol #1</ttcol>
        <ttcol align='center'>ttcol #2</ttcol>
        <c>c #1</c>
        <c>c #2</c>
        <c>c #3</c>
        <c>c #4</c>
        <c>c #5</c>
        <c>c #6</c>
        <postamble>which is a very simple example.</postamble>
    </texttable>

which is a very simple example.

So, putting it all together, we have, e.g.,

    +----------+----------+
    | ttcol #1 | ttcol #2 |
    +----------+----------+
    |   c #1   |   c #2   |
    |   c #3   |   c #4   |
    |   c #5   |   c #6   |
    +----------+----------+

which is a very simple example.

As with the figure element, the texttable element represents a logical grouping of text, hence an XML application producing a text version of the document should attempt to keep these elements on the same page.




2.3.1.5.  The xref Element

The xref element is used to cross-reference sections, figures, tables, and references. The mandatory target attribute is used to link back to the anchor attribute of the section, figure, texttable, and reference elements. The value of the anchor and target attributes should be formatted according to the token syntax in Section 2.1 (XML basics).

If used as an empty element, e.g.,

    according to the token syntax in <xref target='xml_basics' />.

then the XML application inserts an appropriate phrase during processing.

What's "appropriate" depends on the value of the optional format attribute. There are four possible values:

  counter: the "counting designation" is rendered (e.g., "2.1" or "A.2");
  title: the title attribute of the corresponding element is rendered (e.g., "XML Basics");
  none: no additional designation is rendered; or,
  default: a suitable designation is rendered, e.g., "Section 2.1" or "<a href='#xml_basics'>XML Basics</a>" (the default).
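
To make the choices concrete, the following sketch uses three of the four values against the same target. The anchor xml_basics is the one used in the example above.

```xml
<t>See <xref target='xml_basics' format='counter' /> for the number
alone, <xref target='xml_basics' format='title' /> for the title
alone, or <xref target='xml_basics' /> for the default phrase.</t>
```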

If used with content, e.g.,

    conforming to <xref target='RFC2223'>RFC 2223</xref>.

then the XML application inserts an appropriate designation during processing, such as "RFC 2223[2]" or "<a href='#refs.RFC2223'>RFC 2223</a>". Although the XML application decides what "an appropriate designation" might be, its choice is consistent throughout the processing of the document.




2.3.1.6.  The eref Element

The eref element is used to reference external documents. The mandatory target attribute is a URI (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax,” August 1998.) [6], e.g.,

    <eref target='http://www.ibiblio.org/xml/'>Cafe con Leche</eref>

Note that while the target attribute is always present, the eref element may be empty, e.g.,

    <eref target='http://example.com/' />

and the XML application inserts an appropriate designation during processing such as "[9]" or "<a href='http://example.com/'>http://example.com/</a>".




2.3.1.7.  The iref Element

The iref element is used to add information to an index, typically rendered at the end of the document. The mandatory item attribute is the primary key the information is stored under, whilst the optional subitem attribute is the secondary key, e.g.,

    <iref item='indexing' subitem='how to' />

The optional primary attribute can be used to indicate that this particular indexing entry should be considered "primary".
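
For instance, the entry above could be marked as primary as in this sketch. The value 'true' is an assumption about how the attribute is spelled out; consult Appendix B (The DTD) for the permitted values.

```xml
<iref item='indexing' subitem='how to' primary='true' />
```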

Finally, note that the iref element is always empty; it never contains any text.
