An Uber-comparison of RDFa, Microdata and Microformats

By ManuSporny on June 25, 2011 in RDFa, Semantic Web, with 44 Comments

Full disclosure: I am the current Chair of the group at the World Wide Web Consortium that created RDFa. That said, all of this is my personal opinion. I am not speaking on behalf of the W3C or my company, Digital Bazaar. This is just my personal take on the recent events that are unfolding. If you would like to keep up with these events as they happen, you can follow me on Twitter.

There has been a recent discussion at the World Wide Web Consortium (W3C) about the state of RDFa, Microdata and Microformats. The Technical Architecture Group (TAG) is concerned about the W3C publishing two specifications that achieve effectively the same thing in incompatible ways. They are suggesting that both RDFa 1.1 and Microdata, in their current state, should not proceed as official specifications until they become more compatible with one another. The W3C intends to launch a quick examination of the situation to determine whether or not there is room for convergence of these technologies.

For those who are not following this stuff closely, it can be difficult to understand all of the technical reasons this issue has been raised. This post attempts to clarify those technical issues by providing an easy-to-read list of similarities and differences between RDFa, Microdata and Microformats. The table below summarizes each feature across the three structured data syntaxes, and each feature is explained briefly further down the page.

Thanks to Jeni Tennison for doing a separate technical analysis for the W3C TAG. This article builds upon her hard work, but has been heavily modified and thus should not be considered as her thoughts on the matter. Writing this article was a fairly large undertaking and there are bound to be issues with parts of the article. Please let me know if there are errors by commenting on the post and I will do my best to fix them and clarify when necessary.

Structured Data in a Nutshell

Note: This post frequently uses the term IRI. For those not familiar with the term IRI, it means “Internationalized Resource Identifier” which is basically a fancy way of saying “a URL that allows western language characters as well as characters from any language in the world, such as Arabic, Japanese Katakana, Chinese ideograms, etc”. The URL in the location bar in your browser is a valid IRI.

Feature | RDFa 1.1 | Microdata 1.0 | Microformats 1.0
Relative Complexity | High | Medium | Low
Data Model | Graph | Tree | Tree
Item optionally identified by IRI | Yes | Yes | No
Item type optionally specified by IRI | Yes | Yes | No
Item properties specified by IRI | Yes | Yes | No
Multiple objects per page | Yes | Yes | Yes
Overlapping objects | Yes | Yes | No
Plain Text properties | Yes | Yes | Yes
IRI properties | Yes | Yes* | No
Typed Literal properties | Yes | No | No
XML Literal properties | Yes | No | No
Language tagging | Yes | Yes | Inconsistent
Override text and IRI content | Yes | No | Text only
Clear mapping to RDF | Yes | Problematic | No
Target Languages | 8 (XHTML1, HTML4, HTML5, XHTML5, XML, SVG, ePub, OpenDocument) | 2 (HTML5, XHTML5) | 4 (XHTML1, HTML4, HTML5, XHTML5)
New Attributes | 8 (about, datatype, profile, prefix, property, resource, typeof, vocab) | 5 (itemid, itemprop, itemref, itemscope, itemtype) | 0
Re-used Attributes | 5 (content, href, rel, rev, src) | 5 (content, src, href, data, datetime) | 4 (class, title, rel, href)
Multiple IRI types per object | Yes | RDF only | No
Multiple statements per element | Yes | No | Yes
“Locally scoped” vocabulary terms | Yes, via vocab | Yes, via itemscope | No
Item Chaining | Yes | Basic | No
Transclusion | No | Yes | Yes, via include pattern
Compact IRIs | Yes | No | No
Prefix rebinding | Yes | No | No
Vocabulary Mashups | Yes | No | No
HTML5 time element support | Not yet | Yes | No
Different attributes for different property types | Yes (property for text, rel/rev for URLs, resource/content for overrides) | No | Yes (class for text, rel for URLs)
Transform to JSON | Yes (RDFa API) | Yes (parser and Microdata DOM API) | No
DOM API | Yes | Yes | No
Unified Parser | Yes | Yes | No

* IRIs are distinguished in Microdata markup and in its RDF output, but become plain strings in its JSON serialization (see “IRI properties” below).

Relative Complexity

Relative Complexity is a fuzzy measure of how difficult it is to achieve mastery of a particular structured data syntax. Microformats is by far the easiest to pick up and use. Microdata is a noticeable step up in complexity. RDFa is the most complex to master. There is a design trade-off: the simpler the syntax, the fewer structured data markup scenarios it supports; the more complex the syntax, the more scenarios it supports, at the cost of making the syntax harder for Web developers to master.

Data Model

The Web is a graph of information. There are nodes (web pages) and edges (links) that connect all of the information together. RDFa uses a graph to model the Web. Microdata and Microformats use a special subset of a graph called a rooted graph, or tree. There are benefits and drawbacks to each approach.
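To make the graph/tree distinction concrete, here is a minimal Python sketch; the identifiers (`#manu`, `#ivan`, `#sandro`) and the `knows`/`name` properties are illustrative only. The graph is modeled as a set of (subject, predicate, object) triples, so a node that two people both point to is described once and referenced by its identifier. The tree nests each value inside its parent, so the shared node has to be repeated under every parent that refers to it.

```python
# Graph model: a set of (subject, predicate, object) triples.
# Any subject can point at any node by its identifier.
graph = {
    ("#manu", "knows", "#ivan"),
    ("#sandro", "knows", "#ivan"),
    ("#ivan", "name", "Ivan"),
}

# Tree model: each item nests its property values. Every node in a tree has
# exactly one parent, so Ivan's description is copied under each item.
tree = [
    {"id": "#manu", "knows": [{"id": "#ivan", "name": "Ivan"}]},
    {"id": "#sandro", "knows": [{"id": "#ivan", "name": "Ivan"}]},
]

def count_descriptions(items, target):
    """Count how many nested copies of `target` (with a name) appear in a tree."""
    total = 0
    for item in items:
        if item.get("id") == target and "name" in item:
            total += 1
        for value in item.get("knows", []):
            total += count_descriptions([value], target)
    return total

# In the graph, Ivan's name is stated exactly once ...
print(sum(1 for s, p, o in graph if s == "#ivan" and p == "name"))  # 1
# ... while the tree repeats it under every parent.
print(count_descriptions(tree, "#ivan"))  # 2
```

The trade-off runs the other way too: the nested form is easier to walk with simple recursive code, which is part of why tree-shaped models are simpler to use.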

Item optionally identified by IRI

Being able to identify an item on the Web is very useful. If we weren’t able to identify web pages in a universal way, the Web wouldn’t exist as it does today. That is, we couldn’t send someone a link, have them open it and find the same information that we found. The same concept applies to “things” described in Web pages. If we identify these things with IRIs, it becomes easier to be specific about the “thing” we’re talking about.

RDFa example:

<div about="http://example.com/people#manu">...

Microdata example:

<div itemscope itemtype="http://example.com/types/Person" itemid="http://example.com/people#manu">...

Microformats example:

Not supported

Item type optionally specified by IRI

The ability to identify the type of an item on the Web is useful. In Object Oriented Programming (OOP) parlance, this is the concept of a Class. Using an IRI to specify the type of an item lets us universally identify that type on the Web. Instead of a machine having to guess whether an item of type “Person” on a Web page is the same type that is familiar to it, we can instead give the item a type of http://example.com/types/Person. Giving the item an IRI type allows us to be sure that two machines are using the same type information.

RDFa example:

<div typeof="http://example.com/types/Person">...

Microdata example:

<div itemscope itemtype="http://example.com/types/Person">...

Microformats example:

Not supported

Item properties specified by IRI

The ability to identify a property, also known as a vocabulary term, associated with an item on the Web is useful. In Object Oriented Programming (OOP) parlance, this is the concept of a member variable. Using an IRI to specify the property of an item lets us universally identify that property on the Web. Instead of a machine having to guess whether a property named “name” on a Web page is the same property that is familiar to it, we can instead refer to the property using an IRI, like http://example.org/terms/name. Giving the property an IRI allows us to be sure that two machines are using the same vocabulary term in a program.

RDFa example:

<span property="http://example.org/terms/name">Manu Sporny</span>

Microdata example:

<span itemprop="http://example.org/terms/name">Manu Sporny</span>

Microformats example:

Not supported

Multiple objects per page

Web pages often describe multiple “things” on a page. The ability to express this information as structured data is a natural extension of a Web page.

RDFa example:

<div about="#person1">...</div>
...
<div about="#person2">...</div>

Microdata example:

<div itemscope itemtype="http://example.com/types/Person" itemid="#person1">...</div>
...
<div itemscope itemtype="http://example.com/types/Person" itemid="#person2">...</div>

Microformats example:

<div class="vcard">...</div>
...
<div class="vcard">...</div>

Overlapping objects

At times, the HTML markup on a page will contain two pieces of overlapping information. For example, the markup describing one person may be nested inside the markup describing another person. The structured data syntax must be able to specify which person a given piece of HTML describes, because the syntax should not force a Web developer to change the layout of their page.

RDFa example:

<div about="#person1">
   ... Information about Person 1 ...
   <div about="#person2">
      ... Information about Person 2 ...
   </div>
</div>

Microdata example:

<div itemscope itemtype="http://example.com/types/Person" itemid="#person1">
   ... Information about Person 1 ...
   <div itemscope itemtype="http://example.com/types/Person" itemid="#person2">
      ... Information about Person 2 ...
   </div>
</div>

Microformats example:

Not supported

Plain Text properties

Most item attributes, such as a person’s name, can be expressed using plain text. It is important that these text attributes can be picked up from the page.

RDFa example:

<span property="name">Manu Sporny</span>

Microdata example:

<span itemprop="name">Manu Sporny</span>

Microformats example:

<span class="fn">Manu Sporny</span>

IRI properties

At times it is important to differentiate between an IRI and plain text. For example, the text string sip:msporny@digitalbazaar.com could be a text string or it could be a valid IRI. While the ability to differentiate may seem trivial, guessing what a valid IRI is and isn’t will never be future proof. It is helpful to be able to understand if a value is a piece of text or an IRI in the data model.

RDFa example:

<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC-AT-SA-3.0</a>

While Microdata does allow one to differentiate between IRIs and strings in the syntax, the JSON-based serialization converts all IRIs to string values. This is problematic because it is impossible to differentiate between a string that looks like an IRI and an actual IRI in the JSON serialization. IRI properties are preserved correctly in the RDF serialization of Microdata.

Microdata example:

<a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC-AT-SA-3.0</a>

While Microformats allow you to use IRI information, there is no official data model or mapping to RDF or JSON. Everything is treated as a text string, and application logic must be written to determine whether a particular data item is meant to be an IRI or text. So, while the markup below is valid, the IRI will be expressed as a text string, not an IRI.

Microformats example:

<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC-AT-SA-3.0</a>
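The JSON information loss described for Microdata is easy to reproduce. In this sketch (a simplified, hypothetical item layout, not the exact Microdata JSON format), a license value taken from an `href` and a plain-text value that merely looks like an IRI serialize to identical JSON, so a consumer cannot tell them apart. Keeping the distinction requires an explicit marker; the `(value, is_iri)` pair used here is an illustrative assumption.

```python
import json

LICENSE = "http://creativecommons.org/licenses/by-sa/3.0/"

# Value taken from an href attribute: semantically an IRI.
from_href = {"properties": {"license": [LICENSE]}}
# Value taken from element text: just a string that happens to look like an IRI.
from_text = {"properties": {"license": [LICENSE]}}

# Once serialized, the two are identical.
print(json.dumps(from_href) == json.dumps(from_text))  # True

# A model that keeps the distinction needs an explicit flag.
typed_href = (LICENSE, True)   # (value, is_iri)
typed_text = (LICENSE, False)
print(typed_href == typed_text)  # False
```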

Typed Literal properties

Typed literals allow you to express typing information about a property. This is important when you need to specify things like units of measure, or specific kinds of numbers, in a way that does not depend on understanding the human language used on the page. For example: Is “+353872206327” an integer or a phone number? Is “.1E-1” a float or a text string? Is “false” a boolean value or part of a sentence? Another example concerns measurements like the kilogram, a unit of weight that can be written in a variety of different ways around the world. Being able to express this unit in structured data in a language-neutral and measurement-neutral way makes it easier for machines to understand the unit of measurement without having to understand the underlying language.

RDFa example:

<span property="measure:weight" datatype="measure:kilograms">40</span> килограммов

Microdata example:

Not supported

Microformats example:

Not supported
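A processor that honors a datatype can interpret the same string differently depending on the declared type. This sketch maps a few XML Schema datatype IRIs to Python converters; the `CONVERTERS` table and the `interpret` function are hypothetical names for illustration, not part of the RDFa API.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

def _parse_boolean(text: str) -> bool:
    # xsd:boolean allows "true", "false", "1" and "0".
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an xsd:boolean: {text!r}")

# Hypothetical converter table keyed by datatype IRI.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "double": float,
    XSD + "boolean": _parse_boolean,
}

def interpret(lexical: str, datatype: Optional[str] = None):
    """Return a typed value when a known datatype is given, else the plain string."""
    if datatype is None:
        return lexical  # untyped: leave as text (e.g. a phone number)
    return CONVERTERS[datatype](lexical)

print(interpret("+353872206327"))                   # +353872206327 (still a string)
print(interpret("+353872206327", XSD + "integer"))  # 353872206327
print(interpret(".1E-1", XSD + "double"))           # 0.01
print(interpret("false", XSD + "boolean"))          # False
```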

XML Literal properties

XML Literals are used for properties that contain markup, such as the content of a blog post, SVG or MathML markup that should be preserved in the final output of the structured data parser. This is useful when you want to preserve all markup.

$$x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$

The formula above is expressed like so in RDFa and MathML:

<span property="math:formula" datatype="rdf:XMLLiteral">
<math mode="display" xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo form="prefix">−<!-- − --></mo>
        <mi>b</mi>
        <mo>±<!-- ± --></mo>
        <msqrt>
          <msup>
            <mi>b</mi>
            <mn>2</mn>
          </msup>
          <mo>−<!-- − --></mo>
          <mn>4</mn>
          <mo>⁢<!-- &InvisibleTimes; --></mo>
          <mi>a</mi>
          <mo>⁢<!-- &InvisibleTimes; --></mo>
          <mi>c</mi>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mo>⁢<!-- &InvisibleTimes; --></mo>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
</span>

Microdata example:

Not supported

Microformats example:

Not supported
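The difference between a plain-text property and an XML Literal is the difference between an element's text content and its serialized child markup. This sketch uses Python's standard `xml.etree.ElementTree` on a trimmed-down MathML fragment (the fragment is a simplified stand-in for the formula above) to show what each approach would extract.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
source = f"""<span property="math:formula">
<math xmlns="{MATHML_NS}"><mi>x</mi><mo>=</mo><mn>2</mn></math>
</span>"""

span = ET.fromstring(source)

# Plain-text extraction: all markup is discarded.
plain = "".join(span.itertext()).strip()
print(plain)  # x=2

# XML Literal extraction: the child element is serialized with its markup.
# default_namespace keeps MathML unprefixed; strip() drops the trailing tail text.
math = span.find(f"{{{MATHML_NS}}}math")
literal = ET.tostring(math, encoding="unicode", default_namespace=MATHML_NS).strip()
print("<mi>x</mi>" in literal)  # True
```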

Language tagging

The ability to specify language information for plain text is important when pulling data in from the Web. At times, words that are spelled the same in western character sets can mean different things. For example, the word “chat” in English (to have a conversation) has a very different meaning from the word “chat” (cat) in French.

RDFa example:

<span property="name" lang="en">Manu Sporny</span>

Microdata example:

<span itemprop="name" lang="en">Manu Sporny</span>

Language support is defined on a per-Microformat basis, and some Microformats make no statements about supporting multiple language tags.

Microformats example:

<span class="fn" lang="en">Manu Sporny</span>
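A language-tagged string can be thought of as a pair of the text and its language tag, so two values with the same spelling but different tags remain distinct. A minimal sketch with a hypothetical `LangString` type (a simplified model, not a class from any RDF library):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LangString:
    """A literal value paired with an optional language tag."""
    value: str
    lang: Optional[str] = None

english = LangString("chat", "en")  # to have a conversation
french = LangString("chat", "fr")   # a cat

print(english == french)                    # False: same spelling, different meaning
print(english == LangString("chat", "en"))  # True
```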

Override text and IRI content

At times, the text content in the page is not what you want a machine to extract when reading the structured data. It is important to have a way to override both the text content and the URL content of an element.

RDFa example:

<span property="candles" content="14">fourteen</span>
...
<a rel="homepage" href="http://example.org/short-url"
      resource="http://example.org/2011/path-to-real-url">My Homepage</a>

Microdata example:

Not supported

Microformats support overriding only the text content of an element.

Microformats example:

<abbr class="candles" title="14">fourteen</abbr>
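Override attributes amount to a simple precedence rule: use the override attribute when it is present, otherwise fall back to the visible text or the ordinary link. This sketch applies that rule to plain attribute dictionaries; the `resolve` helper is hypothetical and ignores everything a real processor does besides this one precedence check.

```python
from typing import Mapping, Optional, Sequence

def resolve(attrs: Mapping[str, str], order: Sequence[str],
            fallback: Optional[str] = None) -> Optional[str]:
    """Return the first attribute in `order` that is present, else `fallback`."""
    for name in order:
        if name in attrs:
            return attrs[name]
    return fallback

# RDFa text override: content wins over the visible text "fourteen".
candles = {"property": "candles", "content": "14"}
print(resolve(candles, ["content"], fallback="fourteen"))  # 14

# RDFa IRI override: resource wins over href.
link = {"rel": "homepage",
        "href": "http://example.org/short-url",
        "resource": "http://example.org/2011/path-to-real-url"}
print(resolve(link, ["resource", "href"]))  # http://example.org/2011/path-to-real-url

# Microformats abbr pattern: title overrides the visible text.
abbr = {"class": "candles", "title": "14"}
print(resolve(abbr, ["title"], fallback="fourteen"))  # 14
```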

Clear mapping to RDF

The Resource Description Framework, or RDF, has been the standard model for the Semantic Web for over a decade. At times it can be overkill for simple structured data projects, but it is often necessary for the more involved structured data use cases. There is a fairly large, well-developed set of tools for RDF, so it is beneficial if a structured data syntax has a clear way of mapping to the RDF data model that is useful to those existing RDF processing tools.

Since RDFa is built on RDF, the mapping to RDF is well specified. While it is possible to map Microformats to RDF, there is no standard way of doing so. Microdata does map to RDF, but there are a few bugs that are of concern. Namely, Microdata auto-generates RDF property URLs in a way that is not useful to many of the existing RDF processing tools. The objections raised in the past relate to the usefulness, centralization and dereferenceability of the generated IRIs. It has been argued that the IRIs designated for properties in Microdata are problematic as-is and need to be changed. The following example demonstrates how properties in RDFa map to easy-to-understand URLs:

<section vocab="http://schema.org/" typeof="Person">
   <h1 property="name">John Doe</h1>
</section>

which results in the following IRI for the “name” property in RDFa:

http://schema.org/name

This URI is not centrally controlled and fits in well with the RDF stack. De-referencing the URI leads to a location that is under the vocabulary maintainer's control. The Microdata mapping to RDF is less straightforward:

<section itemscope itemtype="http://schema.org/Person">
   <h1 itemprop="name">John Doe</h1>
</section>

The following URI is generated for the “name” property in Microdata:

http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fschema.org%2FPerson%23%3Aname

This URI is centrally controlled. It requires extensive mapping to be useful for most RDF stacks. De-referencing the URI leads to a location that is not under the vocabulary maintainer's control.
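The two property IRIs above can be reproduced in a few lines of Python. The sketch mirrors only the single example shown here (vocabulary IRI plus term for RDFa; a percent-encoded `itemtype#:name` string appended to the `http://www.w3.org/1999/xhtml/microdata#` namespace for Microdata). It is not a complete implementation of either specification's mapping, and the function names are hypothetical.

```python
from urllib.parse import quote

def rdfa_property_iri(vocab: str, term: str) -> str:
    # RDFa 1.1: a bare term is appended to the in-scope vocab IRI.
    return vocab + term

def microdata_property_iri(itemtype: str, name: str) -> str:
    # As in the example above: "<itemtype>#:<name>" is percent-encoded
    # (safe="" so ":", "/" and "#" are all escaped) and appended to a
    # fixed, W3C-hosted namespace.
    return "http://www.w3.org/1999/xhtml/microdata#" + quote(f"{itemtype}#:{name}", safe="")

print(rdfa_property_iri("http://schema.org/", "name"))
# http://schema.org/name
print(microdata_property_iri("http://schema.org/Person", "name"))
# http://www.w3.org/1999/xhtml/microdata#http%3A%2F%2Fschema.org%2FPerson%23%3Aname
```

Note how the RDFa result stays inside the vocabulary's own domain, while the Microdata result always points at www.w3.org, which is the centralization concern described above.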

Target Languages

Most structured data syntaxes are meant to express data in a variety of different document languages. RDFa is designed and officially specified to work in a variety of languages, including HTML5, XHTML1, HTML4, SVG, ePub and the OpenDocument Format. Microdata was built and specified for HTML5. Microformats re-use attributes in HTML that have been in use for over a decade.

Having a structured data syntax support as many Web document formats as possible is good for the web because it reduces the tooling necessary to support structured data on the Web.

New Attributes

The complexity of a structured data syntax can be viewed, in part, by how many attributes a Web developer needs to understand to properly use the language. New attributes, while providing new functionality, do increase the cognitive load on the Web developer.

Re-used Attributes

All of the structured data languages re-use a subset of attributes that contain information important to structured data on the Web. There is a delicate balance between re-using too many attributes and creating new attributes.

Multiple IRI types per item

Web developers need to be able to specify that an item on a page is associated with more than one type. That is, a business can be both an “AutoPartsStore” and a “RepairShop”.

RDFa example:

<div typeof="AutoPartsStore RepairShop">...

In Microdata, you can only express multiple types for a single object using itemid to tie the information together and then only see the result in the RDF output. The DOM API would generate two separate items for the markup below, while the RDF output would generate only one item.

Microdata example:

<div itemscope itemid="#fixit" itemtype="http://example.com/types/AutoPartsStore">...</div>
<meta itemscope itemid="#fixit" itemtype="http://example.com/types/RepairShop" />

Microformats example:

Not supported
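The itemid-based workaround relies on the RDF output merging statements that share a subject IRI. This sketch (hypothetical item dictionaries, not the Microdata DOM API) shows the two views side by side: two separate items, but a single subject carrying both types once statements are grouped by `itemid`.

```python
from collections import defaultdict

# Two Microdata items as a DOM-style view would report them: separate objects.
items = [
    {"id": "#fixit", "type": "http://example.com/types/AutoPartsStore"},
    {"id": "#fixit", "type": "http://example.com/types/RepairShop"},
]
print(len(items))  # 2

# RDF view: statements are grouped by subject IRI, so both types land on one node.
types_by_subject = defaultdict(set)
for item in items:
    types_by_subject[item["id"]].add(item["type"])

print(len(types_by_subject))  # 1
print(sorted(types_by_subject["#fixit"]))
```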

Multiple statements per element

It is advantageous to use as much of the existing information in an HTML document as possible. At times, one element can contain more than a single piece of structured data. For example, a link can contain both the name of a person as well as a link to their homepage. A structured data syntax should re-use as much of this information as possible.

RDFa example:

<a rel="homepage" href="http://manu.sporny.org/" property="name">Manu Sporny</a>

Microdata example:

Not supported

Microformats example:

<a rel="homepage" href="http://manu.sporny.org/">Manu Sporny</a>

“Locally scoped” vocabulary terms

Locally scoped vocabulary terms allow you to create new vocabulary terms on-the-fly that are picked up by the structured data parsers. The use case for this is questionable, as it is considered good practice to have a vocabulary that allows any person or machine to dereference the URL and find out more about the vocabulary term.

RDFa example:

<div vocab="http://schema.org/" typeof="Person">
   <span property="favoriteSquash">Butternut Squash</span>
</div>

Microdata example:

<div itemscope itemtype="http://schema.org/Person">
   <span itemprop="favoriteSquash">Butternut Squash</span>
</div>

Microformats example:

Not supported

Item Chaining

Chaining allows the object of a particular statement to become the subject of the next statement. It is often useful when relating multiple items to a single item or when linking multiple items, like social networks, together. For example, “Manu knows Ivan who knows Sandro who knows Mike”.

RDFa example:

<div about="#manu" rel="knows">
   <div about="#ivan" rel="knows">
      <div about="#sandro" rel="knows">
         <div about="#mike">
         ...
         </div>
      </div>
   </div>
</div>

Microdata supports basic chaining, but doesn’t support hanging-rels or reverse chaining.

Microdata example:

<div itemscope itemid="#manu" itemtype="http://schema.org/Person">
   <div itemscope itemid="#ivan" itemprop="knows">
      <div itemscope itemid="#sandro" itemprop="knows">
         <div itemscope itemid="#mike" itemprop="knows">
         </div>
      </div>
   </div>
</div>

It is questionable whether or not Microformats even supports basic chaining. If somebody has a good chaining example for Microformats, please let me know and I’ll put it below.

Microformats example:

No examples of chaining.
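Chaining is easiest to see in the statements it produces: each nested item becomes the object of its parent's `knows` statement and the subject of the next one. A minimal sketch that takes the identifiers from the markup above in nesting order (the `chain_triples` helper is hypothetical):

```python
from typing import List, Tuple

def chain_triples(ids: List[str], predicate: str = "knows") -> List[Tuple[str, str, str]]:
    """Link each identifier to the next one, as nested chained elements do."""
    return [(subj, predicate, obj) for subj, obj in zip(ids, ids[1:])]

for triple in chain_triples(["#manu", "#ivan", "#sandro", "#mike"]):
    print(triple)
# ('#manu', 'knows', '#ivan')
# ('#ivan', 'knows', '#sandro')
# ('#sandro', 'knows', '#mike')
```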

Transclusion

Transclusion allows a Web author to specify a set of properties once in a page, such as a business address, and copy those properties to multiple items in the page. RDFa allows this by reference, but not by making a copy. Microdata and Microformats both allow transclusion by reference and by copy.

RDFa example:

Transclusion by copy not supported.

Microdata example:

<span itemscope itemtype="http://microformats.org/profile/hcard"
      itemref="home"><span itemprop="fn">Jack</span></span>
<span itemscope itemtype="http://microformats.org/profile/hcard"
      itemref="home"><span itemprop="fn">Jill</span></span>
<span id="home" itemprop="adr" itemscope><span
      itemprop="street-address">Bottom of the Hill</span></span>

Microformats example:

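<span class="vcard">
  <span class="fn n" id="james-hcard-name">
    <span class="given-name">James</span> <span class="family-name">Levine</span>
  </span>
</span>
...
<span class="vcard">
 <object class="include" data="#james-hcard-name"></object>
 <span class="org">SimplyHired</span>
 <span class="title">Microformat Brainstormer</span>
</span>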
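One way to picture transclusion by copy is the Jack and Jill example above: each item names the `home` element, and that element's properties are merged into each item. This sketch imitates the result with plain dictionaries; the data layout and the `resolve_itemrefs` helper are hypothetical, not the HTML5 Microdata algorithm itself.

```python
import copy

# Elements addressable by id, as in the Microdata markup above.
by_id = {
    "home": {"adr": {"street-address": "Bottom of the Hill"}},
}

items = [
    {"fn": "Jack", "itemref": ["home"]},
    {"fn": "Jill", "itemref": ["home"]},
]

def resolve_itemrefs(item, by_id):
    """Return a new item with the properties of every referenced id merged in."""
    result = {k: v for k, v in item.items() if k != "itemref"}
    for ref in item.get("itemref", []):
        # deepcopy gives each item its own copy of the shared properties.
        result.update(copy.deepcopy(by_id[ref]))
    return result

resolved = [resolve_itemrefs(item, by_id) for item in items]
print(resolved[0]["adr"]["street-address"])      # Bottom of the Hill
print(resolved[1]["adr"]["street-address"])      # Bottom of the Hill
print(resolved[0]["adr"] is resolved[1]["adr"])  # False: separate copies
```

Dropping the `deepcopy` and storing `by_id[ref]` directly would model transclusion by reference instead: both items would then share one address object.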

Compact IRIs

Compact IRIs allow Web developers to compress URLs so that they are easier to author. This allows more compact markup and reduces errors because it is no longer necessary to type out full URLs.

RDFa example:

<div prefix="dc: http://purl.org/dc/terms/">
...
   <span property="dc:title">...</span>
   <span property="dc:creator">...</span>
   <span property="dc:abstract">...</span>
</div>

Microdata example:

Not supported

Microformats example:

Not supported
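Expanding a compact IRI is a prefix lookup followed by string concatenation. A minimal sketch, assuming a single prefix mapping like the `dc` declaration in the RDFa example (the `expand_curie` name is illustrative):

```python
def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand 'prefix:reference' using the declared prefix mappings."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return curie  # not a known compact IRI; leave untouched
    return prefixes[prefix] + reference

prefixes = {"dc": "http://purl.org/dc/terms/"}
for term in ("dc:title", "dc:creator", "dc:abstract"):
    print(expand_curie(term, prefixes))
# http://purl.org/dc/terms/title
# http://purl.org/dc/terms/creator
# http://purl.org/dc/terms/abstract
```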

Prefix rebinding

Enabling prefix declaration and rebinding supports decentralized vocabulary development and management. Prefix rebinding allows Web developers to create vocabularies that are specific to their domain of expertise and use them in a way that is inter-operable with other RDFa processors. Microdata and Microformats do not specify a prefix declaration and rebinding mechanism. Microdata does allow custom vocabularies using the itemtype attribute and therefore does support decentralized vocabulary development, but not decentralized vocabulary management, unless full IRIs are used to express the vocabulary terms.

RDFa example:

<div prefix="dc: http://purl.org/dc/terms/">
...
</div>

Microdata example:

Not supported

Microformats example:

Not supported
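Because a prefix declaration applies to the element that carries it and its descendants, a nested element can rebind a prefix without affecting its ancestors. Python's `collections.ChainMap` models that scoping neatly. The second `dc` mapping (to the older Dublin Core elements namespace) and the `expand` helper are illustrative assumptions.

```python
from collections import ChainMap

outer = ChainMap({"dc": "http://purl.org/dc/terms/"})
# A nested element declares its own mapping for "dc", shadowing the outer one.
inner = outer.new_child({"dc": "http://purl.org/dc/elements/1.1/"})

def expand(curie, scope):
    """Expand a compact IRI using the innermost binding of its prefix."""
    prefix, _, reference = curie.partition(":")
    return scope[prefix] + reference

print(expand("dc:title", inner))  # http://purl.org/dc/elements/1.1/title
print(expand("dc:title", outer))  # http://purl.org/dc/terms/title
```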

Vocabulary Mashups

Enabling multiple Web vocabularies to be mashed together into simple vocabulary terms is useful when creating application specific “vocabulary profiles”. Using a vocabulary profile, these simple vocabulary terms can be re-mapped to full vocabulary term IRIs which is useful to Web developers that need to simplify markup for a particular business unit, but ensure that the data generated maps to the correct Web vocabularies when used on the open Web.

For example, assume that a Web developer wants to map the vocabulary term “name” to “http://schema.org/name”, “nickname” to “http://xmlns.com/foaf/0.1/nick”, and “hangout” to “http://example.com/myvocab#homebase”. These mappings could be accomplished in a simple-to-use vocabulary profile like so:

RDFa example:

<div profile="http://example.com/my-rdfa-profile">
...
   <span property="name">...</span>
   <span property="nickname">...</span>
   <span property="hangout">...</span>
</div>

Microdata example:

Not supported

Microformats example:

Not supported
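Conceptually, a vocabulary profile is a lookup table from short terms to full IRIs. This sketch hard-codes the three mappings from the example; an RDFa processor would instead obtain the table from the document named by the `profile` attribute, which this sketch does not attempt. `PROFILE` and `map_terms` are hypothetical names.

```python
# Hypothetical contents of the profile at http://example.com/my-rdfa-profile.
PROFILE = {
    "name": "http://schema.org/name",
    "nickname": "http://xmlns.com/foaf/0.1/nick",
    "hangout": "http://example.com/myvocab#homebase",
}

def map_terms(terms, profile):
    """Map each short term to its full vocabulary IRI via the profile."""
    return {term: profile[term] for term in terms}

for term, iri in map_terms(["name", "nickname", "hangout"], PROFILE).items():
    print(f"{term} -> {iri}")
# name -> http://schema.org/name
# nickname -> http://xmlns.com/foaf/0.1/nick
# hangout -> http://example.com/myvocab#homebase
```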

HTML5 time element support

There is a new element in HTML5 called time. This element is used to express human-readable dates and times and also contains a machine-readable value. This element was created as a response to the difficulty that the Microdata community was having when marking up dates and times. The only specification that makes use of the element currently is the Microdata specification. However, there is currently an issue logged against HTML5+RDFa that requests the inclusion of this element so that RDFa processors may understand it. Microformats do not use this element yet, partly because it does not exist in HTML4.

RDFa example:

Not supported

Microdata example:

<time datetime="2011-06-25" pubdate>June 25th 2011</time>

Microformats example:

Not supported
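The point of the time element is that its datetime attribute stays machine-readable however the visible text is phrased. A consumer can parse it with the standard library; this sketch handles only the date-only form used in the Microdata example.

```python
from datetime import date

# Machine-readable value from datetime="2011-06-25"; the visible text
# ("June 25th 2011") can be phrased any way the author likes.
published = date.fromisoformat("2011-06-25")
print(published.year, published.month, published.day)  # 2011 6 25
```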

Different attributes for different property types

There is a design trade-off in structured data languages. As the number of statements that a single element can express increases, so does the number of attributes used to express statements. As the number of ways that an element’s value can be overridden increases, so does the number of attributes used to perform the override. Microdata keeps things simple by allowing only one statement to be made per element. Microformats allows class for text, rel for IRIs and title to override text content. RDFa uses the property attribute for text, rel and rev to specify URLs, and resource and content to override IRI and text content, respectively.

Transform to JSON

JSON is a heavily used data transport format on the Web. It fits nicely into programming environments, so it is beneficial if a structured data syntax can be easily transformed into JSON. Microdata has a native mapping from the parser output to JSON, as well as a DOM API that allows items to be retrieved from the page. The RDFa API provides a mechanism to retrieve data from a page and then serialize that data to JSON.
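Whatever the source syntax, turning extracted statements into JSON mostly means grouping them by subject. This sketch does that for a handful of triples; the output layout (subject IRI mapped to lists of values per property) is an assumption for illustration, not the format produced by the Microdata JSON serializer or the RDFa API.

```python
import json
from collections import defaultdict

triples = [
    ("http://example.com/people#manu", "http://schema.org/name", "Manu Sporny"),
    ("http://example.com/people#manu", "http://schema.org/knows", "http://example.com/people#ivan"),
    ("http://example.com/people#ivan", "http://schema.org/name", "Ivan"),
]

def triples_to_json(triples):
    """Group (subject, predicate, object) triples into a subject-keyed JSON object."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subj, pred, obj in triples:
        grouped[subj][pred].append(obj)
    return json.dumps(grouped, indent=2, sort_keys=True)

print(triples_to_json(triples))
```

Note that in this naive layout the `knows` value is just a string, which is exactly the IRI-versus-text ambiguity discussed under “IRI properties”.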

DOM API

The ability to extract and utilize structured data from a web page in a browser setting is useful for improving interfaces and interactive applications. Microdata provides a simple Microdata DOM API for retrieving items from a web page. RDFa provides a more comprehensive RDFa DOM API for retrieving structured data from a web page. Microformats do not provide an API for extracting structured data from a web page.

Unified Parser

Having a solid set of tooling for handling structured data is important. Among the most important tools are the parsers that process Web documents and extract structured data from them. Both RDFa and Microdata have a unified parser specification, which makes it easier to create inter-operable tools. Microformats require a separate parser for each data format. This may change with the Microformats 2 work, but for now, there is no unified parser specification for Microformats.

Closing

This document will be updated as errors or omissions are found. It can be considered an up-to-date comparison between RDFa, Microdata and Microformats as of June 2011. A follow-up blog post will explain how these structured data languages could be combined into a single structured data language for the Web, achieving the W3C TAG’s goal for unification of the syntaxes used to express structured data on the Web.

44 Comments


  1. Henri Sivonen says:
    June 26, 2011 at 7:27 am

    If having a feature is generally green even for misfeatures like Compact URIs, why isn’t more New Attributes greener?

    • ManuSporny says: (Author)
      June 26, 2011 at 11:36 am

      As with all “Feature comparison charts” there are nuances that are lost when color coding whether a feature is “good” or “bad”. The general feeling that I get, and I realize that this “feeling” is skirting very close to over-generalization, is that the number of features are a double-edged sword. There is a spectrum of opinions on each of the features – some think the feature is unnecessary or harmful, some feel the feature is necessary but problematic, some love the feature, and some don’t care. Just because a technol
