Microformats vs RDFa vs Microdata
August 23, 2009 by Philip
Warning: The microdata syntax has changed (e.g. item="foo" is now itemscope itemtype="foo") since this blog post was written. Don’t copy the examples.
The three candidates were microformats, microdata and RDFa. We began with plain HTML:
<p> I'm Philip Jägenstedt at <a href="http://foolip.org/">foolip.org</a>. </p>
The simple task at hand is to make my name and homepage machine-readable using each of these formats. What follows is a more elaborate version of the reasoning we went through while evaluating the strengths and weaknesses of each alternative.
<p class="vcard"> I'm <span class="fn">Philip Jägenstedt</span> at <a class="url" href="http://foolip.org/">foolip.org</a>. </p>
Microformats are “a set of simple, open data formats”, i.e. predefined vocabularies under centralized control. In this example I’ve used the hCard microformat. One “feature” of microformats is that they are valid HTML 4.01/XHTML 1.0, which is why the class attribute is used in novel ways. Although HTML 4.01 mentions that class may be used “for general purpose processing by user agents”, it’s normally only used “as a style sheet selector”, i.e. for CSS. What this means is that we are working in a single global namespace which is already polluted with all the CSS class names ever used.
The only thing that distinguishes microformats from random CSS classes is the tree structure. This structure is quite a limitation though, because it means that you have to find or make a common ancestor element to all of the data in a single hCard. For a data interchange format, it all seems insane and simply too brittle. Emil put it rather bluntly when he tweeted:
Microformats. Pile of shite that just increases our systematic technical debt.
Still, I have great respect for some of the people behind microformats and the down-to-earth philosophy. They openly state that microformats aren’t “infinitely extensible and open-ended” or “a panacea for all taxonomies, ontologies, and other such abstractions”. As microformats were never intended to solve our use case, it is no surprise that they don’t.
Certainly anyone can use class="foo" to mean anything they like without going through the microformats process; such data formats are cleverly called poshformats (Plain Old Semantic HTML). All things considered though, the whole approach seems outdated and I hope it won’t still be around 5 years from now. Microformats have shown the need for HTML-embedded machine-readable data, now let’s find a better solution.
To understand RDFa you first need some understanding of RDF. The RDF model is basically a somewhat roundabout way of describing graphs using subject-predicate-object triples. An example is the best way to illustrate:
This graph represents me, my name, my homepage and the relationships between them. I’m using the FOAF vocabulary because it already has the concepts of “name” and “homepage”. In N3 syntax this corresponds to these two triples:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> foaf:name "Philip Jägenstedt" .
<#me> foaf:homepage <http://foolip.org/> .
Anything in <brackets> is a URI, and because URIs tend to be long, prefixes are used: foaf:name actually means <http://xmlns.com/foaf/0.1/name>. I’ve used #me to represent myself, but this should really be resolved to a full URI.
As you can see, the subject is #me in both statements. The relationships in the graph are the predicates, i.e. foaf:name and foaf:homepage. The object is either another URI or a string literal. Adding RDF triples equates to adding more nodes and relationships to the graph. This is general enough that you can model almost anything you want with it.
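The same two triples can be sketched in JavaScript to show what the prefix mechanism amounts to. This is only an illustration: the array-of-arrays representation and the expand helper are made up for this example, and the namespace is written in its full http:// form.

```javascript
// Prefix table, mirroring the @prefix line in the N3 example.
const prefixes = new Map([["foaf", "http://xmlns.com/foaf/0.1/"]]);

// Expand a prefixed name such as "foaf:name" into a full URI.
// Names without a known prefix are returned unchanged.
// (expand is a hypothetical helper, not part of any RDF library.)
function expand(name) {
  const i = name.indexOf(":");
  if (i < 0) return name;
  const base = prefixes.get(name.slice(0, i));
  return base === undefined ? name : base + name.slice(i + 1);
}

// The two triples as [subject, predicate, object] arrays.
const triples = [
  ["#me", "foaf:name", "Philip Jägenstedt"],
  ["#me", "foaf:homepage", "http://foolip.org/"],
];

// Only the predicates use prefixes here, so only they are expanded.
const expanded = triples.map(([s, p, o]) => [s, expand(p), o]);
console.log(expanded);
```

Adding another entry to triples is all it takes to add a node or a relationship to the graph.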
Back to RDFa. The “a” refers to how attributes in XHTML are used to serialize RDF:
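<p xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"> I'm <span property="foaf:name">Philip Jägenstedt</span> at <a rel="foaf:homepage" href="http://foolip.org/">foolip.org</a>. </p>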
The use of XML namespaces here is a bit odd. Prefixes in XML are used on element and attribute names, but here they are only used in attribute values. These are actually CURIEs, another URL shortening scheme. Jeni Tennison recently wrote an excellent post about the use of prefixes in RDFa which I encourage everyone to read. I also chatted briefly with Henri Sivonen about the problems with xmlns and would recommend reading his mails on those issues.
If we return to RDFa syntax for a bit, notice how property, rel and rev are used for the exact same purpose (setting the predicate) in different contexts. The intention was probably to mimic existing practices such as rel="next", but the net result is just more room for confusion. While I won’t claim that it’s just too hard, I certainly think it could have been simpler without losing much expressive power.
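To make the three predicate-setting attributes concrete, here is a rough sketch of how each one turns into a triple. It is a simplification, not an RDFa processor: the element objects are hand-written stand-ins for parsed markup, tripleFor is a hypothetical helper, and the rev entry with foaf:maker is an illustrative addition rather than part of the example markup.

```javascript
// Simplified stand-ins for parsed elements: only the attributes relevant here.
// A real RDFa processor also handles content, resource, chaining and more.
const subject = "#me"; // would come from about="#me" on the enclosing element

const elements = [
  { property: "foaf:name", text: "Philip Jägenstedt" },
  { rel: "foaf:homepage", href: "http://foolip.org/" },
  // Illustrative addition: rev states the relationship in the reverse direction.
  { rev: "foaf:maker", href: "http://foolip.org/" },
];

// Return one [subject, predicate, object] triple per element, or null.
function tripleFor(subject, el) {
  if (el.property) return [subject, el.property, el.text]; // literal object
  if (el.rel) return [subject, el.rel, el.href];           // URI object
  if (el.rev) return [el.href, el.rev, subject];           // subject and object swapped
  return null;
}

const rdfaTriples = elements.map((el) => tripleFor(subject, el));
console.log(rdfaTriples);
```

All three branches do the same job of naming the predicate; they differ only in where the object comes from and which way the arrow points.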
RDFa began in the now discontinued XHTML2 WG and seems strongly rooted in the Semantic Web (now Linked Data) community and that stack of technologies and tools. It was later made into a module for XHTML 1.1, but there is no W3C-sanctioned way of embedding RDFa in plain HTML. Getting into HTML5 would guarantee RDFa’s survival in the web ecosystem, so its proponents approached the WHATWG/HTML WG suggesting that RDFa be included. There was much heated discussion, the drama of which was my sole source of entertainment for weeks at a time. I’ll again refer to Jeni’s summary of the clash of priorities and “fruitless discussion”. I particularly want to emphasize this conclusion:
It’s just not going to happen for HTML5
I don’t hate RDF(a). I can certainly see the appeal of the RDF model after taking the time to understand it. It may just be a very verbose way of describing graphs, but as a data interchange format it seems to do a good job. However, being able to express arbitrary RDF in HTML in a compact way is not an actual use case for most web developers. If it’s possible without added complexity that’s fine, but HTML is not a triplestore.
As a result of gathering use cases and other input from the big RDFa discussion, suddenly one day the HTML5 microdata section sprang into existence along with a very long announcement to the WHATWG list from Ian Hickson (our editor). Within 3 hours there was a demo and not long after another. This is it:
<p item="vcard"> I'm <span itemprop="fn">Philip Jägenstedt</span> at <a itemprop="url" href="http://foolip.org/">foolip.org</a>. </p>
This looks very similar to the microformats example, but the new item and itemprop attributes are used instead of class. The model used is nested groups of name-value pairs, where the name-value pairs are given by the elements with itemprop attributes and each group is opened by an element with an item attribute.
There are some predefined item types (used above), but it’s possible to use either URLs (http://foolip.org/footype) or reversed DNS identifiers (org.foolip.footype) to define your own types without any risk of namespace pollution. Note however that there are no prefixes or other URL shortening schemes. I don’t think I’m crazy to suggest that services like bit.ly and tr.im have shown a way out of the “long URL” problem. If microdata gains any traction, I think communities will create vocabularies with clever shorthands like
The subject attribute can be used to avoid the “common ancestor” problem we had with microformats by simply referring to the item element by id:
<p item="vcard" id="me"> I'm <span itemprop="fn">Philip Jägenstedt</span>. </p> <!-- stuff --> <a subject="me" itemprop="url" href="http://foolip.org/">foolip.org</a>.
Microdata is quite straightforward and feels much more native to HTML than RDFa. As Jeni explains, microdata can’t express RDF triples using datatypes or XML literals. I’ll also add that using a blank node as object isn’t possible. Other than that, RDF triples can be expressed by using the about type to give the subject of the name-value (predicate-object) pair. Here’s my FOAF example from earlier:
<p > I'm Philip Jägenstedt at <a href="http://foolip.org/">foolip.org</a>. </p>
It is quite ugly, so if there’s any way to make it simpler I’m sure such suggestions are welcome. In general though, it seems like a better idea to use simple microdata structures and map them onto an RDF vocabulary where possible. In fact, the spec already defines how to extract some RDF (and JSON) from microdata, so I’m sure it’s not difficult to do.
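As a rough sketch of that idea, the vcard item can be held as a plain object and mapped onto FOAF predicates with a small lookup table. Everything here is an assumption made for illustration: the { type, properties } shape is not the JSON format the spec defines, and the fn/url to FOAF mapping is hand-written rather than taken from any published vocabulary mapping.

```javascript
// A microdata item as a group of name-value pairs.
// This shape is hypothetical, chosen only to keep the example short.
const me = {
  type: "vcard",
  properties: {
    fn: ["Philip Jägenstedt"],
    url: ["http://foolip.org/"],
  },
};

// Hand-written mapping from vcard property names to FOAF predicate URIs.
const vcardToFoaf = new Map([
  ["fn", "http://xmlns.com/foaf/0.1/name"],
  ["url", "http://xmlns.com/foaf/0.1/homepage"],
]);

// Turn the item's properties into [subject, predicate, object] triples.
// Properties without a mapping are skipped.
function toTriples(item, subject) {
  const result = [];
  for (const [name, values] of Object.entries(item.properties)) {
    const predicate = vcardToFoaf.get(name);
    if (predicate === undefined) continue;
    for (const value of values) result.push([subject, predicate, value]);
  }
  return result;
}

console.log(JSON.stringify(me)); // the item as JSON
console.log(toTriples(me, "#me")); // the item as triples about #me
```

The markup stays simple this way, and the long predicate URIs live in one mapping table instead of in every attribute.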
var props = document.getItems("vcard").properties;
var fn = props.namedItem("fn").content;
var url = props.namedItem("url").content;
alert("Name: " + fn + "; URL: " + url);
Unsurprisingly there are some issues with the API which I’ve sent feedback on and expect to be fixed to my satisfaction eventually, but the basic functionality is sound. I imagine scripts making dynamic pie charts from tables, providing page-specific autocomplete suggestions and making shiny animated SVG visualizations of the RDF graphs hidden in the tag soup…
Google is now offering to do usability testing of the microdata syntax to see if it can be improved, so if you have any suggestions be sure to bring those to the WHATWG now.
The examples I’ve used are overly simplistic and may utterly fail to show the strengths and weaknesses of each syntax. Still, this is my best effort to make sense of the issues at hand and I haven’t intentionally misrepresented any technology or community. I assume that there is much more debate to come before the dust settles on this issue and perhaps I’ll even change my mind after experimenting more with real-world implementation. I leave you with this unambiguous summary of my views:
- Microformats, you’re a class attribute kludge
- RDFa, HTML is not your triplestore
- Microdata, I like you but you need more review
- Shelley Powers wrote about RDFa and HTML5’s microdata from the perspective of the RDFa/Semantic Web community. It’s quite a different view from mine, so read that before believing my propaganda.
- Following James Graham’s suggestion, I have registered mantic.se for fun reverse DNS identifiers like se.mantic.banana. Mostly for fun, don’t take it too seriously…
- I misunderstood Jeni’s post about expressing RDF in microdata and have fixed that section to be more accurate.
Disclaimer: this post is the result of excess spare time and not part of my work at Opera Software. I know nothing about Opera’s plans (or lack thereof) for microformats, RDFa or microdata.
Jeni Tennison, on August 24, 2009 at 13:13:
You say “As Jeni explains, it is actually possible to express any RDF triple (except those using a blank node as object!) by using the about type to give the subject of the name-value (predicate-object) pair.”
Actually, as I tried to explain in the post you referenced, there are two other RDF-type things you can’t express in Microdata: a datatyped literal and an XML literal. So you can’t express the triple:
<http://foolip.org/> dc:modified "2009-08-24"^^xsd:date
Nor can you create a triple whose object is an XML structure, such as a snippet of XHTML (the body of a blog post, say).
Philip, on August 24, 2009 at 14:16:
You’re right of course Jeni. I read that post when you first published it, but the second time around I misunderstood the first part of your post to mean quite the opposite of the overall message. I’ve updated the post to not misrepresent you.
On the actual subject matter, one might consider syntax like itemprop="date<org.foolip.modified>" or something like that. You’d have to make sure that it’s unambiguous whether it’s a URI, reversed DNS or other token, and define error handling for when the string can’t be parsed as the given type, of course.
Kevin Marks, on August 25, 2009 at 23:59:
If all you want is an HTML dump of a JSON-like data structure, use XOXO – it defines a simple mapping of lists and dicts to HTML lists and definition lists, with some special cases for URLs.
Using class attributes for structure isn’t a kludge; it’s what they were defined for. Microformats focus on agreeing common ones.
Nic, on December 15, 2009 at 05:10:
wow. u know a lot about chinese culture. I’m from China and I love Sweden.
I know several Swedish friends and they are so cute~
Joe Watts, on June 24, 2010 at 09:26:
Is it not possible to use all 3 formats together?
<p class="vcard" item="vcard" xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me"> I'm <span class="fn" itemprop="fn" property="foaf:name">Philip Jägenstedt</span> at <a class="url" itemprop="url" rel="foaf:homepage" href="http://foolip.org/">foolip.org</a>. </p>
Philip, on June 24, 2010 at 14:18:
It’s possible, but ridiculously verbose, don’t you think? I’d also be surprised if it’s valid under any validator settings, as RDFa is opt-in and I doubt anyone writing such a validator would also include Microdata.
danbri, on June 12, 2011 at 08:02:
So I only just found this post, via your nice work on the parser.
Interesting re the Neo4J connection – did you ever try Gremlin / Tinkerpop tools?
I wrote up some recent experiments – danbri.org/words/2011/05/10/675 – it’s quite a nice way to interact with this kind of data. If we can get graph structures from microdata, that should just plug right in too…
Philip, on June 12, 2011 at 08:19:
Thanks, I hadn’t seen Gremlin before. Getting RDF graphs out from microdata is already possible as you know, as long as you don’t want the predicate URI to be pretty
danbri, on June 12, 2011 at 08:25:
well there’s two kinds of pretty.
1) de-reference to something useful, eg. annotations that express basic mappings between terms, or say which properties are functional, inverse functional etc. Or multilingual labels.
2) be really nice short URIs without loads of unnecessary clutter.
For (1) I’d prefer microdata’s rdf mapping to use decentralised URIs directly, not point them all off at W3C. Or maybe it could be an argument to parsers? For (2) I suspect the RDF community might need to save up its pocket money and buy some nice short domain names. I’m considering using foaf.tv/Person as an alias for xmlns.com/foaf/0.1/Person for example…
Philip, on June 12, 2011 at 09:19:
If you have an idea for how to “use decentralised URIs”, I’m sure Hixie would appreciate it, see www.w3.org/Bugs/Public/show_bug.cgi?id=12713
I also expect that we’ll see more short URLs used for microdata, as already mentioned in the article. In fact, I’m surprised schema.org went with something as verbose as they did, instead of, say, ty.pe/Person…
danbri, on June 12, 2011 at 17:18:
Maybe we can make some practical use case around schema.org, … eg. including translation and mapping data in those pages. Will investigate…
Jeffrey B, on February 17, 2012 at 17:13: