I'm a big fan of inline structured data like RDFa because it makes it easy to share information between sites. For example, if my research institute listed my speaking appearances with RDFa, I could just use Views to display those presentations on my site. Instead of getting the list from my MySQL database, Views would use the external Web page as its "database".
I'm also a big fan of HTML5. It is a big step towards a more exciting, more usable, and more developer friendly Web.
Drupal has made commitments to both, with its support for RDFa in core in Drupal 7 and its move towards HTML 5 output in Drupal 8. That's why I wanted to give the RDFa and HTML 5 compatibility a test drive, starting with my own site.
To start with... a base theme
There are a number of base themes for Drupal 7 that already output HTML 5, and other prominent base themes that are moving towards HTML 5 support. For the most part, this support takes the form of using new elements like <article> and <aside>.
I strongly believe that it is best to use a base theme... people like John Albin have been thinking way more about markup best practices than I have. Because a lot of people are using base themes like Zen, those themes also benefit from the collective intelligence of the various Drupal communities, making the markup even better... and I hope to contribute to that collective intelligence by helping review the structured data these themes output.
For this project, I started with Jeff Burnz's AdaptiveTheme. Jeff puts a note in his theme:
Due to the ongoing specification changes for RDFa in HTML5 the doctype and version information may change in a point release, or not, depending. Right now things are working (afaict - I am no RDFa expert).
So I figured I'd give a quick check to see if it is working. I ran an article through Sindice's Inspector and clicked on the graph tab. Everything looked good! This doesn't mean that the markup is valid, but it does mean that all of the data is structured as I want it to be, which is a great first step.
The graph is pretty complicated for an article with lots of comments, so I figured I'd show an example of what the basic page looks like, instead. You can also test this out in the Inspector.
What this theme does
Even though RDFa support is baked into Drupal core, that doesn't mean all themes are RDFa compatible. This theme does a couple of things right that can serve as good pointers for other themes, and raises a couple of issues that we should work on figuring out as a community.
The doctype
Whether or not to use a doctype with RDFa was up in the air for a while. It wasn't included in prior versions of the specification, but starting with the June, 2010 version of the spec, doctype is included as something that you MAY include for validation purposes.
The doctype in the theme doesn't match the doctype in the spec. I'm guessing that this is because the doctype in the spec is somewhat confusing, saying that the document is HTML 4.01 for validation purposes.
There is a postponed issue in the AT queue dealing with the doctype. As the doctype is only necessary for validation, and this document will not validate with the doctype that is in the spec, I have recommended just using the plain HTML 5 doctype.
But just in case there does end up being a custom doctype for RDFa in the final recommendation, let's see what AT does. To add the doctype, Jeff uses a preprocess function to add variables before html.tpl.php is output. Those variables are then used in the html.tpl.php template. If the RDF module is enabled, the doctype and other variables get set with RDF specific values and passed to the html template. Otherwise, the default HTML doctype is used.
function adaptivetheme_preprocess_html(&$vars) {
  // With the RDF module enabled, use the HTML+RDFa doctype, version, and profile.
  if (module_exists('rdf')) {
    $vars['doctype'] = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML+RDFa 1.1//EN">' . "\n";
    $vars['rdf_version'] = ' version="HTML+RDFa 1.1"';
    $vars['rdf_profile'] = ' profile="' . $vars['grddl_profile'] . '"';
  }
  // Otherwise fall back to the plain HTML 5 doctype.
  else {
    $vars['doctype'] = '<!DOCTYPE html>' . "\n";
    $vars['rdf_version'] = '';
    $vars['rdf_profile'] = '';
  }
}
<?php print $doctype; ?>
<html lang="<?php print $language->language; ?>" dir="<?php print $language->dir; ?>"<?php print $rdf_version . $rdf_namespaces; ?>>
<head<?php print $rdf_profile; ?>>
This might not be the optimal way to set a doctype, but it would require a change to Drupal core to do anything better—in the default html.tpl.php, the doctype is hard-coded.
One minor problem is the version attribute. It is deprecated in HTML 5, but the HTML+RDFa spec layers it back in (a weird thing that all of the specs on top of HTML can do). The version attribute isn't required and simply gives guidance to RDFa consumers, so I would recommend not using it.
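The two recommendations in this section (plain HTML 5 doctype, no version attribute) could be applied from a sub-theme rather than by editing AT itself. This is only a minimal sketch: the sub-theme name mytheme is hypothetical, the profile value is a placeholder, and it assumes the sub-theme's preprocess function runs after adaptivetheme_preprocess_html() so that it can overwrite the values set there.

```php
<?php
/**
 * Implements hook_preprocess_html() in a hypothetical sub-theme "mytheme".
 *
 * Assumption: this runs after adaptivetheme_preprocess_html(), so the values
 * set there can simply be overwritten.
 */
function mytheme_preprocess_html(&$vars) {
  // Use the plain HTML 5 doctype instead of the HTML+RDFa one.
  $vars['doctype'] = '<!DOCTYPE html>' . "\n";
  // Drop the optional version attribute; the profile is left untouched.
  $vars['rdf_version'] = '';
}

// Stand-alone demonstration with values like those the base theme would set.
// The profile URL is a placeholder, not the value Drupal uses.
$vars = array(
  'doctype' => '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML+RDFa 1.1//EN">' . "\n",
  'rdf_version' => ' version="HTML+RDFa 1.1"',
  'rdf_profile' => ' profile="http://example.com/profile"',
);
mytheme_preprocess_html($vars);
print $vars['doctype'];
```

Because html.tpl.php only prints these variables, nothing in the template itself has to change.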
RDF namespaces
Both RDFa and much of microdata use URLs (also called URIs) to reference things. This makes it easy to combine information from different sites.
In microdata, the full URL has to be used. In RDFa, there is a shortcut called a prefix that can be used.
For example, if I want to talk about myself, I might use the URL lin-clark.com/user/lin. Using a prefix, I can shorten this to lc:lin. However, I have to tell the computer what "lc" stands for, so I add a namespace mapping to the html tag of the document. In this case, it would be xmlns:lc="lin-clark.com/user/".
Drupal core automatically creates a variable for the namespace mappings. You might notice that it was included in the above snippet, but I'll include it again here, simplified:
<html <?php print $rdf_namespaces; ?>>
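To make the prefix mechanism concrete, here is a rough sketch of both directions: turning a mapping into xmlns attributes for the html tag, and expanding a prefixed name back into a full URI the way an RDFa consumer would. The function names and the $namespaces array are made up for illustration and are not Drupal's API; the http:// scheme is added so the example values are complete URIs.

```php
<?php
// Hypothetical prefix-to-namespace mapping. In Drupal the real mapping comes
// from core and contributed modules; this array only stands in for it.
$namespaces = array(
  'lc' => 'http://lin-clark.com/user/',
);

/**
 * Builds the xmlns:* attributes for the <html> tag from a prefix mapping.
 */
function sketch_namespace_attributes(array $namespaces) {
  $output = '';
  foreach ($namespaces as $prefix => $uri) {
    $output .= ' xmlns:' . $prefix . '="' . htmlspecialchars($uri, ENT_QUOTES, 'UTF-8') . '"';
  }
  return $output;
}

/**
 * Expands a prefixed name such as "lc:lin" into a full URI.
 *
 * Names with an unknown prefix are returned unchanged.
 */
function sketch_expand_prefix($name, array $namespaces) {
  $parts = explode(':', $name, 2);
  if (count($parts) === 2 && isset($namespaces[$parts[0]])) {
    return $namespaces[$parts[0]] . $parts[1];
  }
  return $name;
}

print '<html' . sketch_namespace_attributes($namespaces) . ">\n";
print sketch_expand_prefix('lc:lin', $namespaces) . "\n";
```

The point of the sketch is that lc:lin only means something together with the mapping, which is why a theme that drops the namespaces variable from its html tag breaks the RDFa on the whole page.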
RDFa for fields
RDF is an entity-attribute-value data model. That means that an RDF representation of something is basically a series of really simple sentences about that thing. If I were describing myself, the statements would look like this:
# | Entity | Attribute | Value |
---|---|---|---|
1 | lin | name | Lin Clark |
2 | lin | interest | data interoperability |
3 | lin | interest | HTML 5 |
4 | lin | website | lin-clark.com |
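The same four statements can be written down as plain data, which may make the model easier to picture. This is just an illustrative sketch: the array layout and the sketch_values() helper are my own invention, and real RDF would use full URIs rather than short strings for the entity and the attributes.

```php
<?php
// The statements from the table as entity-attribute-value triples.
$statements = array(
  array('lin', 'name', 'Lin Clark'),
  array('lin', 'interest', 'data interoperability'),
  array('lin', 'interest', 'HTML 5'),
  array('lin', 'website', 'lin-clark.com'),
);

/**
 * Returns every value stated for one entity and attribute.
 */
function sketch_values(array $statements, $entity, $attribute) {
  $values = array();
  foreach ($statements as $statement) {
    list($e, $a, $v) = $statement;
    if ($e === $entity && $a === $attribute) {
      $values[] = $v;
    }
  }
  return $values;
}

// An attribute can carry several values, much like a multi-value field.
print implode(', ', sketch_values($statements, 'lin', 'interest')) . "\n";
```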
Fortunately, with the new Field API in core, Drupal's template structure matches this E-A-V model very well. There is a surrounding entity template (such as node.tpl.php or user-profile.tpl.php) with field templates nested inside of it. The field template receives the field values from the field formatter and places each value in its own wrapping div.
Because of this, you can get the proper markup for Field API fields simply by including a few attributes variables. You only need to include these if you are overriding templates in your theme; they are included by default in core templates.
- Entity: $attributes should be placed in the entity wrapper in ENTITY.tpl.php (e.g. user-profile.tpl.php). This will give you the about attribute and the typeof attribute.
- Attribute: $item_attributes[$delta] should be placed in the element that wraps the field value. This will place the property or rel attribute. This also places supporting attributes for the value, content and datatype, if needed.
- Value: The HTML rendered with render($item) in the field template contains the value. This works for core fields, but your mileage may vary with contributed field formatters. If a field formatter puts extraneous HTML directly into the field value, this gets in the way of getting a clean value. One way to get around this is to use the helper attributes content and resource. If you maintain a contrib field formatter, feel free to ping me on IRC to check the RDFa for your values, or check with people in #drupal-rdf.
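Here is a stand-alone sketch of how those three pieces end up in the markup. It is not the core templates: sketch_attributes() is a hypothetical stand-in for the helper that flattens attribute arrays, the ex: prefix and the /user/lin path are invented, and the entity and field templates are collapsed into one loop to keep it short.

```php
<?php
/**
 * Flattens an attribute array into a string such as ' about="/user/lin"'.
 *
 * Hypothetical stand-in for the helper Drupal uses to build $attributes and
 * $item_attributes; array values are joined with spaces.
 */
function sketch_attributes(array $attributes) {
  $output = '';
  foreach ($attributes as $name => $value) {
    if (is_array($value)) {
      $value = implode(' ', $value);
    }
    $output .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
  }
  return $output;
}

// Entity: about and typeof go on the entity wrapper.
$attributes = sketch_attributes(array(
  'about' => '/user/lin',
  'typeof' => array('sioc:UserAccount'),
));

// Value: the rendered items. The second one contains extra markup.
$items = array(
  0 => 'data interoperability',
  1 => '<em>HTML</em> 5',
);

// Attribute: one attribute set per value, keyed by delta. The content
// attribute supplies a clean value where the rendered HTML is not clean.
$item_attributes = array(
  0 => sketch_attributes(array('property' => 'ex:interest')),
  1 => sketch_attributes(array('property' => 'ex:interest', 'content' => 'HTML 5')),
);

$output = '<div' . $attributes . ">\n";
foreach ($items as $delta => $item) {
  $output .= '  <div' . $item_attributes[$delta] . '>' . $item . "</div>\n";
}
$output .= "</div>\n";
print $output;
```

In a real theme the strings are already built by the time the template runs, so an overridden template only has to print the variables in the right elements.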
RDFa for non-fields
RDFa for non-field values is much more complicated. Core does some RDFa output of non-field variables in particular instances. Examples of non-fields are the node author and date on nodes.
The one you need to know about is the title, which uses another attributes array, $title_attributes.
<?php if ($title): ?>
  <h1<?php print $title_attributes; ?>>
    <a href="<?php print $node_url; ?>" rel="bookmark"><?php print $title; ?></a>
  </h1>
<?php endif; ?>
Until I'm persuaded otherwise, I think that we should discourage RDFa for non-fields in contrib. The advantage that Drupal has when it comes to RDF is the micro-templating (entity –> field –> field value) that is afforded by the PAC-ish architecture of Drupal. That structure matches the entity-attribute-value nature of RDF. Once you get more complicated than that, the code required gets... well, more complicated.
Entity API includes the concept of properties, which allows you to register non-field variables in your model. I haven't looked into it yet, but if there is a standard wrapper for properties as there is for fields, then it should be able to use a similar format.
UPDATE: I have started looking into the Entity API properties in microdata. If it makes sense for developers using the API, I will recommend it for RDFa as well. There is an issue in the RDFx queue about this.
Validation
Validation is still tricky. Some validators can handle microdata attributes but not RDFa, and some can't handle either. The problem isn't that HTML+RDFa is any less valid than any of the other HTML 5 drafts, it's that the validators haven't been updated yet.
HTML+RDFa just recently went into last call at the W3C with HTML 5 and microdata, so I believe we will be seeing much more tool support in things like validators soon. When I hear of a well-known validator adding HTML+RDFa support, I'll be sure to broadcast.
The fact remains though... it is unfortunately quite complicated to figure out just how to validate an HTML5+RDFa document at this point. While I've heard a number of suggestions for workarounds, none has worked for me yet.
UPDATE: The W3C has requested that a task force take a look at how the microdata and RDFa specifications can be made more compatible. Once that task force has completed its work (which may be 2 months if everything goes smoothly, longer otherwise), the validation issue should be much more clear.
Feedback?
This will probably turn into a page or five in the handbooks, so please let me know... Anything I missed? Anything that could be clearer? Anything you would like to hear more about?
- drupal-planet
- rdf
- rdfa
- theming
- html-data
Comments
Interesting stuff
Great stuff, your blog post seems well informed and well written.
I am about to embark on writing a way to import and export data between different sites using Views and PHP/MySQL.
In your opinion, would this be the way to move forward using RDF and HTML5? Or do you think the current technologies have early development issues?
I am happy to have a bit of a headache getting around issues like the validation now if in the long term I have something that is far more flexible.
Thanks,
Interesting, are you planning
Interesting, are you planning on using SPARQL Views?
If so, there are a couple of ways to do it.
Feel free to post a support request in SPARQL Views queue if you intend to use the module to do this and I can help you figure out how to make it work.
I like the idea of this
Hi Lin,
Yes, I had looked at SPARQL Views. My main issue is that one site is on D6 and one is on D7. While I administer all of them, the difference in the API is likely to cause me issues. I cannot see the D6 site being upgraded to D7 just yet, so I may have to go with transferring the data as it is.
I know there was some talk of enabling RDFa in D6, but I have not yet looked into how far this has gone. Do you know much about this?
The idea of a SPARQL store sounds promising. I am going to do some testing on this before I decide which way is going to be the best long-term option.
If I go with this I would appreciate the support requests. Thanks for the offer :)
schema.org
How about schema.org?
Is Drupal supporting that protocol?
Thanks
Schema.org is not a protocol,
Schema.org is not a protocol, it is a vocabulary. Vocabularies can be used in either RDFa or microdata.
There is a Drupal module that provides the schema.org vocabulary terms to use with Drupal's native RDFa. However, schema.org terms are not yet being used by the search engines, so you won't see any visualized output in the testing tool yet.
Google Rich Snippets Support
hi Lin:
Thanks so much for your fantastic work on this. It is really exciting what you've done so far.
I'm wondering what your thoughts and / or experience is so far with Google Rich snippets. They support RDFa but couldn't understand the attributes embedded in your test page: www.google.com/webmasters/tools/richsnippets?url=http%3A%2F%2Flin-clark.com%2Frdfa-test-page&view=cse even though they appear to be semantically correct.
Do you have advice for people trying to get Google to try and read RDFa info?
The attributes in the test
The attributes in the test page don't use the Rich Snippets vocabulary. The Rich Snippets vocabulary covers a small number of items, such as Reviews and Events, and should only be used to mark up those kinds of things.
Google's Rich Snippets does understand RDFa that uses its vocabulary. You can find out a little more here: https://www.ibm.com/developerworks/web/library/wa-rdf/
What's the Best Theme to choose as base for HTML5 and RDFa
What's the best theme to choose as a base for HTML5 and RDFa?
I chose Adaptive and Picture Reloaded. Is anything better than these for RDFa and HTML5? Please advise. I have been digging through Google for a month and can't find a clear answer.
Adaptivetheme works with RDFa
Adaptivetheme works with RDFa. I've never tried Picture Reloaded.
thanks for the detailed
thanks for the detailed explanation ... I have been working on a similar project for a client in Drupal...
thank you
Thank you so much for this excellent post, it helped us get our RDFa into our field templates.