Lin Clark

Theming with HTML5 and RDFa in Drupal 7

Submitted by Lin on

I'm a big fan of inline structured data like RDFa because it makes it easy to share information between sites. For example, if my research institute listed my speaking appearances with RDFa, I could just use Views to display those presentations on my site. Instead of getting the list from my MySQL database, Views would use the external Web page as its "database".

I'm also a big fan of HTML5. It is a big step towards a more exciting, more usable, and more developer-friendly Web.

Drupal has made commitments to both, with its support for RDFa in core in Drupal 7 and its move towards HTML 5 output in Drupal 8. That's why I wanted to give the RDFa and HTML 5 compatibility a test drive, starting with my own site.

To start with... a base theme

There are a number of base themes for Drupal 7 that already output HTML 5, and other prominent base themes that are moving towards HTML 5 support. For the most part, this support takes the form of using new elements like <article> and <aside>.

I strongly believe that it is best to use a base theme... people like John Albin have been thinking way more about markup best practices than I have. Because a lot of people are using base themes like Zen, those themes also benefit from the collective intelligence of the various Drupal communities, making the markup even better... and I hope to contribute to that collective intelligence by helping review the structured data these themes output.

For this project, I started with Jeff Burnz's AdaptiveTheme. Jeff puts a note in his theme:

Due to the ongoing specification changes for RDFa in HTML5 the doctype and version information may change in a point release, or not, depending. Right now things are working (afaict - I am no RDFa expert).

So I figured I'd do a quick check to see whether it works. I ran an article through Sindice's Inspector and clicked on the graph tab. Everything looked good! This doesn't mean that the markup is valid, but it does mean that all of the data is structured the way I want it to be, which is a great first step.

The graph is pretty complicated for an article with lots of comments, so I figured I'd show an example of what the basic page looks like, instead. You can also test this out in the Inspector.
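To make concrete what a tool like the Inspector reads from the page, here is a minimal PHP sketch that pulls (subject, property, value) statements out of simple RDFa markup. It is a toy under heavy assumptions, not a conforming RDFa processor: it only looks at the about, property, and content attributes, and ignores rel, typeof, prefix expansion, and chaining. The function name rdfa_property_statements() is hypothetical.

```php
<?php
// Toy extractor: NOT a full RDFa processor. It only handles about/property/content.
function rdfa_property_statements(string $html): array {
    $doc = new DOMDocument();
    // Silence libxml warnings about HTML5 elements it does not recognise.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $statements = [];
    foreach ($xpath->query('//*[@property]') as $el) {
        // The subject is the about value on the element itself or its nearest ancestor.
        $subject = '';
        for ($node = $el; $node instanceof DOMElement; $node = $node->parentNode) {
            if ($node->hasAttribute('about')) {
                $subject = $node->getAttribute('about');
                break;
            }
        }
        // A content attribute, when present, overrides the element's text.
        $value = $el->hasAttribute('content')
            ? $el->getAttribute('content')
            : trim($el->textContent);
        $statements[] = [$subject, $el->getAttribute('property'), $value];
    }
    return $statements;
}

$html = '<div about="/node/1"><h1 property="dc:title">Hello</h1></div>';
print_r(rdfa_property_statements($html));
```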


What this theme does

Even though RDFa support is baked into Drupal core, that doesn't mean all themes are RDFa compatible. This theme does a couple of things right that can serve as good pointers for other themes, and raises a couple of issues that we should work on figuring out as a community.

The doctype

Whether or not to use a doctype with RDFa was up in the air for a while. It wasn't included in prior versions of the specification, but starting with the June 2010 version of the spec, the doctype is listed as something that you MAY include for validation purposes.

The doctype in the theme doesn't match the doctype in the spec. I'm guessing that this is because the doctype in the spec is somewhat confusing, saying that the document is HTML 4.01 for validation purposes.

There is a postponed issue in the AdaptiveTheme (AT) issue queue dealing with the doctype. As the doctype is only necessary for validation, and this document will not validate with the doctype that is in the spec anyway, I have recommended just using the plain HTML 5 doctype.

But just in case there does end up being a custom doctype for RDFa in the final recommendation, let's see what AT does. To add the doctype, Jeff uses a preprocess function to add variables before html.tpl.php is output. Those variables are then used in the html.tpl.php template. If the RDF module is enabled, the doctype and other variables get set with RDF-specific values and passed to the html template. Otherwise, the default HTML doctype is used.

Preprocess function adds variables for html.tpl.php template

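function adaptivetheme_preprocess_html(&$vars) {
  if (module_exists('rdf')) {
    // RDF module enabled: HTML+RDFa doctype, version attribute and GRDDL profile.
    $vars['doctype'] = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML+RDFa 1.1//EN">' . "\n";
    $vars['rdf_version'] = ' version="HTML+RDFa 1.1"';
    $vars['rdf_profile'] = ' profile="' . $vars['grddl_profile'] . '"';
  }
  else {
    // No RDF module: plain HTML5 doctype, and empty strings so that
    // html.tpl.php can print the variables unconditionally.
    $vars['doctype'] = '<!DOCTYPE html>' . "\n";
    $vars['rdf_version'] = '';
    $vars['rdf_profile'] = '';
  }
}
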

html.tpl.php template prints the variables

<?php print $doctype; ?>
<html lang="<?php print $language->language; ?>" dir="<?php print $language->dir; ?>"<?php print $rdf_version . $rdf_namespaces; ?>>
<head<?php print $rdf_profile; ?>>

This might not be the optimal way to set a doctype, but it would require a change to Drupal core to do anything better—in the default html.tpl.php, the doctype is hard-coded.

One minor problem is the version attribute—in HTML 5 it is deprecated, but the HTML+RDFa spec layers it back in (a weird thing that all of the specs on top of HTML can do). The version attribute isn't required, it simply gives guidance to RDFa consumers, so I would recommend not using it.
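If you follow the recommendations above (plain HTML 5 doctype, no version attribute), the preprocess logic gets simpler. This is a sketch assuming a placeholder theme called mytheme; the pure helper mytheme_html_vars() is hypothetical and exists only so the logic runs outside Drupal. In a real theme you would call it from mytheme_preprocess_html(&$vars), passing module_exists('rdf') and $vars['grddl_profile'], and merge the result into $vars. The profile handling is kept as in AdaptiveTheme's function.

```php
<?php
// Hypothetical helper for a placeholder theme "mytheme".
// In Drupal: $vars = array_merge($vars, mytheme_html_vars(module_exists('rdf'), $vars['grddl_profile']));
function mytheme_html_vars(bool $rdf_enabled, string $grddl_profile): array {
    $vars = [
        // Plain HTML5 doctype in every case.
        'doctype' => '<!DOCTYPE html>' . "\n",
        // No version attribute, but still defined so html.tpl.php can print it.
        'rdf_version' => '',
        'rdf_profile' => '',
    ];
    if ($rdf_enabled) {
        // Keep the GRDDL profile on <head> when the RDF module is on.
        $vars['rdf_profile'] = ' profile="' . htmlspecialchars($grddl_profile, ENT_QUOTES) . '"';
    }
    return $vars;
}

print_r(mytheme_html_vars(true, 'http://www.w3.org/1999/xhtml/vocab'));
```

Because rdf_version is still defined (as an empty string), the html.tpl.php shown above can keep printing it without changes.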

RDF namespaces

Both RDFa and much of microdata use URLs (also called URIs) to reference things. This makes it easy to combine information from different sites.

In microdata, the full URL has to be used. In RDFa, there is a shortcut called a prefix that can be used.

For example, if I want to talk about myself, I might use the URL lin-clark.com/user/lin. Using a prefix, I can shorten this to lc:lin. However, I have to tell the computer what "lc" stands for, so I add a namespace mapping to the html tag of the document. In this case, it would be xmlns:lc="lin-clark.com/user/".
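The expansion an RDFa consumer performs on a prefixed name can be sketched in a few lines of PHP. The helper expand_curie() is hypothetical, and the mapping assumes the full URI is http://lin-clark.com/user/ (an absolute URI needs a scheme such as http://).

```php
<?php
// Hypothetical helper: expand "prefix:reference" using a prefix => URI map.
function expand_curie(string $curie, array $prefixes): ?string {
    $pos = strpos($curie, ':');
    if ($pos === false) {
        return null; // No prefix, nothing to expand.
    }
    $prefix = substr($curie, 0, $pos);
    $reference = substr($curie, $pos + 1);
    // A prefix with no mapping in the document cannot be expanded.
    return isset($prefixes[$prefix]) ? $prefixes[$prefix] . $reference : null;
}

// Assumed mapping, i.e. xmlns:lc="http://lin-clark.com/user/" on the html tag.
$prefixes = ['lc' => 'http://lin-clark.com/user/'];
echo expand_curie('lc:lin', $prefixes), "\n"; // http://lin-clark.com/user/lin
```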

Drupal core automatically creates a variable for the namespace mappings. You might notice that it was included in the above snippet, but I'll include it again here, simplified:

$rdf_namespaces in html.tpl.php

<html <?php print $rdf_namespaces; ?>>
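To illustrate the kind of string that ends up in that variable, here is a sketch that turns a prefix-to-URI map into xmlns:prefix="uri" attributes. build_namespace_attributes() is a hypothetical helper, not the core function that fills $rdf_namespaces, and the two mappings (Dublin Core terms and FOAF) are just examples.

```php
<?php
// Hypothetical helper: build xmlns:prefix="uri" attributes from a map.
function build_namespace_attributes(array $namespaces): string {
    $out = '';
    foreach ($namespaces as $prefix => $uri) {
        // Each mapping starts with whitespace, so the string can follow "<html" directly.
        $out .= "\n  xmlns:" . $prefix . '="' . htmlspecialchars($uri, ENT_QUOTES) . '"';
    }
    return $out;
}

$ns = [
    'dc'   => 'http://purl.org/dc/terms/',
    'foaf' => 'http://xmlns.com/foaf/0.1/',
];
echo '<html' . build_namespace_attributes($ns) . '>', "\n";
```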

RDFa for fields

RDF is an entity-attribute-value data model. That means that an RDF representation of something is basically a series of really simple sentences about that thing. If I were describing myself, the statements would look like this:

     Entity  Attribute  Value
  1  lin     name       Lin Clark
  2  lin     interest   data interoperability
  3  lin     interest   HTML 5
  4  lin     website    lin-clark.com
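The same four statements can be written down as plain (entity, attribute, value) triples. This small PHP sketch, with a hypothetical values_for() helper, shows that an attribute such as interest can carry several values, which is exactly the situation a multi-value field is in.

```php
<?php
// The statements from the table above as (entity, attribute, value) triples.
$statements = [
    ['lin', 'name', 'Lin Clark'],
    ['lin', 'interest', 'data interoperability'],
    ['lin', 'interest', 'HTML 5'],
    ['lin', 'website', 'lin-clark.com'],
];

// Hypothetical helper: collect every value of one attribute for one entity.
function values_for(array $statements, string $entity, string $attribute): array {
    $values = [];
    foreach ($statements as [$e, $a, $v]) {
        if ($e === $entity && $a === $attribute) {
            $values[] = $v;
        }
    }
    return $values;
}

print_r(values_for($statements, 'lin', 'interest'));
```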

Fortunately, with the new Field API in core, Drupal's template structure matches this E-A-V model very well. There is a surrounding entity template (such as node.tpl.php or user-profile.tpl.php) with field templates nested inside of it. The field template receives the field values from the field formatter and places each value in its own wrapping div.


Because of this, you can get the proper markup for Field API fields simply by including a few attributes variables. You only need to include these if you are overriding templates in your theme; they are already included in the core templates.

Entity
$attributes should be placed in the entity wrapper in ENTITY.tpl.php (e.g. user-profile.tpl.php). This outputs the about and typeof attributes.
$attributes in user-profile.tpl.php

<div<?php print $attributes; ?>>

Attribute
$item_attributes[$delta] should be placed in the element that wraps the field value. This outputs the property or rel attribute, plus the supporting content and datatype attributes for the value, if needed.
$item_attributes in field.tpl.php

<?php foreach ($items as $delta => $item) : ?>
  <div <?php print $item_attributes[$delta]; ?>><?php print render($item); ?></div>
<?php endforeach; ?>
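Putting the three pieces together, the following standalone sketch shows how the entity and item attribute arrays turn into nested markup. In Drupal 7, core converts these arrays to strings for you; render_attributes() here is a simplified stand-in so the example runs without Drupal. The about, typeof, and property values, including the ex: prefix, are illustrative placeholders rather than what core actually outputs for a user profile.

```php
<?php
// Simplified stand-in for core's array-to-attribute-string conversion (not a Drupal API).
function render_attributes(array $attributes): string {
    $out = '';
    foreach ($attributes as $name => $value) {
        // Multi-valued attributes (e.g. several RDF types) are space-separated.
        $value = is_array($value) ? implode(' ', $value) : (string) $value;
        $out .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES) . '"';
    }
    // Leading space, so the result can follow a tag name directly.
    return $out;
}

// Entity: about + typeof on the wrapper (placeholder values).
$attributes = ['about' => '/user/1', 'typeof' => ['sioc:UserAccount', 'foaf:Person']];

// Attribute + value: one property per field item ("ex:" is a placeholder prefix).
$items = ['data interoperability', 'HTML 5'];
$item_attributes = array_fill(0, count($items), ['property' => 'ex:interest']);

$html = '<div' . render_attributes($attributes) . '>';
foreach ($items as $delta => $item) {
    $html .= '<div' . render_attributes($item_attributes[$delta]) . '>'
        . htmlspecialchars($item) . '</div>';
}
$html .= '</div>';
echo $html, "\n";
```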

Value
The HTML rendered with render($item) above contains the value. This works for core fields, but your mileage may vary with contributed field formatters. If a field formatter puts extraneous HTML directly into the field value, this gets in the way of getting a clean value.

One way to get around this is to use the helper attributes content and resource. If you maintain a contrib field formatter, feel free to ping me on IRC to check the RDFa for your values, or check with people in #drupal-rdf.
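As an example of the content helper attribute, here is a sketch of a formatter-style function that wraps a date in extra markup but still gives RDFa consumers a clean value. date_item_markup() is hypothetical, and the dc:date property and xsd:dateTime datatype are illustrative choices.

```php
<?php
// Hypothetical formatter-style helper: extra markup for display,
// clean machine-readable value in the content attribute.
function date_item_markup(int $timestamp): string {
    $machine = gmdate('c', $timestamp);      // ISO 8601 value for RDFa consumers
    $display = gmdate('F j, Y', $timestamp); // human-readable text
    return '<span property="dc:date" datatype="xsd:dateTime" content="'
        . htmlspecialchars($machine, ENT_QUOTES) . '">'
        . '<em class="date">' . htmlspecialchars($display) . '</em></span>';
}

echo date_item_markup(1309132800), "\n";
```

Consumers take the value from the content attribute rather than from the text inside the span, so the extra em element no longer pollutes the value.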

RDFa for non-fields

RDFa for non-field values is much more complicated. Core outputs RDFa for some non-field variables in particular cases, such as the author and date on nodes.

The one you need to know about is the title, which uses another attributes array, $title_attributes.

$title_attributes in node.tpl.php

<?php if ($title): ?>
  <h1<?php print $title_attributes; ?>>
    <a href="<?php print $node_url; ?>" rel="bookmark"><?php print $title; ?></a>
  </h1>
<?php endif; ?>

Until I'm persuaded otherwise, I think that we should discourage RDFa for non-fields in contrib. The advantage that Drupal has when it comes to RDF is the micro-templating (entity -> field -> field value) that is afforded by the PAC-ish architecture of Drupal. That structure matches the entity-attribute-value nature of RDF. Once you get more complicated than that, the code required gets... well, more complicated.

Entity API includes the concept of properties, which allows you to register non-field variables in your model. I haven't looked into it yet, but if there is a standard wrapper for properties as there is for fields, then it should be able to use a similar format.

UPDATE: I have started looking into the Entity API properties in microdata. If it makes sense for developers using the API, I will recommend it for RDFa as well. There is an issue in the RDFx queue about this.

Validation

Validation is still tricky. Some validators can handle microdata attributes but not RDFa, and some can't handle either. The problem isn't that HTML+RDFa is any less valid than any of the other HTML 5 drafts; it's that the validators haven't been updated yet.

HTML+RDFa just recently went into last call at the W3C along with HTML 5 and microdata, so I believe we will be seeing much more tool support in things like validators soon. When I hear of a well-known validator adding HTML+RDFa support, I'll be sure to announce it.

The fact remains though... it is unfortunately quite complicated to figure out just how to validate an HTML5+RDFa document at this point. While I've heard a number of suggestions for workarounds, none has worked for me yet.

UPDATE: The W3C has requested that a task force take a look at how the microdata and RDFa specifications can be made more compatible. Once that task force has completed its work (which may be 2 months if everything goes smoothly, longer otherwise), the validation issue should be much more clear.

Feedback?

This will probably turn into a page or five in the handbooks, so please let me know... Anything I missed? Anything that could be clearer? Anything you would like to hear more about?


This work has been funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 256975, LOD Around-The-Clock (LATC) Support Action.

  • drupal-planet
  • rdf
  • rdfa
  • theming
  • html-data

Comments

BigEd

Jun 27, 2011

Interesting stuff

Great stuff; your blog post seems well informed and well written.

I am about to embark on writing a way to import and export data between different sites using views and PHP/MySQL.

In your opinion, would RDF and HTML5 be the way forward for this, or do you think the current technologies still have early development issues?

I am happy to have a bit of a headache getting around issues like validation now if, in the long term, I have something that is far more flexible.

Thanks,

Lin

Jun 27, 2011

Interesting, are you planning on using SPARQL Views?

If so, there are a couple of ways to do it.

  1. If the different sites are Drupal 7 sites and you administer all of them, you can use SPARQL Endpoint module on all of the sites that you are pulling data from. Then you can use SPARQL Views on the site that you are pulling data into, and just register the other sites endpoint URLs with SPARQL module. The endpoint URL will look something like example.com/sparql
  2. If the different sites all have RDFa, but don't necessarily have SPARQL endpoints enabled—or if you want to combine data from different sites in a single view—then you need to add web page urls to a single SPARQL store. This isn't yet fully supported... there are two ways to do it. I have a patch in the queue to support one way and another that I'll be posting within the next week to support the other way.
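For the first option, querying one of those endpoints directly is just an HTTP request. This is a sketch using the HTTP GET form of the SPARQL protocol (a query parameter holding the query string); the endpoint URL and the query are placeholders, and sparql_get_url() is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a SPARQL protocol GET URL (?query=...).
function sparql_get_url(string $endpoint, string $query): string {
    return $endpoint . '?' . http_build_query(['query' => $query]);
}

// Placeholder endpoint and query.
$query = 'SELECT ?s ?title WHERE { ?s <http://purl.org/dc/terms/title> ?title } LIMIT 10';
$url = sparql_get_url('http://example.com/sparql', $query);
echo $url, "\n";

// Fetching the results needs network access and an endpoint that accepts GET, e.g.:
// $json = file_get_contents($url, false, stream_context_create([
//     'http' => ['header' => "Accept: application/sparql-results+json\r\n"],
// ]));
```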

Feel free to post a support request in SPARQL Views queue if you intend to use the module to do this and I can help you figure out how to make it work.

 

BigEd

Jun 30, 2011

I like the idea of this

Hi Lin,

Yes, I had looked at SPARQL Views. My main issue is that one site is on D6 and one is on D7; while I administer all of them, the difference in the API is likely to cause me issues. I cannot see the D6 site being upgraded to D7 just yet, so I may have to go with transferring the data as it is.

I know there was some talk of enabling RDFa in D6, but I haven't looked into how far this has gone yet. Do you know much about this?

The idea of a SPARQL store sounds promising. I am going to do some testing on this before I decide which way is going to be the best long-term option.

If I go with this I would appreciate the support requests thanks for the offer :)

1001webs

Jun 28, 2011

schema.org

How about schema.org?

Is Drupal supporting that protocol?

 

Thanks

Lin

Jun 28, 2011

Schema.org is not a protocol, it is a vocabulary. Vocabularies can be used in either RDFa or microdata.

There is a Drupal module that provides the schema.org vocabulary terms to use with Drupal's native RDFa. However, schema.org terms are not yet being used by the search engines, so you won't see any visualized output in the testing tool yet.

 

acouch

Jul 03, 2011

Google Rich Snippets Support

hi Lin:

Thanks so much for your fantastic work on this. It is really exciting what you've done so far.

I'm wondering what your thoughts on and experience with Google Rich Snippets are so far. They support RDFa but couldn't understand the attributes embedded in your test page: www.google.com/webmasters/tools/richsnippets?url=http%3A%2F%2Flin-clark.com%2Frdfa-test-page&view=cse even though they appear to be semantically correct.

Do you have advice for people trying to get Google to try and read RDFa info?

Lin

Jul 04, 2011

The attributes in the test page don't use the Rich Snippets vocabulary. The Rich Snippets vocabulary covers a small number of items, such as Reviews and Events, and should only be used to mark up those kinds of things.

Google's Rich Snippets does understand RDFa that uses its vocabulary. You can find out a little more here: https://www.ibm.com/developerworks/web/library/wa-rdf/

Adal

Oct 16, 2011

What's the best theme to choose as a base for HTML5 and RDFa?

I chose AdaptiveTheme and Picture Reloaded. Is there anything better than these for RDFa and HTML5? Please advise. I've been searching Google for a month and can't find the right answer.

Lin

Oct 16, 2011

Adaptivetheme works with RDFa. I've never tried Picture Reloaded.

Techcrank

Oct 23, 2011

Thanks for the detailed explanation... I have been working on a similar project for a client in Drupal.

Kris Olafson

Oct 25, 2011

thank you

Thank you so much for this excellent post, it helped us get our RDFa into our field templates.
