Lin Clark

Microdata in Drupal: challenges for field formatters

Submitted by Lin on

Interest in microdata has been on the rise since the schema.org announcement in June.

Fortunately, I had already been looking at the microdata spec and thinking about how the work to get RDFa output in core could be repurposed for microdata, so I started a project that day.

Since microdata is based on RDFa, there is a lot that can be repurposed. But as I noted in my last post on the subject, there are also small differences between the specs... and in some cases, these small differences have a big impact.

We need to start thinking about those impacts.

One of the differences between RDFa and microdata is that microdata is much more sensitive to the placement of attributes within the HTML.

For example, let's say you have a profile page about yourself and you want to add an image of yourself to the profile. In RDFa, you use the rel attribute to create the relationship between you and the picture. There are a number of places you could add this rel attribute, so long as it is in between the id for you and the <img> tag.

Most basic RDFa markup

    <div about="lin" rel="image">
      <img src="/img/spacer.gif">
    </div>

Because the rel attribute knows it is looking for a URL attribute, like href or src, you don't have to worry about it misunderstanding what you meant if you add a little text (or even particular extraneous HTML elements) around the picture. And you can even have multiple values included for the same rel attribute.

More complex RDFa markup

    <div about="lin">
      <div rel="image">
        <p>And here are some pictures of me.</p>
        <img src="/img/spacer.gif">
      </div>
    </div>

In contrast, there is only one place where you can put microdata's itemprop attribute—directly on the value's HTML element itself.

More complex example as expressed in microdata

    <div itemid="lin" itemscope>
      <p>And here are some pictures of me.</p>
      <img itemprop="image" src="/img/spacer.gif">
    </div>
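
To see why the placement matters, here is a minimal sketch in Python (not Drupal code) of how a microdata consumer reads property values. It follows the rule from the microdata spec that an itemprop on an element like <img> takes its value from the URL attribute, while an itemprop on a generic element like <div> takes the element's text. The class and function names are made up for illustration, only a subset of the spec's element list is covered, and URLs are not resolved to absolute form as a real parser would do.

```python
from html.parser import HTMLParser

# Elements whose property value comes from a URL attribute (subset of the spec's list).
URL_ATTRS = {"img": "src", "a": "href", "link": "href", "audio": "src", "video": "src"}
# Elements that never get a closing tag, so they are not tracked as "open".
VOID_TAGS = {"img", "link", "meta", "br", "hr", "input"}


class ItempropReader(HTMLParser):
    """Hypothetical helper: collects (itemprop, value) pairs from a fragment."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._stack = []  # one [tag, itemprop or None, text parts] entry per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop and tag in URL_ATTRS:
            # URL-bearing element: the value is the attribute, not the contents.
            self.values.append((prop, attrs.get(URL_ATTRS[tag]) or ""))
            prop = None
        if tag not in VOID_TAGS:
            self._stack.append([tag, prop, []])

    def handle_data(self, data):
        # Text counts toward every open element that carries an itemprop.
        for entry in self._stack:
            if entry[1]:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if tag not in [entry[0] for entry in self._stack]:
            return
        while self._stack:
            open_tag, prop, parts = self._stack.pop()
            if prop:
                # Generic element: the value is its text content.
                self.values.append((prop, "".join(parts).strip()))
            if open_tag == tag:
                break


def read_itemprops(html):
    reader = ItempropReader()
    reader.feed(html)
    reader.close()
    return reader.values


on_image = """
<div itemid="lin" itemscope>
  <p>And here are some pictures of me.</p>
  <img itemprop="image" src="/img/spacer.gif">
</div>
"""

on_wrapper = """
<div itemid="lin" itemscope>
  <div itemprop="image">
    <p>And here are some pictures of me.</p>
    <img src="/img/spacer.gif">
  </div>
</div>
"""

print(read_itemprops(on_image))
print(read_itemprops(on_wrapper))
```

With the itemprop on the <img>, the value is the image URL. With the itemprop on a wrapping <div>, the value becomes the paragraph text and the image is lost.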

What this means for Drupal

As I explained in my post about theming with HTML5 and RDFa, Field module handles most of the RDFa output on a normal site. It inserts RDFa's rel attribute (or property or rev) on the <div> that wraps around the field formatter output. Theoretically, it would work like the following:

Interaction between Image module and Field module

  1. Image module's field formatter returns the img element
    <img src="/img/spacer.gif"> 
  2. Field module wraps the returned element in a div
    <div rel="og:image"><img src="/img/spacer.gif"></div>

However, this simply doesn't work for microdata. The itemprop can't be added to the wrapping <div>; it has to be placed within the field formatter's output itself.

With RDFa, Image module doesn't need to be aware of whether or not there is extra markup... the formatter just passes the element up to Field module which worries about the RDFa. However, with microdata, Image module needs to place the itemprop attribute itself.
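
The difference in responsibilities can be sketched roughly as follows. This is a language-neutral illustration written in Python, not Drupal's actual code: image_formatter and field_wrapper are hypothetical stand-ins for Image module's formatter and Field module's wrapper, and the extra_attributes parameter is an assumption about how a term might be handed to a formatter.

```python
from html import escape


def render_attributes(attributes):
    """Serialize a dict of attributes as ' name="value"' pairs."""
    return "".join(
        f' {name}="{escape(value, quote=True)}"' for name, value in attributes.items()
    )


def image_formatter(src, extra_attributes=None):
    """Stand-in for a field formatter: returns only the <img> element."""
    attributes = {"src": src, **(extra_attributes or {})}
    return f"<img{render_attributes(attributes)}>"


def field_wrapper(formatter_output, wrapper_attributes=None):
    """Stand-in for the central field layer: wraps whatever the formatter returned."""
    return f"<div{render_attributes(wrapper_attributes or {})}>{formatter_output}</div>"


# RDFa: the formatter stays unaware; the wrapper alone carries the metadata.
rdfa = field_wrapper(image_formatter("/img/spacer.gif"), {"rel": "og:image"})

# Microdata: the term has to be handed to the formatter, which places it on <img>.
microdata = field_wrapper(image_formatter("/img/spacer.gif", {"itemprop": "image"}))

print(rdfa)       # <div rel="og:image"><img src="/img/spacer.gif"></div>
print(microdata)  # <div><img src="/img/spacer.gif" itemprop="image"></div>
```

In the RDFa case only field_wrapper needs to know about the vocabulary term. In the microdata case every formatter needs a parameter like extra_attributes and has to decide which element receives it.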

What was once centralized in core's Field module now has to be coordinated across contrib's field formatter modules.

A little tech difference turns into a big social difference

Adding RDFa support in Drupal required much less explicit participation from module developers... it's just on by default. Because all fields pass through the code in field.module, contrib module developers didn't have to do anything in order to enable RDFa markup for their fields. In contrast, microdata will take explicit cooperation from module developers who are creating field formatters.

In some ways, the lower amount of coordination between developers that RDFa affords is a good thing. Drupal is a very large, very loosely coordinated system; as of early August, just shy of 200 Drupal 7 modules defined field formatters (out of nearly 2,000), and there will surely be many more coming. If current trends hold, it would be easy to see the number of modules defining field formatters reaching 800-1,000 before Drupal 7 module development winds down.

Explicit participation from that many developers is tough. It means teaching the basics of metadata placement to a lot of people (hundreds of developers).

... but more work isn't necessarily bad

On the other hand, I think that the explicit participation that microdata requires from field formatter module developers could turn out to be a good thing.

Currently, field formatter developers don't understand how their fields get marked up with RDFa... because we don't ask them to understand. While this works for basic fields, there are a couple of ways in which it can go wrong.

Uninformed field formatter developers

Most field formatter developers don't review their RDFa output and wouldn't know what to check for if they did. Some formatters do things like adding linked headings within the field value itself instead of working with labels, which can really mess up rel values.

Compound fields

Currently full RDFa output for all field data requires that field formatters be reduced to their most granular level of data. Some modules allow for this. For example, Field Collection encourages you to decompose the information in your fields into the most basic units. You can create a wide variety of complex fields using only Field Collection and the fields that core provides.

In contrast, something like AddressField doesn't rely on the entity-field relationships to model the inner data. It's an example of what I call a compound field. These fields manage their own schema and create a complex blob of HTML in hook_field_formatter_view. They may use Entity API's property information in order to expose the data model to other modules, but they don't follow the core entity-field relationship that RDFa output in Drupal relies on.

Themers overriding field templates

One of the great things about Drupal is the granularity of the theme layer. For advanced themers, it is easy to override just a small part of the HTML output without having to repeat the template code for the parent or child elements.

One of the most common things to override is field output. But since theme_field places the RDFa in the wrapping element that the themer is trying to change, it is easy for themers to blow away the RDFa or add markup that changes the meaning of the RDFa in the process.

I think the changes that microdata necessitates will help us take care of these problematic issues.

  1. Microdata will only be output by fields that have intentionally enabled output. This means the maintainer or a contributor will have some knowledge of what the output is supposed to look like... who knows, they might even write tests for it!
  2. Because field formatter developers will need to intentionally place the itemprop attributes within their hook_field_formatter_view implementations anyway, it wouldn't be too much extra work to enable mappings for compound fields.
  3. Themers are less likely to be altering the things output by hook_field_formatter_view, so will be less likely to interfere with the metadata output.
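
As a rough illustration of the second point, here is a sketch (again in Python rather than Drupal PHP) of a compound-field formatter that receives a mapping and places each itemprop itself. Everything here is hypothetical: format_address, the street and city keys, and the sample address are invented for the example. Only streetAddress and addressLocality are real schema.org property names.

```python
from html import escape


def format_address(address, itemprops=None):
    """Hypothetical compound-field formatter.

    address   -- dict of sub-value name -> text, managed by the field itself
    itemprops -- optional dict of sub-value name -> microdata property name
    """
    itemprops = itemprops or {}
    parts = []
    for key, value in address.items():
        prop = itemprops.get(key)
        # Only sub-values with a mapping get an itemprop attribute.
        attr = f' itemprop="{escape(prop, quote=True)}"' if prop else ""
        parts.append(f"<span{attr}>{escape(value)}</span>")
    return "<div>" + " ".join(parts) + "</div>"


# Made-up sample data and mapping, for illustration only.
address = {"street": "1 Main St", "city": "Galway"}
mapping = {"street": "streetAddress", "city": "addressLocality"}

print(format_address(address))
print(format_address(address, mapping))
```

Because the formatter already builds the inner markup, it is the only place that knows which element holds which sub-value. A fuller version would also mark the wrapper as its own nested item with itemscope; that is left out here to keep the sketch short.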

The challenge

The challenge now is creating an easy-to-use API that passes terms for mappings from the microdata module to field formatters, enabling (in as foolproof a way as possible) the field formatter to place the terms. I have a start on one, but would really like some feedback from the people who will actually have to use it.

Who's up for the challenge?

If you've made it this far, I'd say there's a high probability you are ;)

I would like to have a BoF at DrupalCon London, and I hope others want to join. I'm looking forward to involvement and input from:

  • Field formatter developers
  • Advanced themers
  • People who really like reading specs
  • Those passionate about DX (developer experience)
  • People with other relevant knowledge who I inadvertently forgot

This work has been funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 256975, LOD Around-The-Clock (LATC) Support Action.

  • drupal-planet
  • microdata
  • html-data

Comments

Ryan Price

Aug 30, 2011

We should have you on the DrupalEasy Podcast sometime to cover some of this stuff. For myself, I won't really understand it until I have a project where I can apply it.

I am totally down with Microformats, but there are still some situations where I can't get Google to use them, and that's really one of the big motivations for the clients right now.

Joey

Sep 07, 2011

validation tools

Good article!

Are there any validation tools for the schema.org microdata? schema.org referenced some from Google, but the Rich Snippets tools seem to choke on any page you send them.

Thanks!

Lin

Sep 07, 2011

The Rich Snippets tool should work for any page that contains schema.org in microdata. It will extract the item and show you the data. It won't change the way the search result looks yet, though.

I will be recording a screencast about this soon and will show how you can use the Rich Snippets tool to test schema.org microdata and how you can use foolip.org/microdatajs/live/ to test all microdata.
