
Sunday, 26 February 2006

Technorati tag pages problems, revisited





It's well known that the blogosphere search engine Technorati sometimes doesn't show tagged posts on their tag pages even when they've been correctly tagged - I've blogged about these Technorati problems before (and that post has been mentioned e.g. on Om Malik's blog).

I still encounter that problem myself. For instance my recent post on Technorati favorites isn't on Technorati's Improbulus tag page, whereas it's on Icerocket's Improbulus tag page. So it's clearly something to do with how Technorati are picking up (or rather not picking up) my posts or tags from my posts, or how they're not displaying tagged posts properly.

Consistent with my previous experience, it's not that Technorati aren't indexing the posts at all - e.g. my Technorati favorites page clearly includes that post, so the post has in fact been indexed by Technorati. So it has to be that they're not picking up the tags from that post, or that their tags database doesn't return the correct info for certain tags (or tags from certain types of posts). I've gone into this in more detail before.

Now Niall Kennedy (then, though not now, of Technorati) commented that one thing Technorati consider important is how valid the behind-the-scenes HTML or XHTML of your posts is - the less "valid", the less likely that Technorati will index them properly. But I'm sure that validation isn't it, in my case at least - I know my template throws up some warnings or errors as far as validation goes (though I've tried to make it as valid as I can within reason), but that applies to all my posts; yet some are on Technorati's tag pages, and some aren't.

I've pretty much given up asking Technorati for help on this point as, despite their recently recruiting a new full time customer support specialist Janice Myint, my latest emails on this subject still go unanswered.

Now I'd rather not further hassle their no doubt extremely busy CEO David Sifry, who's been kind enough to help sort out a problem in the past when none of my posts were getting indexed on Technorati at all (they then tweaked something at their end and it was fine), though I do think this issue is something Technorati need to sort out if they want to maintain user confidence and trust in their service in the longer-term.

But I suspect there are a few limitations to Technorati's system which for whatever reason their competitor Icerocket doesn't share, though Icerocket certainly has issues of its own. ("System" is vague, I know, but I don't know whether it's their spider, their database or indeed their tag searcher that's choking, so I'll just say "system".)

Leaving aside the "validity" question (which I believe may be a red herring in my case), my guess is Technorati's system particularly doesn't like:
  • long posts (this post never got picked up properly for example, whereas Google's spider loves lots of text)
  • posts with lots of code examples (so, this post isn't on their tag pages - and it's long)
  • posts with forms or lots of other HTML that's not just text and links/pics (e.g. my original post on the problems!)
Now, it could be that when Blogger handles posts with lots of more complex HTML, it translates the code into invalid XHTML on publishing, and that could be why Technorati doesn't like them (though I think their spider is way too sensitive in that case).
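If you want to check your own posts against the three suspected factors above, something like the following rough Python sketch could help. It simply counts the text, code blocks and form elements in a post's HTML. To be clear, the class and function names and the 5000 character threshold for "long" are my own assumptions for illustration; I have no idea what limits, if any, Technorati actually apply.

```python
from html.parser import HTMLParser


class PostFeatureCounter(HTMLParser):
    """Tally visible text length, code blocks and form elements in post HTML."""

    # Which tags count as "code" or "form" is an assumption for this sketch.
    CODE_TAGS = {"pre", "code"}
    FORM_TAGS = {"form", "input", "textarea", "select"}

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.code_blocks = 0
        self.form_elements = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.CODE_TAGS:
            self.code_blocks += 1
        elif tag in self.FORM_TAGS:
            self.form_elements += 1

    def handle_data(self, data):
        # Count text between tags, ignoring surrounding whitespace.
        self.text_chars += len(data.strip())


def suspected_factors(post_html, long_post_chars=5000):
    """Report which of the three suspected factors a post exhibits.

    long_post_chars is a hypothetical cut-off, not a known Technorati limit.
    """
    counter = PostFeatureCounter()
    counter.feed(post_html)
    counter.close()
    return {
        "long": counter.text_chars >= long_post_chars,
        "has_code": counter.code_blocks > 0,
        "has_forms": counter.form_elements > 0,
    }
```

Running suspected_factors over the HTML of each "missing" post, and of a few posts that did appear on the tag pages, would give a quick table to compare for common factors.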

But, I'm going to try an experiment. I'm going to split my previous post, which happens to be long, has lots of code examples and a form, into different individual posts, and republish them - then see which posts' tags get picked up, and which don't.

If you too have some posts not displaying on Technorati's tag pages, I'd be interested to know which posts, whether there are any common factors, and whether they fit any of the suspected criteria I've listed above.

And if your long post or post with forms etc doesn't show up properly on Technorati's tag pages, why not try doing a short post with no forms or any code other than links, but with exactly the same Technorati tags, which links to your "missing" post? That way the new post (assuming it doesn't get missed too) could at least be a way to lead people to the original post.
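For anyone who'd rather generate that short "pointer" post than type it by hand, here's a minimal Python sketch. It assumes tag links of the form http://technorati.com/tag/yourtag with rel="tag", which is how I understand Technorati tags to be marked up; the function names and the wording of the post are just made up for illustration, and how spaces in tags should be encoded is also an assumption, so check the output against a tag that's known to work for you.

```python
from html import escape
from urllib.parse import quote


def tag_link(tag):
    """Build one tag link; the URL pattern is assumed, not confirmed."""
    href = "http://technorati.com/tag/" + quote(tag)
    return '<a href="{}" rel="tag">{}</a>'.format(href, escape(tag))


def pointer_post(title, original_url, tags):
    """Return HTML for a short post: one link to the missing post plus tags.

    Deliberately contains nothing but text and links, to avoid the
    suspected problem factors (length, code, forms).
    """
    links = ", ".join(tag_link(t) for t in tags)
    return (
        '<p>See my earlier post: <a href="{}">{}</a></p>\n'
        "<p>Technorati Tags: {}</p>"
    ).format(escape(original_url, quote=True), escape(title), links)
```

Pass in exactly the same list of tags as the original post used, so that if the short post does get picked up it appears on the same tag pages the original should have been on.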

I'm about to follow my own suggestion too, as well as doing the split posts I mentioned. I'll of course report on the results of my experiment.

Update 13 March 2006: My test results are here, interesting but puzzling, and after I posted them Dave Sifry the Technorati CEO emailed me to say they're on it - see this post; if people regularly report this problem to Technorati when they encounter it, it might help them fix it faster.


Technorati Tags: search, searching, Technorati, Technoratitags, Technorati tags, tag, tags, tagging, tag pages, blog, blogs, blogging, validation, problems, Improbulus, A Consuming Experience, Consuming Experience

9 comments:

David said...

Improbulus,

As usual a great post. I'm sending this to all the folks at Technorati for required reading. We'll get to the bottom of your tag issues, and hopefully get everyone else sorted out as well...

Dave

mark said...

what kind of problems are you experiencing with IceRocket ?

m

Improbulus said...

Thanks very much David, it's great to know that you're looking at the tag issues. I'll report back on the results of my experiments when done but it's rather puzzling so far.

Mark - Icerocket didn't index my blog at all for a period of about a month (i.e. the whole of December 2005 pretty much) despite pings etc. Not just tags, but all my posts. Nada.

Dave Lucas said...

I noticed yesterday my posts stopped appearing on Technorati. I don't recall my blogs numbers, but it was like 312 something and 121 links... all GONE... checked my Technorati account profile and found my mai
