Xapian and Lucene - Kudos to Xapian from YouSport.com

Posted in xapian Mon, 26 Jan 2009 05:53:00 GMT

A few years ago I picked Xapian after evaluating a number of search solutions. More recently, Lucene's surge in popularity made me curious to learn about it. I needed to rip out and replace MySQL fulltext search because of scaling issues, so I decided to check out CLucene. I quickly found that its API was not as up to date as Lucene's (a fast-moving target) and that its mailing list had seen only 4 posts in the last year or so, which led me to move away from CLucene. After that, I was told to check out Solr as an easy way to use Lucene without having to write Java. In the end I replaced MySQL with Xapian, but I still had Solr in the back of my mind as something to check out.

Recently, an email from Jonathan Drake, Senior Developer at YouSport.com, came across the xapian-discuss mailing list that said:

We were using Solr before but it was constantly causing headaches in terms of scalability and complexity. I gave Xapian a go and so far I'm blown away by how awesome it is. Its incredibly lightweight, its scaled a 100 times better and everyone involved is happier.

I'm curious to hear what scaling and complexity problems they faced, but it's good to hear a strong endorsement of Xapian from a former Solr developer. That, together with a quick check of the current users page, which lists del.icio.us with over 100 million documents, suggests that Xapian remains a strong contender in the search space. That said, I also work with very scalable Lucene-based solutions, just in Java projects.


Encoding Hashed UIDs: Base64 vs. Hex vs. Base32

Posted in perl, mysql, xapian Mon, 02 Oct 2006 08:08:00 GMT

I recently looked at various encodings for hashed UIDs, e.g. UIDs generated by a cryptographic hash algorithm such as SHA-1 or MD5. These are often useful when the UID does not need to carry human meaning but should be uniform in properties such as character set and length. I considered Base64 and hexadecimal first because they are commonly used by crypto libraries, and then settled on Base64 and Base32 where appropriate. Base36 is actually the most compact case-insensitive encoding (using Arabic numerals and Latin letters), but it is not an option for me at the moment because there is no Perl module for it that accepts arbitrary text and binary input. Math::Base36 exists but only handles numbers.
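
To make the length trade-off concrete, here is a minimal sketch that encodes one SHA-1 digest in each of the encodings mentioned above. It uses Python's standard library rather than Perl, purely for illustration. The function names (encode_uid, base36_from_bytes) are made up for this example, and the Base36 step assumes one possible convention: treating the 20-byte digest as a big-endian integer and left-padding the result to a fixed width.

```python
import base64
import hashlib

# Hypothetical helper names; not taken from any library discussed in the post.
BASE36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
SHA1_BASE36_WIDTH = 31  # 36**31 > 2**160, so 31 characters always fit a SHA-1 digest


def base36_from_bytes(digest: bytes, width: int = SHA1_BASE36_WIDTH) -> str:
    """Encode bytes as Base36 by treating them as a big-endian integer (an assumed convention)."""
    n = int.from_bytes(digest, "big")
    chars = []
    while n:
        n, rem = divmod(n, 36)
        chars.append(BASE36_ALPHABET[rem])
    # Left-pad with zeros so every UID has the same length.
    return "".join(reversed(chars)).rjust(width, "0")


def encode_uid(data: bytes) -> dict:
    """Hash the input with SHA-1 and return the digest in several text encodings."""
    digest = hashlib.sha1(data).digest()  # 20 raw bytes
    return {
        "hex": digest.hex(),                                     # 40 chars, case-insensitive
        "base32": base64.b32encode(digest).decode("ascii"),      # 32 chars, case-insensitive
        "base36": base36_from_bytes(digest),                     # 31 chars, case-insensitive
        "base64": base64.b64encode(digest).decode("ascii"),      # 28 chars (incl. one '='), case-sensitive
    }


if __name__ == "__main__":
    for name, value in encode_uid(b"example input").items():
        print(f"{name:7s} {len(value):2d} {value}")
```

For a 20-byte SHA-1 digest this gives 40 characters in hex, 32 in Base32, 31 in Base36, and 28 in Base64 including one padding character. Base36 saves only a single character over Base32 here, and Base32 maps cleanly onto 5-bit groups with no big-integer arithmetic.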


