Ruby/Odeum

Ruby/Odeum is a binding to the fantastic QDBM Odeum inverted index library. Odeum is used in the Estraier search engine and is written by the same author. It lets you easily construct a very fast inverted index so you can search for documents by words really quickly. It is released under the same license as QDBM (LGPL). The source includes the minimum source from QDBM needed to use Odeum, so it will work right out of the box.
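For readers new to the idea, here is a minimal sketch of what an inverted index does, written in plain Ruby. It is not the Ruby/Odeum API: the ToyInvertedIndex class and its add and search methods are hypothetical names made up for illustration, and a real Odeum index is stored on disk and is far more capable.

```ruby
# Toy inverted index in plain Ruby. This is NOT the Ruby/Odeum API; the class
# and method names are hypothetical and exist only to illustrate the idea.
class ToyInvertedIndex
  def initialize
    # word => list of document URIs that contain that word
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Split the text into lowercase words and record the URI under each one.
  def add(uri, text)
    text.downcase.scan(/\w+/).uniq.each { |word| @postings[word] << uri }
    self
  end

  # Return the URIs of the documents that contain every given word.
  def search(*words)
    lists = words.map { |word| @postings.fetch(word.downcase, []) }
    (lists.reduce(:&) || []).dup
  end
end

index = ToyInvertedIndex.new
index.add('doc/1', 'Odeum is an inverted index library')
index.add('doc/2', 'Estraier is a search engine built on Odeum')
index.add('doc/3', 'Ruby has many libraries')

p index.search('odeum')           # documents mentioning "odeum"
p index.search('odeum', 'engine') # documents mentioning both words
```

The lookup never scans document text: each query word is a single hash lookup, and multi-word queries intersect the resulting lists. That is the property that makes searching by word fast.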

Updates

Jun 16: Ruby/Odeum Release (0.4.1)

This release fixes a missing file from the last release, cleans up the ResultSet API to make it more generally useful, and implements a simple “index server” using DRb, with an example. There will be another official release soon which will focus on documentation improvements.
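To give a feel for the “index server” idea, here is a rough sketch using only Ruby's standard DRb library. It does not reproduce the example shipped with the release: ToyIndexServer is a hypothetical in-memory stand-in for a real index, and the URI with port 0 is just a convenient assumption so the sketch can pick any free port.

```ruby
require 'drb/drb'

# Hypothetical in-memory stand-in for a real index (NOT the Ruby/Odeum API).
class ToyIndexServer
  def initialize
    # word => list of document URIs that contain that word
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  def add(uri, text)
    text.downcase.scan(/\w+/).uniq.each { |word| @postings[word] << uri }
    uri
  end

  def search(word)
    @postings.fetch(word.downcase, []).dup
  end
end

server_index = ToyIndexServer.new
server_index.add('doc/1', 'Ruby bindings for Odeum')
server_index.add('doc/2', 'Searching with Ruby')

# Port 0 asks the OS for any free port; DRb.uri reports the one chosen.
DRb.start_service('druby://localhost:0', server_index)

# The client side. It lives in the same process here for brevity; a separate
# client process would connect with the same URI.
remote_index = DRbObject.new_with_uri(DRb.uri)
p remote_index.search('ruby')

DRb.stop_service
```

The appeal of this design is that only one process has to hold the index open, while any number of clients send it queries.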

You can download the 0.4.1 source, download a gem, or visit the Ruby/Odeum project.

Jun 07: Release of Ruby/Odeum (0.4)

Just released a new version of Ruby/Odeum. This version now uses a ResultSet object which you can access using several methods (direct, iterator, to_a, etc.) and which works with Marshal for storage purposes. This makes paged search results a piece of cake (use the ResultSet.next_n_doc function). A big advantage of the ResultSet design is that accessing documents is about 15-25% faster than with the previous design, without complicating things. You will have to alter your code, but if you just want an array like before, use ResultSet.to_a to get one (warning, that’s slower).
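The paging idea can be sketched in plain Ruby as follows. This is not the real ResultSet: ToyResultSet and its next_n method are hypothetical names, chosen to show how a result object with a cursor can hand out one page at a time and survive a round trip through Marshal.

```ruby
# Hypothetical result holder, NOT Ruby/Odeum's ResultSet. It keeps a cursor
# so that callers can pull one page of hits at a time.
class ToyResultSet
  include Enumerable

  def initialize(hits)
    @hits = hits
    @position = 0
  end

  # Return the next `count` hits and advance the cursor; [] when exhausted.
  def next_n(count)
    page = @hits[@position, count] || []
    @position += page.size
    page
  end

  # Iterator access over all hits, independent of the cursor.
  # Enumerable builds to_a, map, and friends on top of this.
  def each(&block)
    @hits.each(&block)
  end
end

results = ToyResultSet.new(%w[doc/1 doc/2 doc/3 doc/4 doc/5])
p results.next_n(2) # first page

# Store the set, cursor included, between requests, then resume paging.
saved = Marshal.dump(results)
resumed = Marshal.load(saved)
p resumed.next_n(2) # second page
p resumed.to_a      # every hit as a plain array
```

Because the cursor is part of the marshaled object, a web application could keep the dumped string in a session and serve the next page without repeating the search.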

Finally, there’s an example of wrapping a database, using KirbyBase as the database backend. You can store documents and it will index them on the fly (even handling deletes and updates). There’s also an additional KirbyBase.search function which lets you do an odeum query and get back the matching records. The nice thing is that the implementation is done by extending the base KBEngine, so you can just require ‘odeum/kirbybase’ and you get the goods.
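Here is a small sketch of what “index on the fly” means, again in plain Ruby. It uses neither KirbyBase nor Odeum: ToyIndexedStore and its methods are hypothetical, and the point is only to show how inserts, updates, and deletes can keep a word index in step with the stored records.

```ruby
# Hypothetical record store that keeps a word index in sync with its data.
# It does NOT use KirbyBase or Odeum; it only illustrates the technique.
class ToyIndexedStore
  def initialize
    @records = {}                                         # id => text
    @postings = Hash.new { |hash, word| hash[word] = [] } # word => ids
  end

  # Store the text and index its words. Re-inserting an existing id first
  # removes the old index entries, so stale words never linger.
  def insert(id, text)
    delete(id)
    @records[id] = text
    words_in(text).each { |word| @postings[word] << id }
    id
  end

  # An update is a delete followed by an insert, which insert already does.
  def update(id, text)
    insert(id, text)
  end

  # Remove the record and every index entry that points at it.
  def delete(id)
    text = @records.delete(id)
    return nil unless text

    words_in(text).each do |word|
      @postings[word].delete(id)
      @postings.delete(word) if @postings[word].empty?
    end
    text
  end

  # Return the stored records whose text contains the word.
  def search(word)
    @postings.fetch(word.downcase, []).map { |id| @records[id] }
  end

  private

  def words_in(text)
    text.downcase.scan(/\w+/).uniq
  end
end

store = ToyIndexedStore.new
store.insert(1, 'fast inverted index')
store.insert(2, 'slow linear scan')
store.update(2, 'fast hash index')
p store.search('fast')
p store.search('slow')
```

Putting the index maintenance inside the write methods means callers cannot forget to reindex, which is the same convenience the KirbyBase wrapper aims for.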

May 30: Ruby/Odeum vs. Lucene Part 2

After writing Programmers Need To Learn Statistics Or I Will Kill Them All, I figured I’d put my money where my mouth is and do the whole thing over again following my own advice. The analysis turned up some very cool results, reinforces everything I said about statistical cautions, and hopefully gives people some hints on conducting a performance analysis. The gist of the results is that Ruby/Odeum is 1.75 times slower than Lucene on average, but that Lucene uses a lot more memory.

Read on for the complete analysis.

Quick Start Guide

Read the RubyDoc documentation to learn how to use the library. I made sure that this release is completely documented.

Using Ruby/Odeum is very simple and involves only three classes. The Odeum::Index and Odeum::ResultSet classes are responsible for managing the database. The Odeum::Document class is used to contain information about each indexed document. Check out the bin/odeum_mgr file for an example of usage and test/test_odeum.rb for more extensive usage.

The bin/odeum_mgr script in particular is a simple example of using the major features to index, search, and query some documents.
