Rogue: A Type-Safe Scala DSL for querying MongoDB

Jan 21st

Rogue is our newly open-sourced, badass (read: type-safe) library for querying MongoDB from Scala. It is implemented as an internal DSL and leverages the type information in your Lift records and fields to keep your queries in line. Here’s a quick look:

Checkin where (_.venueid eqs id)
  and (_.userid eqs mayor.id)
  and (_.cheat eqs false)
  and (_._id after sixtyDaysAgo)
  select(_._id) fetch()

VenueRole where (_.venueid eqs this.id)
  and (_.userid eqs userid)
  modify (_.role_type setTo RoleType.manager)
  upsertOne()

We’ve been developing and using it internally at foursquare for the last several months. You can now get the sources on GitHub, and the packaged JAR is available on scala-tools.org under foursquare.com/rogue (current version is 1.0.2).

In this post, we’re going to dive into some of the motivations and implementation details of Rogue, and hopefully show you why we think Scala (and MongoDB and Lift) are so awesome.

Background

At foursquare we use the Lift web framework for our ORM layer. Lift’s Record class represents a database record, and the MetaRecord trait provides “static” methods for querying and updating records in a fully expressive way.

Unfortunately, we found the querying support a bit too expressive — you can pass in a query object that doesn’t represent a valid query, or query against fields that aren’t part of the record. And in addition it isn’t very type-safe. You can ask for, say, all Venue records where mayor = “Bob”, and it happily executes that query for you, returning nothing, never informing you that the mayor field is not a String but a Long representing the ID of the user. Well, we thought we could use the Scala type system to prevent this from ever happening, and that’s what we set out to do.

For reference, here’s a simplified version of our Venue model class:

class Venue extends MongoRecord[Venue] {
  object id extends Field[Venue, ObjectId](this)
  object venuename extends Field[Venue, String](this)
  object categories extends Field[Venue, List[String]](this)
  object mayor extends Field[Venue, Long](this)
  object popularity extends Field[Venue, Long](this)
  object closed extends Field[Venue, Boolean](this)
}

object Venue extends Venue with MongoMetaRecord[Venue] {
  // some configuration pointing to the mongo
  // instance and collection to use
}

Lift’s MongoMetaRecord trait provides a findAll() method that lets you pass in a query as a JSON object (MongoDB queries are in fact JSON objects), returning a list of records. For example, using Lift’s JsonDSL, we can do:

Venue.findAll((Venue.mayor.name -> 1234) ~
              (Venue.popularity.name -> ("$gt" -> 5)))

which is equivalent to

Venue.findAll("{ mayor : 1234, popularity : { $gt : 5 } }")

which will return a List[Venue] containing all venues where the mayor is user 1234 and the popularity is greater than 5. And this all works fine until the day you do

Venue.findAll(Venue.mayor.name -> "Bob")
Venue.findAll(Venue.categories.name -> ("$gt" -> "Steve"))

neither of which makes sense, and both of which the compiler should be able to catch.

Scala to the rescue!

We would like to write an internal Scala DSL that lets you write queries like this:

Venue where (_.mayor eqs 1234)
Venue where (_.mayor eqs 1234) and (_.popularity eqs 5)

while enforcing some kind of type safety among records, fields, conditions and operands. To start off, we need to pimp the MongoMetaRecord class to support the where and and methods.

implicit def metaRecordToQueryBuilder[M <: MongoRecord[M]]
    (rec: M with MongoMetaRecord[M]): QueryBuilder[M] =
  new QueryBuilder(rec, Nil)

class QueryBuilder[M <: MongoRecord[M]](
    rec: M with MongoMetaRecord[M],
    clauses: List[QueryClause[_]]) {
  def where[F](clause: M => QueryClause[F]): QueryBuilder[M] =
    new QueryBuilder(rec, clause(rec) :: clauses)
  def and[F](clause: M => QueryClause[F]): QueryBuilder[M] =
    new QueryBuilder(rec, clause(rec) :: clauses)
}

Notice that the where method applies the clause function argument to the MetaRecord (rec) in use. So in a query like Venue where (_.mayor ...), the where method applies _.mayor to Venue, yielding Venue.mayor. So what about the eqs 1234 part? We have something like Venue.mayor, which is a Field, and we need to return a QueryClause[F] (F represents the field type, Boolean or String or whatever). So all we need to do is pimp the Field class and add the method eqs, which will take an operand (e.g., 1234) and return a QueryClause[F].

implicit def fieldToQueryField[M <: MongoRecord[M], F]
    (field: Field[M, F]) =
  new QueryField[M, F](field)

class QueryField[M <: MongoRecord[M], F]
    (field: Field[M, F]) {
  def eqs(v: F) = new QueryClause(field.name, Op.Eq, v)
}

(Op is just an enumeration that defines all the comparison operators that MongoDB supports: Eq, Gt, Lt, In, NotIn, Size, etc. The .name method is provided by Lift’s Field class; through the magic of reflection, it returns a String that matches the name of the field object as it is declared in the Record.)

So an expression like Venue where (_.mayor eqs 1234) gets expanded by the compiler to:

metaRecordToQueryBuilder(Venue).where(rec =>
  fieldToQueryField(rec.mayor).eqs(1234))

This allows the compiler to enforce two things: that the field specified (mayor) is a valid field on the record (Venue), and that the value specified (1234) is of the same type as the field (Long) — notice that the eqs method takes an argument of type F, the same type as the Field.
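
To try these pieces out without pulling in Lift, here is a minimal self-contained sketch. Field, MetaRecord and the string-valued op are simplified stand-ins invented for illustration (they are not Lift’s or Rogue’s actual classes), and the calls use dot notation rather than the infix form shown above.

```scala
import scala.language.implicitConversions

object TypedQuerySketch {
  // Hypothetical stand-ins for Lift's Field and MongoMetaRecord.
  class Field[M, F](val name: String)
  trait MetaRecord[M] { def record: M }

  final case class QueryClause[+F](fieldName: String, op: String, value: F)

  class Venue {
    val venuename = new Field[Venue, String]("venuename")
    val mayor     = new Field[Venue, Long]("mayor")
  }
  object Venue extends Venue with MetaRecord[Venue] {
    def record: Venue = this
  }

  // Each where/and call applies the clause function to the record
  // and prepends the resulting clause.
  class QueryBuilder[M](rec: M, val clauses: List[QueryClause[Any]]) {
    def where[F](clause: M => QueryClause[F]): QueryBuilder[M] =
      new QueryBuilder(rec, clause(rec) :: clauses)
    def and[F](clause: M => QueryClause[F]): QueryBuilder[M] =
      new QueryBuilder(rec, clause(rec) :: clauses)
  }

  // eqs only accepts an operand of the field's own type F.
  class QueryField[M, F](field: Field[M, F]) {
    def eqs(v: F): QueryClause[F] = QueryClause(field.name, "eq", v)
  }

  implicit def metaRecordToQueryBuilder[M](meta: MetaRecord[M]): QueryBuilder[M] =
    new QueryBuilder(meta.record, Nil)

  implicit def fieldToQueryField[M, F](field: Field[M, F]): QueryField[M, F] =
    new QueryField(field)

  val query = Venue.where(_.mayor.eqs(1234L)).and(_.venuename.eqs("Starbucks"))

  // Neither of these compiles:
  //   Venue.where(_.mayor.eqs("Bob"))   // "Bob" is not a Long
  //   Venue.where(_.owner.eqs(1234L))   // Venue has no field named owner
}
```

Because clauses are prepended, query.clauses holds the venuename clause first and the mayor clause last.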

More operators

We can extend this to support other conditions besides equality. The Scala type system helps us once again in ensuring that the condition used is appropriate for the field type.

implicit def fieldToQueryField[M <: MongoRecord[M], F](field: Field[M, F]) =
  new QueryField[M, F](field)
implicit def longFieldToQueryField[M <: MongoRecord[M]](field: Field[M, Long]) =
  new NumericQueryField[M, Long](field)
implicit def listFieldToQueryField[M <: MongoRecord[M], F](field: Field[M, List[F]]) =
  new ListQueryField[M, F](field)
implicit def stringFieldToQueryField[M <: MongoRecord[M]](field: Field[M, String]) =
  new StringQueryField[M](field)

class QueryField[M <: MongoRecord[M], F](val field: Field[M, F]) {
  def eqs(v: F) = new QueryClause(field.name, Op.Eq, v)
  def neqs(v: F) = new QueryClause(field.name, Op.Neq, v)
  def in(vs: List[F]) = new QueryClause(field.name, Op.In, vs)
  def nin(vs: List[F]) = new QueryClause(field.name, Op.NotIn, vs)
}

class NumericQueryField[M <: MongoRecord[M], F](val field: Field[M, F]) {
  def lt(v: F) = new QueryClause(field.name, Op.Lt, v)
  def gt(v: F) = new QueryClause(field.name, Op.Gt, v)
}

class ListQueryField[M <: MongoRecord[M], F](val field: Field[M, List[F]]) {
  def contains(v: F) = new QueryClause(field.name, Op.Eq, v)
  def all(vs: List[F]) = new QueryClause(field.name, Op.All, vs)
  def size(s: Int) = new QueryClause(field.name, Op.Size, s)
}

class StringQueryField[M <: MongoRecord[M]](val field: Field[M, String]) {
  // Pattern.quote keeps regex metacharacters in s from being interpreted
  def startsWith(s: String) = new QueryClause(field.name, Op.Eq, Pattern.compile("^" + Pattern.quote(s)))
}

You can see that only certain field types support certain operators. No startsWith on a Field[Long], no contains on a Field[String], etc. So now we can build queries like

Venue where (_.venuename startsWith "Starbucks")
      and   (_.mayor in List(1234, 5678))

without having to worry about the stray (and admittedly contrived)

Venue where (_.mayor startsWith "Steve")
      and   (_.venuename contains List(1234))
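
The per-type split of operators can be tried in isolation. The following is a rough, self-contained sketch: Field is a hypothetical stand-in for Lift’s class, operators are plain strings instead of the Op enumeration, and only a few operators are included.

```scala
import java.util.regex.Pattern
import scala.language.implicitConversions

object OperatorSketch {
  // Hypothetical stand-in for Lift's Field; only the name and types matter here.
  class Field[M, F](val name: String)

  final case class QueryClause[+V](fieldName: String, op: String, value: V)

  class Venue
  object Venue {
    val venuename  = new Field[Venue, String]("venuename")
    val mayor      = new Field[Venue, Long]("mayor")
    val categories = new Field[Venue, List[String]]("categories")
  }

  // Operators available on every field.
  class QueryField[M, F](field: Field[M, F]) {
    def eqs(v: F): QueryClause[F] = QueryClause(field.name, "Eq", v)
    def in(vs: List[F]): QueryClause[List[F]] = QueryClause(field.name, "In", vs)
  }

  // Operators available only on String fields.
  class StringQueryField[M](field: Field[M, String]) {
    def startsWith(s: String): QueryClause[Pattern] =
      QueryClause(field.name, "Eq", Pattern.compile("^" + Pattern.quote(s)))
  }

  // Operators available only on List fields.
  class ListQueryField[M, F](field: Field[M, List[F]]) {
    def contains(v: F): QueryClause[F] = QueryClause(field.name, "Eq", v)
  }

  implicit def fieldToQueryField[M, F](f: Field[M, F]): QueryField[M, F] =
    new QueryField(f)
  implicit def stringFieldToQueryField[M](f: Field[M, String]): StringQueryField[M] =
    new StringQueryField(f)
  implicit def listFieldToQueryField[M, F](f: Field[M, List[F]]): ListQueryField[M, F] =
    new ListQueryField(f)

  val byPrefix   = Venue.venuename.startsWith("Starbucks")
  val byMayors   = Venue.mayor.in(List(1234L, 5678L))
  val byCategory = Venue.categories.contains("Thai")

  // Rejected by the compiler, since no conversion supplies the method:
  //   Venue.mayor.startsWith("Steve")
  //   Venue.venuename.contains(List(1234))
}
```

The compiler only considers a conversion whose result actually has the requested method, so the general and type-specific conversions can coexist without ambiguity in this sketch.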

Executing queries

Now once we have a QueryBuilder object, it is a straightforward exercise to translate it into a JSON object and send it to Lift to execute. This is done by the fetch() method:

Venue where (_.mayor eqs 1234)
      and   (_.categories contains "Thai") fetch()

It’s also a simple matter to support .skip(n), .limit(n) and .fetch(n) methods on QueryBuilder.
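
To give a feel for the translation step, here is a small sketch that renders a list of clauses as a MongoDB-style query string. It is an illustration only: the QueryClause and Op shapes are simplified assumptions, the output is a plain string rather than the JSON object handed to Lift, and string values are not escaped.

```scala
object RenderSketch {
  // Each operator optionally carries the MongoDB operator name it renders to;
  // equality has none because it renders as a bare value.
  sealed abstract class Op(val mongoName: Option[String])
  case object Eq extends Op(None)
  case object Gt extends Op(Some("$gt"))
  case object Lt extends Op(Some("$lt"))
  case object In extends Op(Some("$in"))

  final case class QueryClause(fieldName: String, op: Op, value: Any)

  // Renders strings, lists and everything else (via toString); no escaping.
  def renderValue(v: Any): String = v match {
    case s: String  => "\"" + s + "\""
    case xs: Seq[_] => xs.map(renderValue).mkString("[", ", ", "]")
    case other      => other.toString
  }

  def renderClause(c: QueryClause): String = {
    val rhs = c.op.mongoName match {
      case None       => renderValue(c.value)
      case Some(name) => "{ \"" + name + "\" : " + renderValue(c.value) + " }"
    }
    "\"" + c.fieldName + "\" : " + rhs
  }

  // The builder prepends clauses, so reverse to recover the written order.
  def render(clauses: List[QueryClause]): String =
    clauses.reverse.map(renderClause).mkString("{ ", ", ", " }")
}
```

For example, rendering a mayor equality clause followed by a popularity Gt clause gives the same shape as the hand-written findAll query string shown earlier.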

Summary

To recap, Rogue enforces the following, at compile time:

  1. the field specified in a query clause is a valid field on the record in question
  2. the comparison operator specified in the query clause makes sense for the field type
  3. the value specified in the query clause is the same type as the field type (or is appropriate for the operator)

In the next post, we’ll show you how we added sort ordering to the DSL and how we used the phantom type pattern in Scala to prevent, again at compile time, constructions like this:

Venue where (_.mayor eqs 1234) skip(3) skip(5) fetch()
Venue where (_.mayor eqs 1234) limit(10) fetch(100)
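
As a taste of the idea, here is one common way to encode this kind of restriction with phantom types. It is a generic sketch rather than Rogue’s implementation: the Query class, the marker traits and the use of =:= evidence are all assumptions chosen for illustration, and only skip and limit are modeled.

```scala
object PhantomSketch {
  // Marker types that exist only at compile time; they are never instantiated.
  sealed trait Skipped
  sealed trait Unskipped
  sealed trait Limited
  sealed trait Unlimited

  // S and L record in the type whether skip and limit have been applied.
  final class Query[S, L] private (val skipped: Option[Int], val limited: Option[Int]) {
    // Only callable while S is still Unskipped.
    def skip(n: Int)(implicit ev: S =:= Unskipped): Query[Skipped, L] =
      new Query[Skipped, L](Some(n), limited)

    // Only callable while L is still Unlimited.
    def limit(n: Int)(implicit ev: L =:= Unlimited): Query[S, Limited] =
      new Query[S, Limited](skipped, Some(n))
  }

  object Query {
    def start: Query[Unskipped, Unlimited] = new Query(None, None)
  }

  val ok = Query.start.skip(3).limit(10)

  // Neither of these compiles, because the required evidence cannot be found:
  //   Query.start.skip(3).skip(5)
  //   Query.start.limit(10).limit(100)
}
```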

In the meantime, go check out the code — contributions and feedback welcome!

- Jason Liszka and Jorge Ortiz, foursquare engineers

Posted in Foursquare Engineering Blog

26 Comments

  • Frantzdyromain

    Hmmm very interesting. I use c++ and Python but scala looks mighty fine and gorgeous about rite now

  • j2labs.net Anonymous

    You might like using either DictShield or MongoEngine for two approaches to managing the flexible nature of MongoDB with Python

    DictShield: https://github.com/j2labs/dictshield

    DictShield focuses on validation of JSON / Dictionary inputs and manipulating the results of queries before giving them to the public. It makes no attempt to do more than prepare a document for mongo and leaves the querying up to a user to tune those for their use case. A query that goes out of alignment with indexes for any reason can become painful.

    MongoEngine: https://github.com/hmarr/mongoengine

    MongoEngine attempts to be an Object-Document Mapper for Mongo and goes the full way, from handling indexes to preparing structures that map to Mongo map-reduce jobs.

  • twitter.com/timowest Timo Westkämper

    Nice tool. If you need something comparable for Java, then check out Querydsl for Mongodb. Here is a related blog post: blog.mysema.com/2010/11/mongodb-with-querydsl.html

  • RichD

    Hi there, thanks for sharing this. I spent some time trying to get it up and running but was unable to. Can you post a quick start guide? Admittedly, I am newer to Scala/Lift/MongoDB having been converted from a PHP/MySQL background after seeing the video of Harry’s MongoDB talk a few weeks ago. Thanks again.

  • twitter.com/jliszka Jason Liszka

    Sure. Do you have Lift records set up already, or are you starting from scratch?

  • RichD

    Thanks for the reply! Disqus didn’t email me like it was supposed to that you replied so I didn’t see it until now.

    Basically, I’m familiar by now with setting up my project file for SBT and have Lift and everything else installed. I have a minimal website working and can connect to the database using lift-mongodb-record.

    So what I’m looking for in this case is if there’s an SBT project include for Rogue. I also tried to put the JAR right into the directory under lib-managed but then I didn’t know what to import in my file.

    I was going to just include your source files as directories under my src directory, but then I didn’t know how to merge in your project dependencies.

    I hope this clarifies a bit!

    Thanks again.

  • twitter.com/jliszka Jason Liszka

    Oh great! Just add the following lines to your sbt project file (project/build/WhateverProject.scala):

    val rogue = "com.foursquare" %% "rogue" % "1.0.2" withSources()
    val jodaTime = "joda-time" % "joda-time" % "1.6" withSources()

  • Anonymous

    Thank you for this great contribution.

    Putting it to work, I did hit a case you might want to improve if that’s possible :

    2 Mongo classes : User and InvitationRequest
    u is a User
    val invitRQuery = InvitationRequest where (_.userId eqs u._id)

    compiler flashes ! Changing u._id to u._id.get satisfies it but that’s less elegant than the left part.

  • Rog

    Another newbie here. I’ve added the two lines mentioned in an earlier posting to the sbt project file but am still unclear on how to tie it all in with the code. (I’ve used mapper, but new to Record and Mongodb and Rogue). Do you have a sample that builds a basic Lift app (with ProtoUser support)? In the Venue sample, you “extend MongoRecord” for example. But with Lift’s built-in ProtoUser that is already defined, I’m unclear on what gets “extended” or what needs to be extended “with” something else. I’ve tried some variations on extending MegaProtoUser with MongoRecord for example, but if I try val query = User where (_.firstName eqs "Joe") the compiler says that “where is not a member of object code.model.User”.

    Many thanks for your patience with newbies!

  • Rog

    Well, if I might take a stab at answering my own question … just recall Occam’s Razor. It looks like I don’t need to do anything extra with the model itself. One does need to include “import com.foursquare.rogue.Rogue._” in the snippet code though.
