Haycorn


On MongoDB durability and commit models

I’ve been reading and thinking a bit about MongoDB’s consistency model over the last few weeks. I think what follows is an accurate model of how it behaves, and what the implications are, but I would be grateful for any corrections:

  • Whilst mongod is running, you may as well imagine that all your data is stored in memory, and that the files it creates on disk don’t actually exist. This is because the files themselves are memory-mapped, and MongoDB makes no guarantees about whether they’re in a consistent state. (They’re theoretically consistent immediately after an fsync, and this can be forced in several ways, but I don’t think there’s any easy way to snapshot the database files immediately after the fsync.) [Update: Kristina pointed out that it is possible to fsync and then prevent the memory-mapped files from being modified via the fsync and lock command. Once the lock (which blocks all write operations, though not reads) is in place, you can safely copy/snapshot the database files; this is sketched after the list below.]
  • Since the data is stored in memory (including virtual memory) you need replication to ensure durability. However, MongoDB argue you need replication for durability anyway (notwithstanding their plans to add single-server durability in 1.8). In a blog post (“What about Durability?”), they point out:
    • even if your database theoretically supports single-server durability (such as CouchDB, with its crash-only design), this will only protect you if you’ve turned off hardware buffering or have a battery-backed RAID controller to ensure your writes really hit disk
    • having a well-formed database file on disk doesn’t help you if the disk itself fails, and this mode of failure is about as likely as any other
    • for some applications, you need 100% uptime, and so the delay required to recover from a failure (by reloading the database and replaying the transaction log, for example) is unacceptable
  • By default, writes are not persisted immediately to the database—not even to the in-memory data structure. Writes instead go into a per-server queue (with FIFO ordering?), and there’s no guarantee about when the write will be visible to other clients. (Although it does seem that clients are guaranteed to read their own writes.)
  • To change the default write behaviour, you: (a) issue the write as normal and then (b) issue the confusingly-named getLastError() command. getLastError() blocks until the write is either committed to the in-memory data structure, fsynced to disk, or committed to n replicas, depending on the arguments passed. (Note that many clients abstract away the write plus getLastError() call, so that the arguments to getLastError() become arguments to the write; see the sketch after this list.)
    • To block until the write is committed, issue getLastError() with no options.
    • To block until the data structure is fsynced to disk, issue getLastError() with fsync = true.
    • To block until the write has been added to the write queue of n replicas, issue getLastError() with w = n. (Note that there is currently no way to block until the write has been fsynced on the replicas.)
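To make these options concrete, here is a rough sketch in the mongo shell of roughly the 1.6/1.8 era this post describes. The collection name and values are invented, and the option spellings (fsync, w, wtimeout) are as I understand them from the documentation, so treat this as an illustration rather than a recipe:

    // Fire-and-forget: the insert returns once the write is queued,
    // with no confirmation that it has been applied anywhere.
    db.events.insert({ type: "signup", at: new Date() });

    // Block until the write is committed to the in-memory data structure
    // (getLastError with no options).
    db.runCommand({ getlasterror: 1 });

    // Block until the data files have been fsynced to disk.
    db.runCommand({ getlasterror: 1, fsync: true });

    // Block until the write has reached the write queues of 2 servers
    // (the master plus one replica), waiting at most 5000 ms.
    db.runCommand({ getlasterror: 1, w: 2, wtimeout: 5000 });

    // And, per the update in the first bullet: fsync and block writes so the
    // on-disk files can safely be copied or snapshotted, then release the lock.
    var admin = db.getSisterDB("admin");
    admin.runCommand({ fsync: 1, lock: true });
    // ... copy or snapshot the database files here ...
    admin.$cmd.sys.unlock.findOne();   // releases the lock

(Most drivers wrap the getLastError() step for you, typically as a safe-mode or write-concern option on the write itself.)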

Filed under  //  commit   durability   mongodb  

MongoDB's MongoUK conference

MongoUK—a small MongoDB conference in London—happened last Friday, and it was wonderful. I haven’t been to that many conferences, but it was probably the best tech conference I’ve been to, at least in terms of my enjoyment of the talks themselves. (DPC is over three days, in Amsterdam…)

The conference was run by 10gen (the company behind MongoDB) themselves, though it felt about as much like an independent conference as was possible under the circumstances. (I’m pretty sure the conference was subsidised—10gen are just getting started, and so at this point they want to get as many people as possible happily using MongoDB, and they have the VC money to do it.)

A few points that struck me about MongoDB:

  • Unlike most NoSQL databases, MongoDB supports SQL-like indexes, though it will only use one index per query. Curiously, if multiple indexes could potentially be used to resolve a particular query, the server picks the index to use via a heuristic that includes some information about how useful each index was on previous queries. (A shell sketch follows this list.)
  • Sharding is complicated. Maybe even very complicated. Since production deployments are recommended to have two slaves per master, a deployment with sharding involves at least six servers for the data itself (two shards of three), plus several configuration servers, plus several servers to act as front-ends to the shards (to conceal the existence of the shards from clients). These servers more or less need to be manually configured—the cluster doesn’t organise itself. Perhaps some of these can be virtual machines (?), but a production setup with sharding involves many physical machines. (Also sketched after this list.)
  • With map/reduce over shards, the maps are done separately within each shard, but the reduce is done on one server. This also means that the resulting collection is itself not sharded.
  • Most contributions to the MongoDB server come from 10gen itself—they haven’t had many outside contributions.
  • No-one asked about the AGPL license!
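As a small illustration of the one-index-per-query point, here is a shell sketch (the collection and field names are made up):

    // Two candidate indexes on the same collection.
    db.people.ensureIndex({ surname: 1 });
    db.people.ensureIndex({ city: 1 });

    // Only one of them will be used for this query; explain() shows which
    // one the optimizer picked (a choice informed by how the candidates
    // have performed on earlier, similar queries).
    db.people.find({ surname: "Smith", city: "London" }).explain();

    // hint() forces a particular index if you disagree with that choice.
    db.people.find({ surname: "Smith", city: "London" }).hint({ city: 1 });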
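And to give a flavour of the manual configuration sharding involves, here is a rough sketch of the moving parts (host names, counts and the shard key are invented; the addshard/enablesharding/shardcollection commands are as I understand them from the documentation of the time):

    // Processes started by hand; the cluster does not assemble itself:
    //   mongod --shardsvr    on each data server (three per shard)
    //   mongod --configsvr   on each configuration server (three in production)
    //   mongos --configdb cfg1,cfg2,cfg3    the front-ends that hide the shards
    //
    // Then, from a shell connected to a mongos, each piece is registered
    // explicitly against the admin database (the "shard1/host,host,host" form
    // assumes each shard is a replica set):
    var admin = db.getSisterDB("admin");
    admin.runCommand({ addshard: "shard1/data1:27017,data2:27017,data3:27017" });
    admin.runCommand({ addshard: "shard2/data4:27017,data5:27017,data6:27017" });

    // Sharding is then switched on per database and per collection,
    // with an explicit shard key.
    admin.runCommand({ enablesharding: "mydb" });
    admin.runCommand({ shardcollection: "mydb.people", key: { surname: 1 } });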

Overall, my feelings toward MongoDB didn’t change that much. It’s a new product, to be used with a modicum of caution, and it shouldn’t be the automatic first choice. (Though perhaps there is no automatic first choice when it comes to NoSQL document-oriented databases.) It is used in production (many of the attendees I spoke to were doing so), though I think many of the very large and complicated deployments are done with a small amount of hand-holding from 10gen. I was reassured by the quality of the MongoDB people, and their commitment to their product and its users. 10gen and MongoDB are extremely developer-friendly in all respects—the product itself, the documentation, their people, their “marketing” (there was no hard sell), etc.

Filed under  //  10gen   conference   mongodb  