Handling document relationships

One of the design principles that RavenDB adheres to is that documents are independent: all data required to process a document is stored within the document itself. However, this doesn't mean there should be no relations between objects.

There are valid scenarios where we need to define relationships between objects. Doing so exposes us to one major problem: whenever we load the containing entity, we also need to load data from the referenced entities, unless we are not interested in them. The alternative, storing the whole entity in every object graph that references it, seems cheaper at first but proves quite costly in terms of database work and network traffic.

RavenDB offers three approaches to this problem. Each scenario will call for one or more of them, and when applied correctly they can drastically improve performance, reduce network bandwidth and speed up development.

The theory behind this topic and other related subjects is discussed at length in the Theory section.

Denormalization

The easiest solution is to denormalize the data in the containing entity, so that it contains the actual values from the referenced entity in addition to (or instead of) the foreign key.

Take this JSON document for example:

    
      { // Order document with id: orders/1234
        "Customer": {
          "Name": "Itamar",
          "Id": "customers/2345"
        },
        "Items": [
          {
            "Product": {
              "Id": "products/1234",
              "Name": "Milk",
              "Cost": 2.3
            },
            "Quantity": 3
          }
        ]
      }

As you can see, the Order document now contains denormalized data from both the Customer and the Product documents, which are saved elsewhere in full. Note that we haven't copied all the properties, only the ones that we care about for this Order. This approach is called _denormalized reference_. The properties that we copy are the ones that we will use to display or process the root entity.
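On the C# side, a document shaped like the JSON above could be modelled with small reference classes that carry only the copied properties. The following is a minimal sketch: the class names `CustomerReference`, `ProductReference` and `OrderLine` are assumptions made for illustration and are not part of the RavenDB API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference types: each holds the Id of the full document
// plus the few properties that were copied into the order.
public class CustomerReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class ProductReference
{
    public string Id { get; set; }
    public string Name { get; set; }
    public decimal Cost { get; set; }
}

public class OrderLine
{
    public ProductReference Product { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public CustomerReference Customer { get; set; }
    public List<OrderLine> Items { get; set; } = new List<OrderLine>();
}

public static class DenormalizedReferenceDemo
{
    // Builds an in-memory order with the same values as the JSON document.
    public static Order BuildSampleOrder()
    {
        return new Order
        {
            Id = "orders/1234",
            Customer = new CustomerReference { Id = "customers/2345", Name = "Itamar" },
            Items = new List<OrderLine>
            {
                new OrderLine
                {
                    Product = new ProductReference { Id = "products/1234", Name = "Milk", Cost = 2.3m },
                    Quantity = 3
                }
            }
        };
    }

    public static void Main()
    {
        var order = BuildSampleOrder();
        // Everything needed to display the order is already inside it;
        // the Ids remain available for loading the full documents if needed.
        Console.WriteLine(order.Customer.Name + " ordered "
            + order.Items[0].Quantity + " x " + order.Items[0].Product.Name);
    }
}
```

Keeping the `Id` next to the copied values means the full Customer or Product document can still be loaded when the copied subset is not enough.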

Includes

Denormalizing data as shown above does avoid many lookups and results in transmitting only the necessary data over the network, but in many scenarios it will not prove very useful. For example, consider the following entity structure:

    public class Order
    {
    	public Product[] Items { get; set; }
    	public string CustomerId { get; set; }
    	public double TotalPrice { get; set; }
    }
    
    public class Product
    {
    	public string Id { get; set; }
    	public string Name { get; set; }
    	public string[] Images { get; set; }
    	public double Price { get; set; }
    }
    
    public class Customer
    {
    	public string Name { get; set; }
    	public string Address { get; set; }
    	public short Age { get; set; }
    	public string HashedPassword { get; set; }
    }

Suppose we know that whenever we load an order from the database we need the customer's name and address. We could decide to denormalize the Order.Customer field and store those details in the order object. Obviously, the password and other irrelevant details would not be denormalized:

    public class DenormalizedCustomer
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }

As you can see there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedCustomer, which holds the interesting bits from Customer that we need to process requests on Order.

Now, what happens when the customer's address changes? We will have to perform an aggregate operation to update all the orders this customer has made. And what if this is a frequently returning customer with many orders? This operation can become very demanding.

Using the RavenDB Includes feature you can do this much more efficiently, by instructing RavenDB to load the associated document on the first request. We can do so using:

    var order = session.Include<Order>(x => x.CustomerId)
    	.Load("orders/1234");
    
    // this will not require querying the server!
    var cust = session.Load<Customer>(order.CustomerId);

You can even use Includes with queries:

    var orders = session.Query<Order>()
    	.Customize(x => x.Include<Order>(o => o.CustomerId))
    	.Where(x => x.TotalPrice > 100)
    	.ToList();
    
    foreach (var order in orders)
    {
    	// this will not require querying the server!
    	var cust = session.Load<Customer>(order.CustomerId);
    }

Under the hood, RavenDB has two channels through which it can return information for a load request. The first is the Results channel, which is what is returned from the Load method call. The second is the Includes channel, which contains all the included documents. Those documents are not returned from the Load method call, but they are added to the session unit of work, and subsequent requests to load them are served directly from the session cache, without any additional queries to the server.
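To make the two channels concrete, here is a toy in-memory model of the idea. It is a sketch only and not how RavenDB is implemented: `FakeServer`, `FakeSession` and `LoadResponse` are hypothetical names, and documents are reduced to string dictionaries. The point is that one round trip fills the session cache with both the requested document and the included one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical response type with the two channels described in the text.
public class LoadResponse
{
    public Dictionary<string, string> Result; // the "Results" channel
    public Dictionary<string, Dictionary<string, string>> Includes
        = new Dictionary<string, Dictionary<string, string>>(); // the "Includes" channel
}

// Toy stand-in for the server: documents by id, plus a request counter.
public class FakeServer
{
    private readonly Dictionary<string, Dictionary<string, string>> _documents
        = new Dictionary<string, Dictionary<string, string>>();

    public int RequestCount { get; private set; }

    public void Put(string id, Dictionary<string, string> document)
    {
        _documents[id] = document;
    }

    // One round trip: returns the document and, if asked, the document
    // whose id is stored in the given field.
    public LoadResponse Load(string id, string includeField = null)
    {
        RequestCount++;
        var response = new LoadResponse { Result = _documents[id] };
        if (includeField != null
            && response.Result.TryGetValue(includeField, out var referencedId)
            && _documents.TryGetValue(referencedId, out var referenced))
        {
            response.Includes[referencedId] = referenced;
        }
        return response;
    }
}

// Toy stand-in for the session: caches everything the server sent back.
public class FakeSession
{
    private readonly FakeServer _server;
    private readonly Dictionary<string, Dictionary<string, string>> _cache
        = new Dictionary<string, Dictionary<string, string>>();

    public FakeSession(FakeServer server) { _server = server; }

    public Dictionary<string, string> Load(string id, string includeField = null)
    {
        if (_cache.TryGetValue(id, out var cached))
            return cached; // served from the session, no request is made

        var response = _server.Load(id, includeField);
        _cache[id] = response.Result;
        foreach (var pair in response.Includes)
            _cache[pair.Key] = pair.Value; // included documents are cached too
        return response.Result;
    }
}

public static class IncludesDemo
{
    // Loads an order with its customer included, then loads the customer,
    // and reports how many requests reached the server.
    public static int CountRequests()
    {
        var server = new FakeServer();
        server.Put("customers/2345", new Dictionary<string, string> { { "Name", "Itamar" } });
        server.Put("orders/1234", new Dictionary<string, string> { { "CustomerId", "customers/2345" } });

        var session = new FakeSession(server);
        var order = session.Load("orders/1234", "CustomerId");
        var customer = session.Load(order["CustomerId"]);
        Console.WriteLine("Customer: " + customer["Name"]);
        return server.RequestCount;
    }

    public static void Main()
    {
        Console.WriteLine("Requests: " + CountRequests());
    }
}
```

In this model the second `Load` never reaches `FakeServer`, because the customer document arrived through the includes channel of the first response.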

Live Projections

Using Includes is very useful, but sometimes we want to do better than that, or just more than they can offer. The Live Projection feature is unique to RavenDB, and it can be thought of as the third step of the Map/Reduce operation: after all the data has been mapped, and reduced (if the index is a Map/Reduce index), the RavenDB server can transform the results into a completely different data structure and return that instead of the original results.

Using the Live Projections feature, you get more control over what to load into the result entity, and since it returns a projection of the actual entity you also get the chance to filter out properties you do not need.

Let's use an example to show how it can be used. Assume we have many User entities, and many of them are actually an alias for another user. If we wanted to display all users with their aliases, we would probably need to do something like this:

    // Storing a sample entity
    var entity = new User {Name = "Ayende"};
    session.Store(entity);
    session.Store(new User {Name = "Oren", AliasId = entity.Id});
    session.SaveChanges();
    
    // ...
    // ...
    
    // Get all users, mark AliasId as a field we want to use for Including
    var usersWithAliases = from user in session.Query<User>().Include(x => x.AliasId)
                           where user.AliasId != null
                           select user;
    
    var results = new List<UserAndAlias>(); // Prepare our results list
    foreach (var user in usersWithAliases)
    {
        // For each user, load its associated alias based on that user Id
        results.Add(new UserAndAlias
        {
            UserName = user.Name,
            Alias = session.Load<User>(user.AliasId).Name
        });
    }

Since we use Includes, the server will indeed be accessed only once, but it will send the entire object graph of each referenced document (the user entity for the alias). And it's an awful lot of code to write, too.

Using Live Projections, we can do this much more easily and on the server side:

    public class Users_ByAlias : AbstractIndexCreationTask<User>
    {
        public Users_ByAlias()
        {
            Map = users => from user in users
                           select new { user.AliasId };

            // Property names match UserAndAlias so the projection can be read back into it
            TransformResults =
                (database, users) => from user in users
                                     let alias = database.Load<User>(user.AliasId)
                                     select new { UserName = user.Name, Alias = alias.Name };
        }
    }

The function declared in TransformResults is executed on the results of the query, which gives it the chance to modify, extend or filter them. In this case, it lets us look at data from another document and use it to project a new return type.
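The shape of such a transform can be tried out with plain LINQ to Objects, without a server. The sketch below is only an in-memory analogy: a dictionary stands in for the `database` argument, the ids `users/1` and `users/2` are made up, and the `UserAndAlias` class is assumed to have the two properties used in the earlier loop, since its definition is not shown on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string AliasId { get; set; }
}

// Assumed shape of the projected result.
public class UserAndAlias
{
    public string UserName { get; set; }
    public string Alias { get; set; }
}

public static class LiveProjectionSketch
{
    // Plays the role of the transform function: for each query result,
    // look up the referenced document and project a new shape.
    public static List<UserAndAlias> Transform(
        IEnumerable<User> queryResults, IDictionary<string, User> database)
    {
        return (from user in queryResults
                let alias = database[user.AliasId]
                select new UserAndAlias { UserName = user.Name, Alias = alias.Name })
            .ToList();
    }

    public static void Main()
    {
        var ayende = new User { Id = "users/1", Name = "Ayende" };
        var oren = new User { Id = "users/2", Name = "Oren", AliasId = "users/1" };
        var database = new Dictionary<string, User>
        {
            { ayende.Id, ayende },
            { oren.Id, oren }
        };

        // The "query": only users that are an alias for another user.
        var queryResults = database.Values.Where(u => u.AliasId != null);

        foreach (var row in Transform(queryResults, database))
            Console.WriteLine(row.UserName + " is an alias for " + row.Alias);
    }
}
```

Only the two projected strings leave `Transform`, which mirrors the bandwidth argument: the full alias document is read where the data lives and is never part of the result.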

Since each Live Projection will return a projection, you can use the .As<> clause to convert it back to a type known by your application:

    var usersWithAliases =
    	(from user in session.Query<User, Users_ByAlias>()
    	 where user.AliasId != null
    	 select user).As<UserAndAlias>();

The main benefits of using Live Projections are that we write much less code, and that the projection runs on the server, saving a lot of network bandwidth by returning only the data we are interested in.

An important difference to note is that while Includes are useful both for explicit loading by id and for querying, Live Projections can be used for querying only.

Summary

There are no strict rules as to when to use which approach, but the general idea is to think it through and consider the implications of each.

As an example, in an e-commerce application product names and prices are better denormalized into an order line object, since you want to make sure the customer sees the same price and product title in the order history. But the customer name and addresses should probably be references, rather than denormalized into the order entity.
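The order history argument can be illustrated with a few plain classes. This is a sketch under assumptions: the property names `ProductName`, `UnitPrice` and `Lines`, and the helper `OrderLine.For`, are invented for the example. The order line copies the product's name and price when the order is placed, while the customer is kept as an id only.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class OrderLine
{
    public string ProductId { get; set; }   // reference to the full product document
    public string ProductName { get; set; } // copied when the order is placed
    public decimal UnitPrice { get; set; }  // copied when the order is placed
    public int Quantity { get; set; }

    // Hypothetical helper that takes the snapshot of the product.
    public static OrderLine For(Product product, int quantity)
    {
        return new OrderLine
        {
            ProductId = product.Id,
            ProductName = product.Name,
            UnitPrice = product.Price,
            Quantity = quantity
        };
    }
}

public class Order
{
    // Reference only: name and address stay on the customer document.
    public string CustomerId { get; set; }
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();
}

public static class OrderHistoryDemo
{
    // Places an order, changes the product price afterwards, and returns
    // the unit price that the order history still shows.
    public static decimal PriceInHistoryAfterChange()
    {
        var milk = new Product { Id = "products/1234", Name = "Milk", Price = 2.3m };
        var order = new Order { CustomerId = "customers/2345" };
        order.Lines.Add(OrderLine.For(milk, 3));

        milk.Price = 2.9m; // a later price change on the product

        return order.Lines[0].UnitPrice;
    }

    public static void Main()
    {
        Console.WriteLine("Price shown in order history: " + PriceInHistoryAfterChange());
    }
}
```

Because the customer is stored as `CustomerId`, an address change touches a single customer document, and the order can pick it up through an Include.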

For most cases where denormalization is not an option, Includes are probably the answer. Whenever serious processing is required after the Map/Reduce work is done, or when you need a different entity structure returned than the one defined by your index, take a look at Live Projections.

Comments

Posted by J. Amiry on Monday, 30 January 2012, 23:44

So, if we delete a document that has references to other documents, what happens to them? E.g. when we delete a 'User', will the associated aliases be deleted too?

Posted by Itamar Syn-Hershko on Monday, 30 January 2012, 23:48

No, but you can set up something like this with the cascade deletes bundle. See here: [ravendb.net/bundles/cascade-delete](old.ravendb.net/bundles/cascade-delete)

Posted by J. Amiry on Tuesday, 31 January 2012, 00:32

Thanks Itamar. But another question: I have 3 entities like these:

    public class Info {
        public string Id { get; set; }
        public string Something { get; set; }
        public Member Member { get; set; }
    }

    public class Member {
        public string Id { get; set; }
        public string InfoId { get; set; }
        public List<Info> Infos { get; set; }
        public List<School> Schools { get; set; }
    }

    public class School {
        public string SchoolId { get; set; }
        public List<Member> Members { get; set; }
    }

A one-to-many relationship from 'Member' to 'Info', and a many-to-many from 'Member' to 'School'.

What about them? How can I work with them? For example, when I create a new 'Member', and do something like this:

    var member = new Member();
    member.Info = new Info();

what happens really? Can I load the 'Info' later, without the 'Member' -Load-by-infoid?

I think you must explain more about relationships. A single article like this one -in this page- is not really enough. I'm so excited on RavenDb, but there is not enough information and documentation about it. I hope I can learn and understand it. So thanks.

Posted by Itamar Syn-Hershko on Tuesday, 31 January 2012, 00:45

The entire object graph will be saved, so the Member document will contain an Info object in an attribute called Info. To read Info, you will need to load the Member object containing it. Just like you'd do when storing it in an in-memory collection like a Dictionary.

"Relations" in RavenDB are really just a logical concept - the database doesn't know about them at all. When instead of an object you save a reference to it (its ID, as a string attribute, e.g. string InfoId instead of Info Info) you can leverage Includes and Live Projections to make this decision more transparent.

Deciding which way to go is really a matter of understanding and experience. Ping us on the mailing list and we'll be glad to give you more specific help.

Posted by J. Amiry on Tuesday, 31 January 2012, 01:31

Thanks again. I can't find your mailing list! ):

Posted by Ido Ran on Saturday, 10 March 2012, 12:30

Includes: you show an example of how to use Includes to fetch a related document. Can we use this method recursively to fetch all orders - and their customers - and their favorite items - and their statistics, etc?

In an application I'm working on (which I'm evaluating moving to RavenDB) I have some screens where I know up-front which data I need, so instead of lazy loading it from the server, which would generate many requests and a very slow experience, I use a custom server that I can ask to return the entire data set. That's where the question came from.

Posted by Ayende Rahien on Sunday, 11 March 2012, 12:29

No, Includes are NOT recursive. This is by design. The sort of scenario that you are talking about can be done using a Live Projection in the index, but I would be surprised if you actually needed to do that, to be frank. Modeling the data properly will allow you to do this without needing recursive queries.

Posted by Ido Ran on Tuesday, 13 March 2012, 16:01

Hi, Thanks for the answer. I agree that most scenarios do not require recursion. It may be a side effect of the fact that I'm using an ORM behind a RESTful web service, which led me to optimize the service by ensuring I request a complete data structure once rather than going back and forth for each data type. Another reason I model my data that way is that some screens need specific data while others require a composition of multiple items. I do not want to get the entire data for the former screens, nor have to do multiple requests for the latter, so I need to be able to query data at a specific resolution.

Posted by Ayende Rahien on Tuesday, 13 March 2012, 19:13

You can do that using lazy requests, includes, etc. ayende.com/blog/63491/ravendb-lazy-requests
