Fast, easy, realtime metrics using Redis bitmaps

29 Tuesday Nov 2011

Posted by Chandra Patni in Engineering

≈ 48 Comments

At Spool, we calculate our key metrics in real time. Traditionally, metrics are performed by a batch job (running hourly, daily, etc.). Redis backed bitmaps allow us to perform such calculations in realtime and are extremely space efficient. In a simulation of 128 million users, a typical metric such as “daily unique users” takes less than 50 ms on a MacBook Pro and only takes 16 MB of memory. Spool doesn’t have 128 million users yet but it’s nice to know our approach will scale. We thought we’d share how we do it, in case other startups find our approach useful.

Crash Course on Bitmap and Redis Bitmaps

Bitmap (aka Bitset)

A Bitmap or bitset is an array of zeros and ones. A bit in a bitset can be set to either 0 or 1, and each position in the array is referred to as an offset. Operations such as logical AND, OR, XOR, etc. and other bitwise operations are fair game for Bitmaps.

Population Count

The population count of a Bitmap is the number of bits set to 1. There are efficient algorithms for calculating population count. For instance, the population count of a 90% filled bitset containing 1 billion bits took 21.1 ms on a MacBook Pro. There is even a hardware instruction in SSE4 for the population count of an integer.

Bitmaps in Redis

Redis allows binary keys and binary values. Bitmaps are nothing but binary values. The setbit(key, offset, value) operation, which takes O(1) time, sets the value of a bit to 0 or 1 at the specified offset for a given key.

A simple example: Daily Active Users

To count unique users that logged in today, we set up a bitmap where each user is identified by an offset value. When a user visits a page or performs an action, which warrants it to be counted, set the bit to 1 at the offset representing user id. The key for the bitmap is a function of the name of the action user performed and the timestamp.

In this simple example, every time a user logs in we perform a redis.setbit(daily_active_users, user_id, 1). This flips the appropriate offset in the daily_active_users bitmap to 1. This is an O(1) operation. Doing a population count on this results in 9 unique users that logged in today. The key is daily_active_users and the value is 1011110100100101.

Of course, since the daily active users will change every day we need a way to create a new bitmap every day. We do this by simply appending the date to the bitmap key. For example, if we want to calculate the daily unique users who have played at least 1 song in a music app for a given day, we can set the key name to be play:yyyy-mm-dd. If we want to calculate the number of unique users playing a song each hour, we can name the key name will be play:yyyy-mm-dd-hh. For the rest of the discussion, we will stick with daily unique users that played a song. To collect daily metrics, we will simple set the user’s bit to 1 in the play:yyyy-mm-dd key whenever a user plays a song. This is an O(1) operation.

redis.setbit(play:yyyy-mm-dd, user_id, 1)

The unique users that played a song today is the population count of the bitmap stored as the value for the play:yyyy-mm-dd key.To calculate weekly or monthly metrics, we can simply compute the union of all the daily Bitmaps over the week or the month, and then calculate the population count of the resulting bitmap.

You can also extract more complex metrics very easily. For example, the premium account holders who played a song in November would be:
(play:2011-11-01 ∪ play:2011-11-02 ∪...∪play:2011-11-30) ∩ premium:2011-11

Performance comparison using 128 million users

The table below shows a comparison of daily unique action calculations calculated over 1 day, 7 days and 30 days for 128 million users. The 7 and 30 metrics are calculated by combining daily bitmaps.

Period	Time (ms)
Daily	50.2
Weekly	392.0
Monthly	1624.8

Optimizations

In the above example, we can optimize the weekly and monthly computations by caching the calculated daily, weekly, monthly counts in Redis.

This is a very flexible approach. An added bonus of caching is that it allows fast cohort analysis, such as weekly unique users who are also mobile users — the intersection of a mobile users bitmap with a weekly active users bitmap. Or, if we want to compute rolling unique users over the last n days, having cached daily unique counts makes this easy — simply grab the previous n-1 days from your cache and union it with the real time daily count, which only takes 50ms.

Sample Code

A Java code snippet below computes unique users for a given user action and date.

import redis.clients.jedis.Jedis;
import java.util.BitSet;
...
  Jedis redis = new Jedis("localhost");
...
  public int uniqueCount(String action, String date) {
    String key = action + ":" + date;
    BitSet users = BitSet.valueOf(redis.get(key.getBytes()));
    return users.cardinality();
  }

The code snippet below computes the unique users for a given given user action and a list of dates.

import redis.clients.jedis.Jedis;
import java.util.BitSet;
...
  Jedis redis = new Jedis("localhost");
...
  public int uniqueCount(String action, String... dates) {
    BitSet all = new BitSet();
    for (String date : dates) {
      String key = action + ":" + date;
      BitSet users = BitSet.valueOf(redis.get(key.getBytes()));
      all.or(users);
    }
    return all.cardinality();
  }

Like this:

3 bloggers like this post.

48 thoughts on “Fast, easy, realtime metrics using Redis bitmaps”

Kenn Ejima said:

November 29, 2011 at 3:56 PM

This is possibly the best tech article I’ve seen in months. Thanks for sharing!

Reply
tomblobaum said:

November 29, 2011 at 5:24 PM

What do you use this for?

Reply
- avichal said:
  
  November 29, 2011 at 5:37 PM
  
  We use this for all of our metrics. Daily/weekly/monthly actives, recordings going through our system, how many users from a particular type of mobile device are using Spool, how this changes over time, how this varies over different cohorts.
  
  Reply
  - Shalin Mantri said:
    
    December 2, 2011 at 2:27 PM
    
    In the build vs. buy analysis, why did you build? And what analytics packages did you consider ‘buying’?
Anna Filina said:

November 29, 2011 at 5:57 PM

I’m already itching to apply this somewhere. Thanks.

Reply
- Chandra Patni said:
  
  November 29, 2011 at 8:06 PM
  
  Anna,
  Handy Gist on githubs to try it out: https://gist.github.com/1384211
  
  Reply
  - Jose Antonio (@j0c4nt0n10) said:
    
    January 31, 2012 at 8:27 AM
    
    Thank you This is great stuff
Anson Parker (@anson) said:

November 29, 2011 at 6:13 PM

I like the simplicity of this technique. It’s quite similar in some ways to how I index and look up the availability of 135 million domain names at Domize.

I guess the caveat would be that because the only values allowed are 0 or 1 it is only useful for things you want to count once, or not at all (i.e. uniques). Have you thought about how you could use a similar method to collate multiple visits/actions within a time window? e.g. the number of favourites a user creates in a given day.

Reply
- Chandra Patni said:
  
  November 29, 2011 at 7:48 PM
  Anson,
  There are couple of ways you can do it in redis.
  1. Use redis hash in conjunction with HINCRBY operation for a key favorite:yyy-mm-dd and field user_id.
  2. Use combination of Redis GETRANGE, SETRANGE. Suppose we chose 2 byte wide counters
  WATCH favorite:yyy-mm-dd count = GETRANGE favorite:yyy-mm-dd 2*user_id MULTI SETRANGE favorite:yyy-mm-dd 2*user_id (count +1).get_bytes EXEC
  If you don’t care about atomicity, you can drop WATCH, MULTI, EXEC. It’s similar to the approach used by counting bloom filters.
  
  Reply
  - randito1y said:
    
    January 5, 2012 at 10:34 AM
    
    Great reply, Chandra. So, essentially you extend the 1-bit to have a width (2 in your case).
John Biesnecker said:

November 29, 2011 at 6:44 PM

This is absolutely fantastic. We have about 700K users, and are finding our existing usage tracking systems are just not up to it. We’re absolutely going to try this.

Reply
Joe9 said:

November 29, 2011 at 7:49 PM

What is the max. length of bit you can hold (i.e. max number of users?)

Reply
- Chandra Patni said:
  
  November 29, 2011 at 8:02 PM
  
  Redis allows max offset of 2^32 -1. So, 4 billion users. At that scale, you would use shading and other tricks in a realistic scenario.
  
  Reply
  - taf2 said:
    
    November 30, 2011 at 5:19 AM
    
    if you’re interfacing with redis via ruby this might useful once you need to start sharding: https://github.com/taf2/redi
mattbillenstein (@mattbillenstein) said:

November 29, 2011 at 8:07 PM

How do you handle a set of rather sparse ids? Like Facebook ids or something which are quite large now (billions?) but you’re only likely to see a rather small set of those…

m

Reply
- mattbillenstein (@mattbillenstein) said:
  
  November 29, 2011 at 9:44 PM
  
  I guess the obvious solution is to map sparse ids to dense ones using some sort of auto increment field in your db of choice…
  
  Reply
  - jubos said:
    
    November 30, 2011 at 10:20 AM
    
    Matt, that’s a good solution. That’s what we do, and redis INCR command is very useful in this regard for doing auto increments on user registration.
Vijaya Sagar V said:

November 29, 2011 at 10:30 PM

Cool post. We’ve been using redis extensively (including bitmaps) and something like this makes a lot of sense for us to adopt/adapt.

Reply
Pingback: async I/O News » Fast, easy, realtime metrics using Redis bitmaps
Alex Gusev said:

November 30, 2011 at 12:01 AM

Hi. Maybe I misunderstood something.

128 millions of bits is nearly 15 megabytes. If last reigistered user listen some some tracks each day, he fills database with 15 megabytes values in database. Each of them represents only one listening.

In another hand, I heard that Redis likes when data fits memory.

Help me to deal with this thoughts. And thank you for this cool bitwise article

Reply
- Chandra Patni said:
  
  November 30, 2011 at 10:48 AM
  
  Alex,
  If you have 128 million users and only last few listens to song everyday, using Redis sets would work better. Bitmaps approach works well in a scenario like this: www.avc.com/a_vc/2011/07/301010.html
  Our schema design employs uuids as the primary id. Besides user primary id, we have an analytics id for each user. Analytics ids are used in bitmaps and also allow future optimizations relatively easy.
  
  Reply
Fernando Bevilacqua said:

November 30, 2011 at 2:17 AM

Great! Thanks for sharing such a great tech article. It’s nice to carry some tricks like that in our utility belt

Reply
withakay said:

November 30, 2011 at 2:24 AM

Ok,

possible dumb question here, but here goes…