Mixpanel + Ruby on Rails = Wonderful

My name is Patrick McKenzie and I am an analytics junkie. Hiya, Patrick! I have an almost unhealthy interest in how my business' web site (you're on it) is performing on a day to day basis. I say "almost" unhealthy because good analytics leads to good decisionmaking and good decisionmaking makes me money, and money is generally healthy, or at least helps me buy more tofu.

Recently I've added a new analytics service (Mixpanel) to the suite I already use (Google Analytics, CrazyEgg, and Clicky) and I have been so happy with it I had to share. This article explains:

  • a bit of background about analytics
  • how I integrated my Rails site with Mixpanel (steal my code)
  • how I used funnel data to improve the experience of my customers
  • (and why that makes me money)

Why You Should Use Analytics

The fundamental reason to use analytics is that it shows you how your customers interact with your site, and that knowledge drives decisions about your site design, future business directions, and the like.

You might say "Wait, why don't I just talk to customers?" You should talk to your customers, but customers:

  • have more important things to do than talk to you
  • are not experts about your site
  • misremember things
  • resent being tested on like rats in a twisted science experiment ("Would you hit this bar for cheese more if it was green, or blue?")

Analytics has none of these problems. The data you collect:

  • is always there, waiting for your analysis
  • is totally expert on use of your site
  • is as accurate as your collection method
  • has no issue with answering arbitrary, inane questions

What To Track

The mad scientist in me wants to say Track it all! while cackling madly, but over the years I have found that having too much data (cough Google Analytics cough) is almost as bad as having none, because it numbs me to the import of the data. If it isn't driving good decisions, it is worthless. If I have to swim through a million and one screens to drill down to the information of interest to me, I don't do it very often, and data which is not read might as well not be collected.

So collect manageable amounts of data and prioritize your collection around key business goals. For example, if you sell stuff, then selling stuff is a core business goal. (If you don't sell stuff, get your Web 2.0 whippersnapper feet off my lawn and don't come back until you sell stuff.) The usability of the core interaction between your users and your web application is a core business goal. Focus your attention on places where change moves the needle, not on irrelevancies like "Does my blog archive page for August 2006 get a higher click through rate to the category page when the text is fuchsia or purple?"

How To Track

To the maximum extent possible, use pre-existing solutions rather than writing your own analytics code. I've done both, and believe me, the engineering effort to collect, record, and display custom statistics is immense.

For example, I have published some statistics about my business for the last few years, and that is all custom code. That custom code is roughly 10% of my code base. While I don't regret having written it, that 10% could have been spent on new features to impress customers. Only use custom analytics code when you have compelling business requirements for it.

By comparison, integrating pre-existing web analytics packages is easy. Google Analytics and CrazyEgg require copy/pasting perhaps 5 lines into your most general template. Mixpanel took a bit more work, but much less than custom code, and since I already did it for you, you don't really have to worry about it.

Use the package optimized for the report data you want to see. For example:

  • Google Analytics tracks comprehensive web stats across time and visitor source very well.
  • CrazyEgg produces very visual, very actionable reports about interactions with particular pages' UI.
  • Mixpanel tracks arbitrary events and funnels very, very well.

How Mixpanel is Different

Unlike "copy/paste and you're done" analytics solutions, Mixpanel lets you programmatically define what interests you via their API. This increases setup time versus copy/pasting a line of Javascript, but it gives you very fine control over integration with your site and software, and moves the data filtering from analysis time to design time. Since you'll be looking at reports a heck of a lot more than you'll be identifying new things to look at, that will save you time overall.

Mixpanel tracks events, not page views. (If you're coming from a traditional analytics package like Google Analytics, you're probably used to thinking exclusively in terms of page views. In GA, to track something like a click on a button, you create a virtual page view on a URL like /clicks/my-button, and then track hits on that virtual URL. No, really.) An event can be anything you want: a click on a button, a page view, a customer fulfilling some sort of condition ("This marked his 200th login"), etc.

Events can have properties. Everyone here a programmer? Good: properties are Map<String, String>. That is all you need to know about them. (This introduces a subtlety: you can stuff a number into the properties, but since it is operated on like it is a string, you can't currently e.g. calculate averages with it in their default reports.)

[Editor's note: Some years after original publication, Mixpanel got a feature which changes this. See here.]
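
A minimal sketch of what "properties are Map<String, String>" means in practice, in plain Ruby. The stringify_properties helper is hypothetical and not part of any Mixpanel library; it only shows that a number stored as a property comes back as text and has to be converted before any arithmetic.

```ruby
# Hypothetical helper: coerce every key and value to a String,
# mirroring the Map<String, String> model described above.
def stringify_properties(properties)
  properties.each_with_object({}) do |(key, value), result|
    result[key.to_s] = value.to_s
  end
end

props = stringify_properties({ :plan => "trial", :word_list_count => 12 })

# props["word_list_count"] is now the String "12", not the Integer 12,
# so averaging means converting back with Integer(...) first.
counts = [props["word_list_count"], "8"].map { |text| Integer(text) }
average = counts.sum / counts.size.to_f
```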

The fundamental form of analysis Mixpanel supports is, for a certain date range and event:

  • how many events occurred
  • what properties and what counts of those properties did the events have
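
The two questions above can be sketched in plain Ruby as follows. This is an illustration of the idea only, not how Mixpanel computes its reports; the Event struct, the event_report helper, and the sample dates are all made up for the example.

```ruby
require "date"

# Each event is a name, a date, and a Hash of string properties.
Event = Struct.new(:name, :date, :properties)

# Hypothetical report: for one event name within a date range, return the
# total count plus a tally of how often each property value occurred.
def event_report(events, name, date_range)
  matching = events.select { |e| e.name == name && date_range.cover?(e.date) }
  tallies = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  matching.each do |event|
    event.properties.each { |key, value| tallies[key][value] += 1 }
  end
  { :count => matching.size, :properties => tallies }
end

events = [
  Event.new("Dashboard", Date.new(2009, 7, 1), { "plan" => "trial" }),
  Event.new("Dashboard", Date.new(2009, 7, 2), { "plan" => "paid" }),
  Event.new("Dashboard", Date.new(2009, 7, 3), { "plan" => "trial" }),
  Event.new("Download",  Date.new(2009, 7, 3), { "plan" => "trial" }),
  Event.new("Dashboard", Date.new(2009, 8, 1), { "plan" => "trial" })
]

july = Date.new(2009, 7, 1)..Date.new(2009, 7, 31)
report = event_report(events, "Dashboard", july)
```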


Mixpanel also supports funnels. Essentially, they're a series of gates you pass through before you end up at a goal, with the percentage of people passing each gate recorded. Funnels are Mixpanel's killer app because they've got two ridiculously good pieces of default behavior for free:

  • users are only counted at each stage once
  • you can segment a funnel based on properties

By comparison, funnels in GA double-count users for every time they touch a stage in the funnel, and you have to do some very icky JS coding to segment them on axes not available to the browser by default. (For example, it is of keen interest to me how trial users interact with my website versus paying customers. I've tried tracking that in GA before. Suffice it to say that it is painful and I got unusable data even when it was working.)
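
To make the two behaviors concrete, here is a small in-memory sketch in Ruby. It is a toy model under simplifying assumptions (it does not check that a user passed the earlier steps, for instance), and the Funnel class and its methods are invented for illustration rather than taken from Mixpanel.

```ruby
require "set"

# Hypothetical in-memory funnel: records which users reached which step.
class Funnel
  def initialize(step_names)
    @step_names = step_names
    # One Set of user IDs per step, so repeat visits are not double-counted.
    @users_at_step = step_names.map { Set.new }
    @properties = {}
  end

  def record(user_id, step_name, properties = {})
    index = @step_names.index(step_name)
    raise ArgumentError, "unknown step: #{step_name}" if index.nil?
    @users_at_step[index] << user_id
    (@properties[user_id] ||= {}).merge!(properties)
  end

  # Distinct users per step, optionally limited to users whose recorded
  # properties include the given key/value pairs (a segment).
  def counts(segment = {})
    @users_at_step.map do |users|
      users.count { |id| segment.all? { |key, value| @properties[id][key] == value } }
    end
  end
end

funnel = Funnel.new(["Dashboard", "Create List", "Download"])
funnel.record(1, "Dashboard", { "plan" => "trial" })
funnel.record(1, "Dashboard")        # repeat visit, still counted once
funnel.record(1, "Create List")
funnel.record(2, "Dashboard", { "plan" => "paid" })

all_users   = funnel.counts
trial_users = funnel.counts({ "plan" => "trial" })
```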

API Access Considerations

To track with Mixpanel, you make HTTP POSTs to particular URLs with parameters set a certain way. If you're interested in the technical details of what variable goes where, go read them. After you understand them, you can understand the following design considerations:
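
As a rough sketch of "parameters set a certain way": the example below assumes the request carries a single data parameter holding base64-encoded JSON with event and properties keys, with the project token inside properties. That is how Mixpanel's track endpoint has been documented, but treat it as an assumption and check their current documentation. The mixpanel_payload helper name and the token value are placeholders.

```ruby
require "json"

# Hypothetical helper: build the encoded value for the assumed "data"
# request parameter. Nothing is sent over the network here.
def mixpanel_payload(event, properties, token)
  body = {
    "event"      => event,
    "properties" => properties.merge("token" => token)
  }
  # pack("m0") is Ruby's built-in base64 encoding without line breaks.
  [JSON.generate(body)].pack("m0")
end

data = mixpanel_payload("Dashboard", { "distinct_id" => "42" }, "YOUR_TOKEN")
```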

Access the API asynchronously: If you make an HTTP POST while rendering a page for a live user, and wait on the result, you're causing them to wait for something that does not add value to them. Don't do that. In particular, if the Mixpanel API should hiccup, you could potentially be waiting several seconds for it to time out. Instead, fire and forget API requests.

My implementation uses DelayedJob for this: rather than having a Mongrel process (busy serving users) do the API request, I schedule it as a DelayedJob. A DelayedJob worker will, several seconds after scheduling, see the API request in the queue, fire it off, and if either my code or Mixpanel bugs out he won't care, he'll just retry later. (An interesting side effect of this: if I throw an exception because my tracking code is bad, that doesn't cause the web page to show the user an error. I wouldn't suggest relying on this, but it is a nice safety net to have if you also push changes to production at 2 AM.)
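
The fire-and-forget idea can be illustrated without DelayedJob. The sketch below swaps in a plain Ruby Queue and a worker Thread, which is a different technique from the database-backed queue described above: it does not survive a process restart and it records failures instead of retrying them. The BackgroundTracker class is hypothetical, and the block passed to it stands in for whatever actually performs the HTTP request.

```ruby
# Hypothetical fire-and-forget tracker built on Ruby's core Queue and Thread.
class BackgroundTracker
  attr_reader :failures

  # The block receives (event, properties) and does the slow work.
  def initialize(&sender)
    @sender = sender
    @queue = Queue.new
    @failures = []
    @worker = Thread.new { work }
  end

  # Returns immediately; the caller never waits on the sender.
  def track(event, properties = {})
    @queue << [event, properties]
    nil
  end

  # Drain the queue and stop the worker (useful at shutdown and in tests).
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def work
    loop do
      job = @queue.pop
      break if job == :stop
      begin
        @sender.call(*job)
      rescue StandardError => error
        # A failing tracking call must never reach the user-facing request.
        @failures << [job, error.message]
      end
    end
  end
end

sent = []
tracker = BackgroundTracker.new do |event, properties|
  raise "simulated API hiccup" if event == "Bad"
  sent << [event, properties]
end
tracker.track("Dashboard", { "plan" => "trial" })
tracker.track("Bad")
tracker.track("Download")
tracker.shutdown
```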

Assigning unique IDs: The naive approach to assigning unique IDs to users is to use their IP address or associate a unique ID with their account. However, users frequently access the same site from different IPs (school and home, for example). One IP address may represent hundreds of users -- there is an AOL gateway in Kansas which accesses my site several dozen times a week, and I was honestly very concerned that I had an addicted schoolmarm before figuring that out. Assigning IDs to accounts is a good solution, and trivial in Rails (you get @user.id for free, after all), but if you want to track people across a signup you have to be tricky.

Assigning unique IDs is important in Mixpanel for funnel tracking. If you report two different stages of a funnel with two different unique IDs, Mixpanel will assume they are unrelated and junk the second data point.
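
One way to keep the same ID across a signup, sketched under the assumption that something Hash-like persists between requests (in Rails, the session). The distinct_id_for helper and the :distinct_id key are hypothetical names, and this is not necessarily the approach used on this site.

```ruby
require "securerandom"

# Hypothetical helper: hand out a random ID on first contact and keep
# returning the same one for as long as the session lasts.
def distinct_id_for(session)
  session[:distinct_id] ||= SecureRandom.uuid
end

session = {}
anonymous_id = distinct_id_for(session)    # first visit, before signup
session[:user_id] = 123                    # visitor signs up
after_signup_id = distinct_id_for(session)
# Both funnel steps report the same ID, so they stay linked.
```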

Track durable data yourself: Mixpanel's Javascript implementation has a concept called Super Properties, which is an abstraction for a per-user cookie. They're automatically added as properties to all events and funnels for that user. To do this with their API, you have to track the data you want to persist yourself. In Rails, you can stuff it in the session, the cookie, Memcached, the database, wherever. For my particular needs I put most of it in the session, which since I use the CookieStore is really just a convenient way to set a cookie anyhow. However, you have a lot of flexibility to grab different bits from different places.
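
A minimal sketch of tracking durable data yourself, again assuming a Hash-like session. The helper names and the :tracking_properties key are invented for the example; the point is only that durable values are stored once and merged into every event's properties.

```ruby
# Hypothetical stand-in for Super Properties when calling the API yourself.
def remember_property(session, key, value)
  (session[:tracking_properties] ||= {})[key.to_s] = value.to_s
end

def properties_for_event(session, event_properties = {})
  durable = session[:tracking_properties] || {}
  durable.merge(event_properties)   # per-event values win on conflict
end

session = {}
remember_property(session, :referrer, "baby-shower-ad")
remember_property(session, :plan, "trial")

dashboard_props = properties_for_event(session)
purchase_props  = properties_for_event(session, { "plan" => "paid" })
```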

OSS Mixpanel Rails Code

I split the Rails code for Mixpanel into two places. The first, which you can just copy/paste into your own project (remember to replace your API key), encapsulates the Mixpanel API in a model mixpanel.rb. This encapsulation keeps the details of the API away from your code in case they change later, and also means that you can conveniently serialize Mixpanel API requests as DelayedJobs (which is automatic if you have it installed).

This code is available under the MIT license, just like Rails. Go nuts.

[Code listing: mixpanel.rb]

Then, I added a bit of code to my ApplicationController (application.rb) to give me a quick and easy wrapper around the API anywhere. You'll want to write your own code here specific to your own needs for associating IDs with users and automatically initializing properties for them. I'll talk you through it.

[Code listing: application.rb]

Then, you actually track stuff in controllers:

  #Logged in users come here first.
  def dashboard
    #Snipped business logic.
    #Note I track how many word lists users have
    #across all events/funnels as it makes
    #for a quick proxy on how heavy of a user they are.
    log_event("Dashboard")
    log_funnel("Basic Use", 1, "Dashboard", {}, {:word_list_count => @word_lists.size})
    log_funnel("Purchase", 1, "Dashboard", {}, {:word_list_count => @word_lists.size})
  end
  

Easy peasy! Adding a new event or funnel is a quick one-liner in the appropriate place(s).

How Mixpanel Improved Bingo Card Creator

Alright, geek hats off, businessman hats on: how does that work make me more money? Well, I'll give you an example from my business: I recently released an online version of the application I sell.

<tangent>I am thrilled about selling online applications versus selling desktop software, for a few reasons. One is that you get opportunities to look at aggregate data from your users and everyone is pretty much OK with this, whereas phoning home from downloadable software is generally frowned on. It occurs to me, though, that I could just as easily have plugged Mixpanel into my downloadable application... I'll have to think that one over. </tangent>

Previous experience with customers over the last three years suggests that nothing sells my software as well as printed bingo cards actually in their hand. So I'm going to try to optimize the percentage of trial users who successfully complete that experience. If 1% more trial users get through this, I will get approximately 1% better sales, so there is clear motivation to get as many through this as possible.

Define the Funnel

One exercise we used to do in my writing classes was to write instructions for making a peanut butter sandwich. You may think making a peanut butter sandwich is easy. If you do, you would make for a poor technical writer. Take a few seconds and sketch out all the steps you would need. Go ahead, I'll wait.

Did you forget to take the bread out of the bag? Start over. Did you forget to mention where the peanut butter came from (pantry, etc)? Start over.

When you're instrumenting a funnel, you want that level of detail, because it will show you the stupid little things that trip up your customers that you would otherwise miss. You know why you miss them? Because this app is your baby and you have been working on it nonstop for months and your fingers walk through your core interaction without you ever engaging in conscious thought.

Thus, here is a moderate level of detail about the core interaction for Bingo Card Creator:

  1. Sign up (or sign in).
  2. Create a word list (or open an existing one).
  3. Add at least 24 words.
  4. Save the word list.
  5. Customize the bingo cards.
  6. Tell the server to whip you up a batch.
  7. Download the resulting PDF file.
  8. Print it. (Happens within Adobe Reader so I can't see it.)

Tracking the funnel of events in Mixpanel lets me see exactly what steps are causing users to fail. Since all the steps are multiplicative (50% dropoff in step 1 and 50% dropoff in step 2 means only 25% of users survive to step 3), a 1% relative improvement at any step increases overall funnel success by 1%. Since this funnel is essentially a step on the Pay Patrick Money funnel, a 1% improvement anywhere means 1% added straight to my bottom line.
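
The multiplicative arithmetic above can be checked in a few lines of Ruby. The funnel_completion helper is a hypothetical name; the rates are the illustrative 50% figures from the paragraph, not measured data.

```ruby
# Overall completion is the product of the per-step pass rates.
def funnel_completion(step_rates)
  step_rates.reduce(1.0) { |product, rate| product * rate }
end

base     = funnel_completion([0.5, 0.5])          # two steps at 50% each
improved = funnel_completion([0.5 * 1.01, 0.5])   # step one gets 1% better (relative)

# Because the rates multiply, the whole funnel improves by the same 1%.
relative_lift = improved / base - 1.0
```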

So I instrumented all of the above actions, and sat back while my users interacted with the site.

Initial Funnel Results

I really like Mixpanel's presentation of funnel results, because it makes the multiplicative nature of funnel performance very explicit. However, I don't have their CSS skills, so unless you want to read it in the screenshots (one, two) you get to see a text version:

  • Users signing in: 260
  • ... who created or opened a word list: 218 (83%)
  • ... and saved it at least once: 191 (87%)
  • ... and proceeded to the customization page: 161 (84%)
  • ... and customized and created cards: 131 (81%)
  • ... and actually downloaded those cards: 126 (95%)

Note that a long funnel with every step having 80%+ completion ends up in a total funnel completion rate a hair under 50%. Yikes, right? (Believe it or not, this is a wonderful number for me. 50% of my users are actually using the software and succeeding with it. For downloadable software, between everything that can go wrong in downloading, installation, and running the software, PLUS all that rigamarole, my number is much lower. I'm not sure how much lower, but the oft-repeated shareware industry average conversion rate of 1% is low for a reason.)
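
For anyone who wants to redo the arithmetic, here is a short Ruby sketch using the user counts from the list above. Per-step rates recomputed this way may not match the rounded percentages in the list exactly; the overall figure is simply the last count divided by the first.

```ruby
# User counts at each stage, copied from the list above.
counts = [260, 218, 191, 161, 131, 126]

# Pass rate for each step: users at this stage / users at the previous one.
step_rates = counts.each_cons(2).map { |before, after| after.to_f / before }

# Overall completion, computed two ways that should agree.
overall = counts.last.to_f / counts.first
product = step_rates.reduce(1.0) { |acc, rate| acc * rate }

overall_percent = (overall * 100).round(1)
```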

Data Driven Improvements

This lets me prioritize improvements. Clearly, optimizing the download step is a waste of my time: almost everyone already succeeds at it. The steps for saving and for proceeding to customization are sort of difficult by nature -- they actually require the user to type significant amounts of text. I have ideas for improving that, but they are not the low hanging fruit.

The lowest hanging fruit is the very first step. I lose 20% of my trial accounts before they even open a word list. That is one freaking click into the funnel! So one thing for me to test this week is tweaking the dashboard and the introductory text on it to make it more obvious how to proceed -- probably with big, colorful buttons. Another option is to take people directly to the next step of the funnel on signup, and let them see the dashboard for the first time later. I may test that as well.

The other low hanging fruit is at customization. This asks the customer to look at a fairly involved form, but all they really HAVE to do is click a button. I should probably hide some of the complexity from the user, and make that button a graphic with an arrow on it rather than the current HTML form submit button. For the moment I've improved it via hiding the complexity for some users. (I use a quick heuristic to decide who is likely to be a power user. They get to see the whole form by default. Everyone else gets it hidden by default.) I'll have to get in touch with my design guy to get a decent button done up, but that will do for now.

You might be wondering "Hey, shouldn't this be A/B tested?", and if you're wondering that you are absolutely correct. However, my Rails A/B test framework isn't quite ready for pushing live yet. That will be my next little mini-project.

Results of Improvements

I want to reiterate that this experiment is not optimal, because I did it before my A/B testing framework (A/Bingo) was ready. That out of the way: the experiment was a success. You can see the new funnel in screenshots (one, two), but the highlights are:

Completion on Create List went from 87% to 97%, probably due to improvements in my welcome page text. (Rather than letting people figure out what to do next, I lead them by the freaking nose. For example, if they came from an ad for Baby Shower bingo cards, I make it very obvious how to get them.)

Completion of the Customization screen went from 81% to 88%, probably due to the simplification I described above.

These two changes taken together mean total funnel completion was lifted by 16.9%, from 48.5% to 56.7%. Not bad for an hour of work, huh? I will say it again: because these tests were run sequentially rather than A/B tested, I can't be positive that I'm not seeing the results of a different customer mix rather than the results of the tested experiments only. Additionally, smart readers might wonder "Is that improvement statistically significant?" If you consider the whole funnel as the experiment, then yes, the improvement is statistically significant at the 90% confidence level.

(If you aren't comfortable running your own Z-tests I recommend the SEOBook A/B test calculator for that sort of math. Full disclosure: I moderate in their forums.)
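
For readers who would rather see the math than use a calculator, here is a sketch of the lift calculation and a standard two-proportion Z-test in Ruby. The "before" sample (126 of 260) comes from the initial funnel results. The "after" sample size is not given in this article, so the 147 of 260 used below is a hypothetical placeholder chosen only to be close to 56.7%; the resulting Z value illustrates the method and is not a reproduction of the actual test.

```ruby
# Relative lift between two completion rates.
def relative_lift(before, after)
  after / before - 1.0
end

# Two-proportion Z statistic using the pooled standard error.
def two_proportion_z(successes_a, total_a, successes_b, total_b)
  rate_a = successes_a.to_f / total_a
  rate_b = successes_b.to_f / total_b
  pooled = (successes_a + successes_b).to_f / (total_a + total_b)
  standard_error = Math.sqrt(pooled * (1 - pooled) * (1.0 / total_a + 1.0 / total_b))
  (rate_b - rate_a) / standard_error
end

lift = relative_lift(0.485, 0.567)   # the 48.5% -> 56.7% change quoted above

# HYPOTHETICAL "after" sample (147 of 260); substitute the real counts.
z = two_proportion_z(126, 260, 147, 260)

# 1.645 is the two-sided critical value for 90% confidence.
significant_at_90 = z.abs > 1.645
```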

I'd love to hear what you think of this article. Either shoot me an email at patrick@bingocardcreator.com, post a comment on my blog, or post something about this and I'll find you from my referrer logs.

Regards,
Patrick McKenzie

P.S. Feel free to pass this article or code around to people you think will benefit from it.
