Mochi Labs

Developers at Mochi Media share their source and expertise.


meck and eunit best practices

David Reid (@dreid) on erlang, testing, mochi

At this point, after 3 years at Mochi, I've written a significant amount of Erlang code with nearly 100% line coverage. Over those 3 years I've learned that many Erlang programmers think using a mock library means your code is written wrong. However, since I haven't seen a significant amount of well-tested code that doesn't use them, I find them quite useful. So I'm not here to argue why you should use a mock library (in particular meck) in your Erlang project. I assume you've already decided to use one to test your code, and I'm merely here to offer some tips for getting the most out of it when using eunit.

Our example code

Here is a function called get_url/2 which takes a URL as a string and a list of options. It then makes an HTTP request through a proxy using lhttpc. I wrote this code last week.

%% @spec get_url(string(), list()) ->
%%    {ok, {{pos_integer(), string()},
%%          [{string(), string()}], string()}} | {error, term()}.
get_url(URL, Options) ->
    TimeoutMSec = proplists:get_value(
                    timeout_msec, Options, ?DEFAULT_TIMEOUT_MSEC),

    {Host, Port, _Path, SSL} = lhttpc_lib:parse_url(URL),

    {ok, {{Code, Reason}, Headers, Body}} =
        lhttpc:request(
          ?PROXY_HOST,
          ?PROXY_PORT,
          SSL,
          URL,
          "GET",
          [?USER_AGENT,
           {"Proxy-Connection", "Keep-Alive"},
           {"Connection", "Keep-Alive"},
           {"Host", lists:flatten([Host, $:, integer_to_list(Port)])}],
          [],
          TimeoutMSec,
          []),

    case Code of
        200 ->
            {ok, {{Code, Reason}, Headers, binary_to_list(Body)}};
        C when C >= 400, C < 600 ->
            {error, {http_error, Code, Reason, URL}}
    end.

To test this code we want to:
  1. mock the lhttpc module
  2. set an expectation on the lhttpc:request/9 function
  3. run get_url and assert on its result
  4. validate our expectations for the lhttpc module
  5. unload the mocked lhttpc module

Our first test case

Here is a test that verifies that when we get a 404 from the lhttpc module we return an appropriate error tuple.

get_url_error_test() ->
    meck:new(lhttpc),
    meck:expect(lhttpc, request, 9,
                {ok, {{404, "Not Found"}, [], <<"Not found">>}}),
    ?assertEqual({error, {http_error, 404, "Not Found", "foo.com"}},
                 get_url("foo.com", [])),
    meck:validate(lhttpc),
    meck:unload(lhttpc).

Now, you might be looking at that and saying "this is a bad test", and you'd be right. This test does just about everything wrong except actually asserting the result.

Poorly isolated

If get_url is changed such that an exception is raised, or the assertion fails, the lhttpc module will not be unloaded, and future tests calling meck:new will fail immediately and somewhat violently. Here is how most people would immediately rewrite it.

get_url_error_test() ->
    meck:new(lhttpc),
    meck:expect(lhttpc, request, 9,
                {ok, {{404, "Not Found"}, [], <<"Not found">>}}),
    try
        ?assertEqual({error, {http_error, 404, "Not Found", "foo.com"}},
                     get_url("foo.com", []))
    after
        meck:validate(lhttpc),
        meck:unload(lhttpc)
    end.

This fixes the isolation problem, but it's still suboptimal, mostly because this code is obviously going to have other test cases, and they're all going to have to mock and unload lhttpc. What we actually want to do here is use what eunit refers to as fixtures. Specifically, the fixture we're interested in is called foreach, and it takes the following form.

{foreach,
 fun() -> %% A setup function
     %% I'll be run once before each test case.
     ok
 end,
 fun(SetupResult) -> %% A cleanup function
     %% I'll be run once after each test case, even if it failed.
     ok
 end,
 [
   %% I'm a list of simple test functions.
 ]}.

We could also use the setup fixture if we only wanted the setup and cleanup functions to be run once for all of our test cases. However, foreach provides greater isolation, and I think it should almost always be preferred to setup.
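For comparison, here is a minimal sketch of the setup form. It deliberately avoids meck and uses an ETS table as the shared resource; the module name, table name, and test titles are all made up for illustration. Because setup runs only once, state written by the first test is still visible to the second, which is exactly the kind of coupling foreach avoids.

```erlang
-module(setup_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Hypothetical example: one ETS table is created once and shared by
%% every test in the list.
shared_table_test_() ->
    {setup,
     fun() ->
             %% Runs once, before all of the tests. The table is public
             %% because eunit may run the tests in a different process.
             ets:new(shared_sketch, [public])
     end,
     fun(Tab) ->
             %% Runs once, after all of the tests.
             ets:delete(Tab)
     end,
     fun(Tab) ->
             [{"first test writes a row",
               fun() ->
                       ?assert(ets:insert(Tab, {key, 1}))
               end},
              {"second test still sees that row",
               fun() ->
                       ?assertEqual([{key, 1}], ets:lookup(Tab, key))
               end}]
     end}.
```

With foreach instead of setup, each test would get a fresh table and the second test would have to create its own row.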

So let's rewrite our tests.

get_url_test_() ->
    {foreach,
     fun() ->
             meck:new(lhttpc)
     end,
     fun(_) ->
             meck:unload(lhttpc)
     end,
     [{"Handles error codes",
       fun() ->
               meck:expect(lhttpc, request, 9,
                           {ok, {{404, "Not Found"}, [], <<"Not found">>}}),
               ?assertEqual({error,
                             {http_error, 404,
                              "Not Found", "foo.com"}},
                            get_url("foo.com", [])),
               meck:validate(lhttpc)
       end}]}.

This is a much better test. First we've started to give ourselves a framework for adding more test cases which have the same setup and cleanup steps.

We start by converting our simple test function to a test generator by changing the test name from get_url_error_test() to get_url_test_(). This is required to use fixtures. If you don't use a test generator, you'll see a single test case that always passes because your test code isn't actually being run: the foreach tuple is simply treated as the successful return value of your test case.
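The always-passing pitfall is easy to reproduce. The following sketch uses a hypothetical module with no mocks at all; the only difference between the two functions is the trailing underscore in the name.

```erlang
-module(generator_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Simple test function (no trailing underscore). eunit calls it, gets
%% a tuple back, and reports success. The inner fun is never executed,
%% so the failing assertion goes unnoticed.
never_runs_test() ->
    {foreach,
     fun() -> ok end,
     fun(_) -> ok end,
     [fun() -> ?assertEqual(1, 2) end]}.

%% Test generator (trailing underscore). eunit interprets the returned
%% tuple as a fixture and actually runs the inner fun.
actually_runs_test_() ->
    {foreach,
     fun() -> ok end,
     fun(_) -> ok end,
     [fun() -> ?assertEqual(2, 1 + 1) end]}.
```

Both tests are reported as passing, even though the first one contains an assertion that can never hold.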

We've then moved the creation of the mock module to our setup function and the unloading of the mock module to our cleanup function. For good measure, you can also pattern match on the return value of unload (ok = meck:unload(lhttpc)), so that if the unload fails, the test case fails with an error. Otherwise you might just get a stranger failure later on when you try to create a new mock of the lhttpc module.

This example also uses a title. Notice how the 4th element of the foreach tuple is actually a list of 2-tuples? The first element of each, a string, is the title, and it will be displayed in the test output like this:

simple_client: get_url_test_ (Handles error codes)...[0.006 s] ok

Without that title you'd just see this:

simple_client: get_url_test_...[0.0006 s] ok

And once you have more than one test case you'll be stuck deducing which test failed from the error message or just a line number.

Doesn't assert validation

This is a very common mistake, especially with code ported from using effigy, and it is potentially the most dangerous problem because it can cause bad tests to pass.

Most people assume that meck:validate/1 will raise an exception if it fails. However, it actually returns a boolean, and if you're not checking this value your tests might seem to pass when they shouldn't. For the test here this isn't a problem: if the expectation wasn't met, there is almost no way that the return value of get_url would be correct. However, for tests which assert that some event is triggered, or where assertions are necessarily done in the body of expectations, the potential for a missed assertion to cause a false pass is a very real danger.

Let's look at the fixed example:

get_url_test_() ->
    {foreach,
     fun() ->
             meck:new(lhttpc)
     end,
     fun(_) ->
             meck:unload(lhttpc)
     end,
     [{"Handles error codes",
       fun() ->
               meck:expect(lhttpc, request, 9,
                           {ok, {{404, "Not Found"}, [], <<"Not found">>}}),
               ?assertEqual({error,
                             {http_error, 404,
                              "Not Found", "foo.com"}},
                            get_url("foo.com", [])),
               ?assert(meck:validate(lhttpc))
       end}]}.

Here we've used ?assert to check the return value of meck:validate/1. We could have used pattern matching but in general the ?assert* macros provide nicer error messages.
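To make the false-pass danger concrete, here is a small sketch. The fetch/1 function and the module name are hypothetical; fetch/1 stands in for any code under test that swallows exceptions. The sketch assumes meck behaves as its documentation describes, where an unexpected exception inside an expectation marks the mock as invalid.

```erlang
-module(validate_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Hypothetical code under test: it swallows every failure from lhttpc.
fetch(URL) ->
    try lhttpc:request(URL, "GET", [], 1000) of
        {ok, {{200, _Reason}, _Headers, Body}} -> {ok, Body};
        _Other -> {error, failed}
    catch
        _Class:_Error -> {error, failed}
    end.

validate_test_() ->
    {foreach,
     fun() -> meck:new(lhttpc) end,
     fun(_) -> meck:unload(lhttpc) end,
     [{"a swallowed mock failure still shows up in validate/1",
       fun() ->
               meck:expect(lhttpc, request,
                           fun(_URL, Method, _Headers, _Timeout) ->
                                   %% Deliberately wrong: fetch/1 sends "GET".
                                   ?assertEqual("POST", Method),
                                   {ok, {{200, "OK"}, [], <<"OK">>}}
                           end),
               %% fetch/1 catches the assertion error raised inside the
               %% mock, so a test that only checked this would pass.
               ?assertEqual({error, failed}, fetch("http://foo.com")),
               %% Only the boolean from validate/1 reveals the problem.
               ?assertNot(meck:validate(lhttpc))
       end}]}.
```

A real test would of course use ?assert(meck:validate(lhttpc)) and fail here, which is the point: without that check, the broken expectation would never be noticed.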

Complete Example

Here is a complete example which includes multiple tests and uses meck:expect/3 to specify a fun which can pattern match its arguments as well as assert the expected values.

get_url_test_() ->
    {foreach,
     fun() ->
             meck:new(lhttpc)
     end,
     fun(_) ->
             meck:unload(lhttpc)
     end,
     [{"sends user-agent",
       fun() ->
               meck:expect(lhttpc, request,
                           fun(_Host, _Port, _Ssl, _Url, _Method,
                               Headers, _Body, _Timeout, _Options) ->
                                   ?assert(lists:member(
                                       ?USER_AGENT, Headers)),
                                   {ok, {{200, "OK"}, [], <<"OK">>}}
                           end),

               ?assertEqual({ok, {{200, "OK"}, [], "OK"}},
                            get_url("foo.com", [])),
               ?assert(meck:validate(lhttpc))
       end},
      {"uses proxy",
       fun() ->
               meck:expect(lhttpc, request,
                           fun(?PROXY_HOST, ?PROXY_PORT, _Ssl,
                               "foo.com", "GET", Headers,
                               [], ?DEFAULT_TIMEOUT_MSEC, []) ->
                                   ?assert(lists:member(
                                       {"Host", "foo.com:80"}, Headers)),
                                   {ok, {{200, "OK"}, [], <<"OK">>}}
                           end),

               ?assertEqual({ok, {{200, "OK"}, [], "OK"}},
                            get_url("foo.com", [])),
               ?assert(meck:validate(lhttpc))
       end},
      {"Handles error codes",
       fun() ->
               meck:expect(lhttpc, request, 9,
                           {ok, {{404, "Not Found"}, [], <<"Not found">>}}),
               ?assertEqual({error,
                             {http_error, 404,
                              "Not Found", "foo.com"}},
                            get_url("foo.com", [])),
               ?assert(meck:validate(lhttpc))
       end}]}.

Conclusion

meck and eunit are very powerful but often poorly understood tools. There are not a lot of complete examples of how to write good tests for either of them, or for Erlang in general, so hopefully this has been helpful to you. As usual, the best recommendation is Read The Docs. If the docs fail you, Read The Source. If the source fails you, Ask The Internet.

You may also enjoy this fantastic presentation by meck's author Adam Lindberg speakerrate.com/talks/7749-meck-at-erlang-factory-london-2011 and another one by my boss Bob Ippolito etrepum.github.com/erl_testing_2011/.

Complaints?

This document represents an attempt to document my personal style of writing tests and some of the reasoning behind it. It is not complete, and many of my recommendations are probably a matter of personal taste. If you have a valid argument against them, or feel like I missed a very important recommendation, please let me know in the comments or via twitter (I'm @dreid).

If you strongly disagree with things I've said here, do not hesitate to tell me. If you disagree with me very strongly, consider applying for a job writing Erlang at Mochi: bit.ly/mochijobs

A Gentleman's Agreement on Privacy.

David Reid (@dreid) on dnt, privacy, mochi

I'm a bit of a privacy nerd. I like privacy, obviously, but I'm also interested in how technology and business can enable privacy rather than hinder it. Now, Mochi is, among other things, an advertising network, and like most advertising networks we use a unique cookie to keep track of users (specifically, we store a random 128-bit integer in a Flash Local Shared Object). So while we don't currently do any behavioural advertising, it is not outside the realm of possibility. It is with that in mind that I'd like to announce that as of 2011-05-31 our advertising product honors the DNT header.

A brief overview of Do Not Track and its goals is best found at donottrack.us/ but here is the short version:

Do Not Track is a technology that enables users to opt out of third-party web tracking, including behavioral advertising. At present a user cannot opt out of many of the hundreds of tracking services and advertising networks; those that do allow opting out each require setting (and not deleting!) an opt-out cookie. Much like the popular Do Not Call registry, Do Not Track provides users with a single, persistent setting to opt out of web tracking.

What does this mean to you? Probably very little. However if you're a user who wants control over how information about you is used, and you use one of the major browsers that supports the DNT header (Firefox 4, IE9, Safari 5.1) it means that you can tell Mochi that you want to opt-out of being tracked while you play flash games, and we'll honor that wish.

When the DNT header is present Mochi will not assign your browser a unique cookie and will not log the last octet of your IP address. This greatly reduces the uniqueness of the fingerprints you leave behind as you play flash games using our ads across the web.

In the Local Shared Object we still store some information about the ads you've seen, mostly the ads you've seen today, this week, and this month. We use this to make sure you don't have to see the same ad too often in a short period of time. We also store the timestamps of the last 3 times you've seen an ad. All this information is stored locally and can be controlled by Adobe's Global Storage Settings and Website Storage Settings.

In our server-side logs we store information including the first 3 octets of your IP address and the browser name and version (instead of the full user-agent string). We believe this reduces the uniqueness of an ad impression sufficiently to be true to the spirit of DNT. The IP address is used only for getting country, timezone, and region data out of an industry-standard GeoIP database for targeting and forecasting.
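As a rough illustration of the idea (not Mochi's actual code), the check and the IP truncation could be sketched like this. The module and function names are hypothetical, and the sketch assumes request headers arrive as a list of {Name, Value} string pairs with lowercase names.

```erlang
-module(dnt_sketch).
-export([dnt_enabled/1, loggable_ip/2]).

%% True when the client sent "DNT: 1". Assumes header names have
%% already been lowercased by whatever parsed the request.
dnt_enabled(Headers) ->
    proplists:get_value("dnt", Headers) =:= "1".

%% Keep only the first 3 octets of an IPv4 address when tracking has
%% been declined; otherwise return the address unchanged.
loggable_ip({A, B, C, _D}, true) ->
    {A, B, C, 0};
loggable_ip(IP, false) ->
    IP.
```

For example, dnt_sketch:loggable_ip({10, 1, 2, 3}, true) returns {10, 1, 2, 0}.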

Why DNT?

With the support of browser vendors, FTC interest in legislating some sort of Do Not Track implementation, and a California Senate bill proposing Do Not Track requirements for businesses, now seemed like an ideal time to step up and turn a personal interest in privacy into a requirement for our product. Because I wanted to. Because I feel it is important that companies help ensure the privacy of users instead of trying to make money by usurping it.

Mochi happens to be a small company that lets individuals be individuals and express themselves in our products. Want to join us? bit.ly/mochijobs

statebox, an eventually consistent data model for Erlang (and Riak)

Bob Ippolito (@etrepum) on erlang, mochi

A few weeks ago when I was on call at work I was chasing down a bug in friendwad [1] and I realized that we had made a big mistake. The data model was broken, it could only work with transactions but we were using Riak. The original prototype was built with Mnesia, which would've been able to satisfy this constraint, but when it was refactored for an eventually consistent data model it just wasn't correct anymore. Given just a little bit of concurrency, such as a popular user, it would produce inconsistent data. Soon after this discovery, I found another service built with the same invalid premise and I also realized that a general solution to this problem would allow us to migrate several applications from Mnesia to Riak.

When you choose an eventually consistent data store you're prioritizing availability and partition tolerance over consistency, but this doesn't mean your application has to be inconsistent. What it does mean is that you have to move your conflict resolution from writes to reads. Riak does almost all of the hard work for you [2], but if it's not acceptable to discard some writes then you will have to set allow_mult to true on your bucket(s) and handle siblings [3] from your application. In some cases, this might be trivial. For example, if you have a set and only support adding to that set, then a merge operation is just the union of those two sets.
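The add-only set case can be sketched in a few lines. This is a hypothetical helper, not part of statebox or the Riak client, and it assumes each sibling value has already been decoded into an ordset.

```erlang
-module(sibling_merge_sketch).
-export([merge_siblings/1]).

%% Siblings is a list of ordsets, one per conflicting value read back
%% from the store. For an add-only set, the merged value is simply the
%% union of all of them.
merge_siblings(Siblings) ->
    ordsets:union(Siblings).
```

For example, sibling_merge_sketch:merge_siblings([[alice, bob], [bob, charlie]]) returns [alice, bob, charlie]. Once removals are allowed this no longer works, because a union cannot tell "never added" apart from "added, then removed".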

statebox is my solution to this problem. It bundles the value with repeatable operations [4] and provides a means to automatically resolve conflicts. Usage of statebox feels much more declarative than imperative. Instead of modifying the values yourself, you provide statebox with a list of operations and it will apply them to create a new statebox. This is necessary because it may apply this operation again at a later time when resolving a conflict between siblings on read.
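A quick way to get a feel for which operations are repeatable is to check F(V) = F(F(V)) on a sample value. The helper below is a hypothetical spot check for one value, not a proof and not part of statebox.

```erlang
-module(repeatable_sketch).
-export([is_repeatable_for/2, demo/0]).

%% Spot check: does applying F twice to V give the same result as
%% applying it once?
is_repeatable_for(F, V) ->
    F(V) =:= F(F(V)).

demo() ->
    %% Adding bob to a set twice is the same as adding him once.
    Union = fun(Set) -> ordsets:union(Set, [bob]) end,
    %% Appending bob to a list twice yields two bobs.
    Append = fun(List) -> List ++ [bob] end,
    {is_repeatable_for(Union, [alice]),
     is_repeatable_for(Append, [alice])}.
```

Here repeatable_sketch:demo() returns {true, false}: set union can safely be replayed during conflict resolution, while list append (or a counter increment) cannot.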

Design goals (and non-goals):

  • The intended use case is for data structures such as dictionaries and sets
  • Direct support for counters is not required
  • Applications must be able to control the growth of a statebox so that it does not grow indefinitely over time
  • The implementation need not support platforms other than Erlang and the data does not need to be portable to nodes that do not share code
  • It should be easy to use with Riak, but not be dependent on it (clear separation of concerns)
  • Must be comprehensively tested, mistakes at this level are very expensive
  • It is ok to require that the servers' clocks are in sync with NTP (but it should be aware that timestamps can be in the future or past)

Here's what typical statebox usage looks like for a trivial application (note: Riak metadata is not merged [5]). In this case we are storing an orddict in our statebox, and this orddict has the keys following and followers.

-module(friends).
-export([add_friend/2, get_friends/1]).

-define(BUCKET, <<"friends">>).
-define(STATEBOX_MAX_QUEUE, 16).     %% Cap on max event queue of statebox
-define(STATEBOX_EXPIRE_MS, 300000). %% Expire events older than 5 minutes
-define(RIAK_HOST, "127.0.0.1").
-define(RIAK_PORT, 8087).

-type user_id() :: atom().
-type orddict(T) :: [T].
-type ordsets(T) :: [T].
-type friend_pair() :: {followers, ordsets(user_id())} |
                       {following, ordsets(user_id())}.

-spec add_friend(user_id(), user_id()) -> ok.
add_friend(FollowerId, FolloweeId) ->
    statebox_riak:apply_bucket_ops(
      ?BUCKET,
      [{[friend_id_to_key(FollowerId)],
        statebox_orddict:f_union(following, [FolloweeId])},
       {[friend_id_to_key(FolloweeId)],
        statebox_orddict:f_union(followers, [FollowerId])}],
      connect()).

-spec get_friends(user_id()) -> [] | orddict(friend_pair()).
get_friends(Id) ->
    statebox_riak:get_value(?BUCKET, friend_id_to_key(Id), connect()).


%% Internal API

connect() ->
    {ok, Pid} = riakc_pb_client:start_link(?RIAK_HOST, ?RIAK_PORT),
    connect(Pid).

connect(Pid) ->
    statebox_riak:new([{riakc_pb_client, Pid},
                       {max_queue, ?STATEBOX_MAX_QUEUE},
                       {expire_ms, ?STATEBOX_EXPIRE_MS},
                       {from_values, fun statebox_orddict:from_values/1}]).

friend_id_to_key(FriendId) when is_atom(FriendId) ->
    %% NOTE: You shouldn't use atoms for this purpose, but it makes the
    %% example easier to read!
    atom_to_binary(FriendId, utf8).

To show how this works a bit more clearly, we'll use the following sequence of operations:

add_friend(alice, bob),       %% AB
add_friend(bob, alice),       %% BA
add_friend(alice, charlie).   %% AC

Each of these add_friend calls can be broken up into four separate atomic operations, demonstrated in this pseudocode:

%% add_friend(alice, bob)
Alice = get(alice),
put(update(Alice, following, [bob])),
Bob = get(bob),
put(update(Bob, followers, [alice])).

Realistically, these operations may happen with some concurrency and cause conflict. For demonstration purposes we will have AB happen concurrently with BA and the conflict will be resolved during AC. For simplicity, I'll only show the operations that modify the key for alice.

AB = get(alice),                              %% AB (Timestamp: 1)
BA = get(alice),                              %% BA (Timestamp: 2)
put(update(AB, following, [bob])),            %% AB (Timestamp: 3)
put(update(BA, followers, [bob])),            %% BA (Timestamp: 4)
AC = get(alice),                              %% AC (Timestamp: 5)
put(update(AC, following, [charlie])).        %% AC (Timestamp: 6)

Timestamp 1:
There is no data for alice in Riak yet, so statebox_riak:from_values([]) is called and we get a statebox with an empty orddict.
Value = [],
Queue = [].

Timestamp 2:
There is no data for alice in Riak yet, so statebox_riak:from_values([]) is called and we get a statebox with an empty orddict.
Value = [],
Queue = [].

Timestamp 3:
Put the updated AB statebox to Riak with the updated value.
Value = [{following, [bob]}],
Queue = [{3, {fun op_union/2, following, [bob]}}].

Timestamp 4:
Put the updated BA statebox to Riak with the updated value. Note that this will be a sibling of the value stored by AB.
Value = [{followers, [bob]}],
Queue = [{4, {fun op_union/2, followers, [bob]}}].

Timestamp 5:
Uh oh, there are two stateboxes in Riak now... so statebox_riak:from_values([AB, BA]) is called. This will apply all of the operations from both of the event queues to one of the current values and we will get a single statebox as a result.
Value = [{followers, [bob]},
         {following, [bob]}],
Queue = [{3, {fun op_union/2, following, [bob]}},
         {4, {fun op_union/2, followers, [bob]}}].

Timestamp 6:
Put the updated AC statebox to Riak. This will resolve the siblings created at Timestamp 4 by BA.
Value = [{followers, [bob]},
         {following, [bob, charlie]}],
Queue = [{3, {fun op_union/2, following, [bob]}},
         {4, {fun op_union/2, followers, [bob]}},
         {6, {fun op_union/2, following, [charlie]}}].

Well, that's about it! alice is following both bob and charlie despite the concurrency. No locks were harmed during this experiment, and we've arrived at eventual consistency by using statebox_riak, statebox, and Riak without having to write any conflict resolution code of our own.
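The resolution step at Timestamp 5 can be modeled in a few lines of plain Erlang. This is a simplified sketch of the idea, not statebox's implementation: the module name and the {Value, Queue} representation are assumptions, events are reduced to {Timestamp, {Key, Items}} union operations, and the queue capping and expiry configured earlier (max_queue, expire_ms) are ignored.

```erlang
-module(replay_sketch).
-export([merge/1]).

%% A sibling is modeled as {Value, Queue}: Value is an orddict of
%% ordsets, and Queue is a timestamp-sorted list of
%% {Timestamp, {Key, Items}} union events.
merge([{Value, _Queue} | _] = Siblings) ->
    %% Combine every sibling's event queue, ordered by timestamp and
    %% without duplicate events.
    Queue = lists:umerge([Q || {_V, Q} <- Siblings]),
    %% Replay all events onto one of the values. Union is repeatable,
    %% so re-applying an event the value already reflects is harmless.
    {lists:foldl(fun apply_event/2, Value, Queue), Queue}.

apply_event({_Timestamp, {Key, Items}}, Dict) ->
    Set = ordsets:from_list(Items),
    orddict:update(Key,
                   fun(Old) -> ordsets:union(Old, Set) end,
                   Set,
                   Dict).
```

Merging the two siblings from the walkthrough, {[{following, [bob]}], [{3, {following, [bob]}}]} and {[{followers, [bob]}], [{4, {followers, [bob]}}]}, yields the value [{followers, [bob]}, {following, [bob]}] with both events in the queue, which matches the state shown at Timestamp 5.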

[1] friendwad manages our social graph for Mochi Social and MochiGames. It is also evidence that naming things is a hard problem in computer science.
[2] See Basho's articles on Why Vector Clocks are Easy and Why Vector Clocks are Hard.
[3] When multiple writes happen to the same place and they have branching history, you'll get multiple values back on read. These are called siblings in Riak.
[4] An operation F is repeatable if and only if F(V) = F(F(V)). You could also call this an idempotent unary operation.
[5] The default conflict resolution algorithm in statebox_riak chooses metadata from one sibling arbitrarily. If you use metadata, you'll need to come up with a clever way to merge it (such as putting it in the statebox and specifying a custom resolve_metadatas in your call to statebox_riak:new/1).
Introducing Mochi Labs

Bob Ippolito (@etrepum) on mochi

Mochi Labs is an effort at Mochi Media to share our experiences developing and maintaining our platforms. We'll also be talking about our open source software, and other interesting research we've done. You should expect to see a lot about Erlang, Python, Flash, and all of the other technologies we're using or plan to use!

2011 is going to be a big year for Mochi. I can't wait to write more: we've got big plans and nearly six years of experience to share!
