m6d blog

February 16th, 2012

Is Your Ad Effective? Why This is The Most Important Question for Attribution Measurement.

Posted by: Brian Dalessandro

Attribution is certainly the topic du jour in online advertising (as it has been for several years). The good news for the industry is that firms are finally emerging to address the challenge. The call to move beyond the last-click is finally being answered! The problem, though, is that attribution measurement is fragmented and inconsistent (typical of any new industry). Many competing solutions exist for the same problem, and that diversity tends to undermine the effort and delay adoption of the best one.

We have thought long and hard about this at media6degrees (m6d), and being scientists, we felt compelled to develop our own proposal for what attribution should look like. We postulated three core concepts for multi-touch attribution: 1. standardization, 2. ad effectiveness, and 3. goal alignment. This post is the first in a series we will be putting out to answer the question: what should good attribution measurement look like? We will discuss each of these three points in more detail in future posts. The whitepaper is a bit math-y, but the intuition is certainly there between the formulas. If that is not your style, this post and the posts to follow will cover the core concepts without getting too heavy on the technical jargon. Enjoy!

Attribution is measurement, and measurement should be standardized.

Imagine you are a busy ad industry executive (probably not too far off) and you have to juggle a schedule that involves sales calls, vendor meetings, internal strategic planning, investor sessions and the like. Now imagine that everyone you meet with defines an hour differently. Internally, an hour lasts 60 minutes, but your clients and investors (admittedly important people in your life) define an hour as 50 minutes and 75 minutes, respectively. Whether it’s you or your assistant who has to coordinate with everyone, wouldn’t it be easier if you could all agree that an hour lasts 60 minutes?

This is what the attribution landscape looks like right now. Some vendors offer heuristic attribution allocations based on position in the marketing funnel. Some offer differing statistical approaches. Many even have different definitions of what constitutes an ad exposure. Most parties tend to agree that last-touch/last-click attribution is flawed. But what is stopping us from moving beyond it full force? Lack of standardization seems to be the culprit. So let’s explore what standardization might look like.

Attribution is first and foremost an ad effectiveness problem.

Going back to our role-playing scenario, you, the busy ad exec, have to decide on compensation for your sales staff. You have one sales person who made lots of calls. You have another who made fewer calls, but brought in several seven-figure accounts. You likely gave the bigger bonus to the sales person who brought in more revenue. This same compensation strategy should apply to attribution. A good attribution system, plain and simple, rewards advertising strategies that drive conversions. The tricky part is knowing who is driving conversions. This is why causal ad effectiveness measurement is a prerequisite for good attribution measurement. Whether it is through A/B testing (such as that proposed by Collective Media) or statistical modeling (such as that done by m6d), attribution systems need to credit the strategies that create real value, and reward them accordingly.
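To make this concrete, here is a minimal sketch (in Python) of how causal lift might be computed from an A/B test that compares an exposed group against a PSA control group. The counts, function name, and normal-approximation confidence interval are illustrative assumptions, not a description of m6d's or Collective Media's actual methodology.

    # Minimal sketch: incremental (causal) lift from an A/B test, where the
    # control group saw PSAs instead of the brand's ads. Counts are hypothetical.
    from math import sqrt

    def causal_lift(exposed_conv, exposed_n, control_conv, control_n, z=1.96):
        """Incremental conversion rate attributable to the ad, with a ~95% CI."""
        p_e = exposed_conv / exposed_n    # conversion rate, exposed group
        p_c = control_conv / control_n    # conversion rate, PSA control group
        lift = p_e - p_c                  # additive causal lift
        se = sqrt(p_e * (1 - p_e) / exposed_n + p_c * (1 - p_c) / control_n)
        return lift, (lift - z * se, lift + z * se)

    lift, ci = causal_lift(exposed_conv=420, exposed_n=100000,
                           control_conv=300, control_n=100000)
    print("lift = %.4f%%, 95%% CI = (%.4f%%, %.4f%%)"
          % (100 * lift, 100 * ci[0], 100 * ci[1]))

A strategy whose lift interval sits clearly above zero is creating real value; under this view, attribution credit should follow that measured lift rather than position in the clickstream.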

Attribution is what ultimately aligns brand marketer and advertiser goals.

This brings us to our final point — incentives. Online advertising can fall victim to the principal-agent problem. Brand marketers contract advertising agencies and their vendors to serve ads on their behalf. The marketer wants ROI and the vendor wants credit. The oft-cited problem with last-touch attribution is that it motivates vendors to game the system, optimizing toward their own credit allocation as opposed to creating value for the brand. This ultimately causes a misallocation of the brand’s advertising budget. In this scenario, both the brand and the advertising ecosystem suffer as value-takers flourish at the expense of value-creators.

Every attribution system should be designed with incentive alignment in mind. By creating an attribution system that is based on commonly accepted principles, one of which is that attribution is, at its core, an ad effectiveness problem, a standard can finally emerge that serves to align the goals of brand marketers and the agencies and vendors serving them.


February 10th, 2012

Time for Brands to Play Moneyball, too

Posted by: Penry Price

February is always the high point of the movie season. Oscar nominations are out and people are rushing to catch up on the movies they have yet to see. To date, I am 3 for 9. Not a very good average, but not bad if I were a left-handed power hitter.

Of the three I have seen, Moneyball really struck a chord. I liked it not only because I’m a life-long baseball guy and a fan of the book, but also because the message rang true for the current state of advertising.

One scene particularly resonated, in which Billy Beane (Brad Pitt), General Manager of the Oakland A’s, is speaking to Paul DePodesta (Jonah Hill), his assistant GM, about the idea of using sabermetrics to draft players for their team.

BILLY Why–You’re not the only computer science major who likes baseball. If what you and Bill James are saying is right–

PAUL It’s right.

BILLY It sounds right.

PAUL It is right.

BILLY If math isn’t a theory–

PAUL It isn’t.

BILLY If this is right, why isn’t everybody doing it? In fact why isn’t anybody doing it?

PAUL Because it’s not what they were taught.

Marketers today are challenged more than ever to find new customers. The old ways of finding “prospects” – such as targeting with demographics – are growing as outmoded as thinking that a team shouldn’t draft a player because his girlfriend is ugly (honest to god, it’s in the movie). But it is the way we all have been taught. It’s also safe and comfortable.

Is it fair to assume that our traditional ways of finding new customer prospects are akin to scouts looking at batting average, home runs or slugging percentage? Are we using old and inferior techniques to solve for new problems? Can we apply advanced statistical analysis to find new customers for brands, much as the A’s found prospects in 1999? I, and many others, absolutely think so.

As with Billy Beane and his staff, marketers are now using data more than ever. In fact, we would all agree, there is more data available than we can usefully process. The mantra now has shifted from the amount of data collected to how it is utilized. How is that data put into action?

Much of that data is less valuable and actionable than we had expected. A browser that comes across a website with automotive reviews is not necessarily interested in buying a car, let alone a Ford. However, if that browser demonstrates certain web patterns, and it can be matched to other browsers who have proven to be strong Ford customers, then the empirical evidence suggests it’s a great “prospect” for Ford.

So, what does this new world look like? The new credo is to target browsers that will work for your brand, not your competitor’s brand, not your product category, but your brand. The players that Billy Beane drafted for the A’s were drafted specifically to play a role for that team. They were valued for the contribution they could make to the A’s, and wouldn’t have worked for another team (think Scott Hatteberg post-A’s).

Our industry challenge is to find new customers, or prospects, that will engage with your company, brand, sub-brand, and even SKU. Why not pursue that challenge by finding prospects who have already shown the propensity to be interested in your brand? To put it simply, the techniques we have been using are not strong enough proxies for interest in a given brand with a specific appeal at a specific time online.

Our behaviors have changed dramatically as we have become more comfortable with this all-access, anytime-to-anything world. Shouldn’t we adjust to those new behaviors and look for new ways to find our customers?


November 1st, 2011

m6d CEO Tom Phillips Talks to Digiday’s Brian Morrissey

Posted by: Johanna Nisenholtz

“It’s systemic and there’s an evolution that we’re going through and it takes time…The whole media consumption pattern is totally different than what we’ve had in the past”
-Tom Phillips, m6d CEO

Part One:

[Video: digiday on livestream.com]

Part Two:

[Video: digiday on livestream.com]


August 26th, 2011

Claudia Perlich – m6d Chief Scientist – Wins Prestigious KDD Award

Posted by: Brian Dalessandro

It is with great pride that we get to announce that our own Chief Scientist, Claudia Perlich, has once again taken home a coveted prize at this year’s ACM KDD conference.  Her paper, “Leakage in Data Mining: Formulation, Detection and Avoidance,” co-authored with two exceptional statisticians at Tel-Aviv University, has won the best paper award at the 2011 KDD conference.  The competition for this award was extraordinary — over 700 papers from many of the leading machine learning experts and data scientists worldwide.

Claudia is no stranger to winning contests at KDD, which is one of the world’s top data mining conferences, attended by both academia and top industry players (like Google, Yahoo, Microsoft, and now m6d!).  She has actually won their annual data mining competition three times in the past, and now sits on the committee that administers the competition.  Her current “best paper” is related to her experience winning these competitions.  She and her colleagues offer a formal analysis on a common pitfall in data mining & statistical analysis called “Leakage.”  According to the paper, “leakage is essentially the introduction of information about the data mining target, which should not be legitimately available to mine from.”  In other words, information related to the data you are trying to predict has “leaked” into the data you are using to make the prediction.  A trivial example that might illustrate this is as follows:

I am tasked with predicting which prospects in a given pool will purchase a product online after being shown an ad for the product.  As the modeler, I pull all recent ad impressions from the data, and I use publisher, time of day and last site visited as my predictors.  I also pull in who has and hasn’t purchased, and who has and hasn’t visited the checkout page.  Now, for those who purchased, the last site visited was the checkout page of the product being purchased.  If this were in my set of predictors, it would get a very high weight in my model, yet in practice the model would be useless.  It is not feasible to target someone based on their being on the checkout page, because the checkout page is, by design, the event that always precedes a purchase.
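To see how dramatic this can be, here is a small, purely hypothetical sketch (not from the paper) in which a feature that is a consequence of the target, “visited the checkout page,” is leaked into the training data. The offline metric looks nearly perfect, yet the model is useless for targeting because the feature does not exist before the purchase happens.

    # Hypothetical illustration of leakage: the "visited_checkout" feature is a
    # consequence of the purchase we are trying to predict, so it inflates
    # offline accuracy while being unavailable at targeting time.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 20000
    legit = rng.normal(size=(n, 3))                  # legitimate pre-impression features
    score = legit @ np.array([0.4, 0.3, 0.2]) - 3.0  # baseline log-odds around -3
    purchase = (rng.random(n) < 1 / (1 + np.exp(-score))).astype(int)
    visited_checkout = purchase.copy()               # leaked: checkout visit mirrors purchase
    X_leaky = np.column_stack([legit, visited_checkout])

    for name, X in [("legitimate features only", legit),
                    ("with leaked checkout feature", X_leaky)]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, purchase, random_state=0)
        model = LogisticRegression().fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print("%s: test AUC = %.3f" % (name, auc))   # leaked feature -> near-perfect AUC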

The above is a somewhat trivial example, but leakage is not a trivial problem.  As the paper points out, this problem has occurred in many data mining competitions, designed by highly qualified statisticians.  Kudos to Claudia and team for discovering the issue in these competitions and calling attention to the persistence of the problem.  It is refreshing to read a paper that offers practical guidance around such a subtle, but model-effacing, misapplication of proper modeling methodology.

Reflecting on how this relates to our work at m6d, I am thrilled to have such creative and intuitive colleagues.  We face new modeling challenges all the time, especially in such a fast-moving and vast ecosystem.  Every algorithm starts with a team of people trying to solve a problem, and oftentimes these problems are so new that no textbook provides a how-to guide to designing an optimal solution.  In these situations, the algorithms are only as good as the statistical craftsmanship of the people who designed and planned them.  In our case, Claudia is one of the finest craftswomen in the field of data modeling and analysis, and so we are very fortunate to have her on our team.  And to our customers … you need never fear that leakage will corrupt your next campaign’s performance!


August 16th, 2011

The Science Behind NICE

Posted by: Ori Stitelman

As a follow-up to Tom Phillips’s post on the marketing implications of our first implementation of Non-Invasive Causal Estimation (NICE), this post concerns why it is important and actionable to estimate causal effects using observational data. The NICE methodology is explained in our recently published paper, and the results presented in Tom’s post are an extension of that analysis. I will be presenting the findings of this paper at the 5th Annual International Workshop on Data Mining and Audience Intelligence for Online Advertising at this year’s KDD conference, the preeminent annual conference on data mining, which will be held this year in San Diego from August 21st to 24th.

Despite the fact that billions of dollars a year are spent on digital display advertising, little has been done to quantify the effect of such advertising on customer behavior. As far as we can tell, even less has been done to inform business decisions based on those estimates. Are the high conversion rates seen for subsets of browsers the result of choosing to display ads to a group that has a naturally higher tendency to convert, or does the advertisement itself cause an additional lift? How does showing an ad to different segments of the population affect their tendencies to take a specific action, or convert? By applying NICE methods, we are able to use the existing observed data to estimate the effect of display advertising on customer behavior, and assess the impact of potential advertising decisions. Several examples of this were presented in the prior post.

In brief, by using observational methods like NICE, one does not have to incur the significant costs associated with A/B testing, which include:

• The cost of displaying PSAs to the control group (untreated group).
• The overhead cost of implementing A/B tests and ensuring that they are done correctly. This is a significant cost, and several papers have been written addressing the issue (see e.g. Kohavi, 2010).
• The cost associated with waiting for results, and the resulting delay in critical decision-making.

Fortunately for the display advertising community, we do not have to reinvent the wheel. Many statisticians, economists and other scientists have devoted years of research toward improving causal estimation in observational data. In fact, prior to joining Media6Degrees, I was part of a research group at U.C. Berkeley whose sole focus was developing optimal methods for estimating causal effects. The attached paper presents several causal methods that fall under the NICE umbrella, and explains how those methods may be used for estimating the causal effect of display advertising. One particular class of estimation methods, and the one we employ here, is called targeted maximum likelihood estimation (TMLE), and it stands above the others in terms of its ability to return both accurate and precise estimates of causal effects.

One of the major reasons TMLE exhibits these robust properties is that all of its computation and heavy lifting is focused strictly on estimating the causal effect as well as possible. One might assume this is true of most estimators, but it rarely is. In fact, the motivation for developing several of the other methods was how well they behave asymptotically, that is, in very large samples, with little concern for how they behave in the finite samples of real data. As a result, those methods tend to behave unreliably when estimating the causal effect of advertising (where conversion data is quite finite) and can even return probability estimates that do not fall between zero and one. Other methods focus on estimating the overall distribution as well as possible, rather than on optimal estimation of the effect itself (as TMLE does). These advantages of TMLE, our NICE method of choice, are further explored in the KDD paper.
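For readers who want to see the mechanics rather than just the claims, below is a rough, self-contained sketch of a TMLE-style estimate of the average treatment effect of a binary ad exposure A on a binary conversion Y, given covariates W. It is an illustrative simplification under assumed variable names (plain logistic regressions for both nuisance models and a grid search for the fluctuation parameter), not the estimator from the KDD paper, and it omits the cross-validation and influence-curve-based inference a real implementation would include.

    # Rough TMLE-style sketch (illustrative only): average treatment effect of a
    # binary exposure A on a binary conversion Y, adjusting for covariates W.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def logit(p):
        return np.log(p / (1 - p))

    def expit(x):
        return 1 / (1 + np.exp(-x))

    def tmle_ate(W, A, Y, bound=1e-3):
        # Step 1: initial outcome model Q(A, W) = P(Y = 1 | A, W).
        Q_fit = LogisticRegression().fit(np.column_stack([A, W]), Y)
        Q_a = np.clip(Q_fit.predict_proba(np.column_stack([A, W]))[:, 1], bound, 1 - bound)
        Q_1 = np.clip(Q_fit.predict_proba(np.column_stack([np.ones_like(A), W]))[:, 1], bound, 1 - bound)
        Q_0 = np.clip(Q_fit.predict_proba(np.column_stack([np.zeros_like(A), W]))[:, 1], bound, 1 - bound)

        # Step 2: propensity model g(W) = P(A = 1 | W), bounded away from 0 and 1.
        g = np.clip(LogisticRegression().fit(W, A).predict_proba(W)[:, 1], bound, 1 - bound)

        # Step 3: fluctuate the initial fit along the "clever covariate"
        # H(A, W) = A / g(W) - (1 - A) / (1 - g(W)), choosing epsilon by maximum
        # likelihood (crude grid search here, since sklearn has no offset option).
        H = A / g - (1 - A) / (1 - g)

        def loglik(eps):
            p = np.clip(expit(logit(Q_a) + eps * H), 1e-9, 1 - 1e-9)
            return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

        grid = np.linspace(-1, 1, 2001)
        eps = grid[int(np.argmax([loglik(e) for e in grid]))]

        # Step 4: update the counterfactual predictions and average their difference.
        Q_1_star = expit(logit(Q_1) + eps / g)
        Q_0_star = expit(logit(Q_0) - eps / (1 - g))
        return float(np.mean(Q_1_star - Q_0_star))

    # Hypothetical usage on simulated data where exposure is targeted at likely buyers:
    rng = np.random.default_rng(1)
    W = rng.normal(size=(50000, 3))
    A = (rng.random(50000) < expit(W[:, 0])).astype(int)
    Y = (rng.random(50000) < 0.02 + 0.01 * A + 0.01 * (W[:, 0] > 0)).astype(int)
    print("naive exposed-minus-unexposed difference: %.4f" % (Y[A == 1].mean() - Y[A == 0].mean()))
    print("TMLE-style ATE estimate:                  %.4f" % tmle_ate(W, A, Y))  # true effect is 0.01

In this simulated setup the naive comparison overstates the ad effect because the exposed group was already more likely to convert; the targeted update adjusts for that confounding through the propensity model, which is the intuition behind using TMLE on observational ad data.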

I would also like to acknowledge two Meetup groups in the New York City area to which data scientists at Media6Degrees are regular contributors – the NYC Machine Learning group and the NYC Predictive Analytics group. In fact, our chief scientist, Claudia Perlich, gave a talk last month at the Predictive Analytics Meetup group entitled “What’s in your wallet. Modeling quantiles for wallet estimation”, and I spoke at the same Meetup just the other night to a group of over 150 members on the NICE methodologies presented here.

If you get a chance, I encourage you to check out their monthly talks on cutting edge topics in machine learning and predictive analytics.

Here at Media6Degrees, we will continue to improve our methods and to tackle more complicated causal questions in the display advertising ecosystem. We will keep you posted on our progress!


