The Etsy Way

Posted by Chad Dickerson | Filed under philosophy

As you might imagine, we at Etsy get a lot of “can I pick your brain?” requests about how we do things at Etsy, or what we’ll call here The Etsy Way. While we take these requests as huge compliments to the work we do, we have to be somewhat protective of the team’s time. We’re proud of what we’ve been doing and believe in sharing it, so we’ve invested hundreds (if not thousands) of hours into providing public information on this blog and elsewhere. This is the best way to scale our sharing as broadly as possible. (And we’ll still meet with some people — we’ll just ask that you read everything below first since we’ve worked so hard on it!) Consider this post that first friendly conversation over coffee.

The most important component of The Etsy Way is culture and that is as difficult to teach as it is important. To get a sense of how we think about culture, take a look at Optimizing for Developer Happiness, which includes a 24-minute video of a talk I did and a link to the accompanying slides.

Here are a few more links about culture:

  • Scaling startups
  • How does Etsy manage development and operations?
  • Code as Craft: Building a Strong Engineering Culture at Etsy (slides)

With the culture bits explained, below are a few other key posts in the Etsy canon. All of these are interrelated with the culture, of course, and help reinforce it (remember, it’s all about culture. Did someone say “culture”?):

Quantum of Deployment (Erik Kastner). We deployed code to production more than 10,000 times in 2011. If you wonder “how did they do that?” this post will tell you all you need to know.

Track every release (Mike Brittain). Here, we write about the methods we use to track the success of every code deploy with application metrics. This is part of the not-so-secret sauce.

Measure Anything, Measure Everything (Ian Malpass). We introduce you to StatsD, the open source software we built at Etsy to enable obsessive tracking of application metrics and just about anything else in your environment. The best part is you can download StatsD yourself and try it out.

Divide and Concur (Noah Sussman and Laura Beth Denker). By reading this post, you’ll learn about all the inner workings of our automated testing setup: what software we use (with plenty of links), how we set it up, and the philosophy behind it all.

We also have tons of slides from talks we have done, all available in the Code as Craft group on Slideshare.

And last but not least, we have an Etsy Github repository with lots of goodies.

Pretty much everything we write about above is open source (even the culture), so the motivated reader will find links to tips and actual software along the way to set things up on their own. If there’s anything more you’d like to know about The Etsy Way, just let us know in the comments. We’ll add it if we have it, and probably write it if we don’t.

As you can tell, a really important part of The Etsy Way is encouraging people on the team to contribute to open source, write informative and entertaining blog posts, and put together killer presentations. If you want to join the fun, we’re always hiring.


Upcoming, Etsy Engineering Near You

Posted by Kellan Elliott-McCrea | Filed under engineering, events

A few places to look for us in the next few months.

Michelle D’Netto and Lindsey Baron, February 23, Selenium 101 Workshop. Brooklyn, NY.

Laura Beth Denker, February 24, Scaling Communication via Continuous Deployment. London.

We’re sponsoring Devopsdays Austin, April 2nd and 3rd. Austin, TX. Look for us.

John Goulah, April 11th, Starts with S and Ends With Hard: The Etsy Shard Architecture. Santa Clara, CA

Michelle D’Netto, Stephen Hardisty and Noah Sussman, April 16-18, Handmade Etsy Tests and Selenium In the Enterprise: What Went Right, What Went Wrong (So Far). London.

Laura Beth Denker, May 22nd, Developer Testing 201: When to Mock and When to Integrate and It’s More Than Just Style. Chicago, IL


Code as Craft Speaker Series: Rasmus Lerdorf, this Thursday

Posted by Kellan Elliott-McCrea | Filed under Uncategorized

A look at the state of PHP in 2012. Where are we, how did we get here and how does PHP fit into the current infrastructure ecosystem of the Web? Plus, a quick tour of what is new and cool in PHP 5.4. Reserve your free ticket, and subscribe to our list to find out about upcoming speakers.

Update: A recording of this talk and the slides are now available online.


Turbocharging Solr Index Replication with BitTorrent

Posted by etsydavidgiffin | Filed under data, engineering, infrastructure, operations, search

Many of you probably use BitTorrent to download your favorite ebooks, MP3s, and movies.  At Etsy, we use BitTorrent in our production systems for search replication.

Search at Etsy

Search at Etsy has grown significantly over the years. In January of 2009 we started using Solr for search. We used the standard master-slave configuration for our search servers with replication.


All of the changes to the search index are written to the master server. The slaves are read-only copies of master which serve production traffic. The search index is replicated by copying files from the master server to the slave servers. The slave servers poll the master server for updates, and when there are changes to the search index the slave servers will download the changes via HTTP. Our search indexes have grown from 2 GB to over 28 GB over the past 2 years, and copying the index from the master to the slave nodes became a problem.

The Index Replication Issue

To keep all of the searches on our site working fast we optimize our indexes nightly. Index optimization creates a completely new copy of the index. As we added new boxes we started to notice a disturbing trend: Solr’s HTTP replication was taking longer and longer to replicate after our nightly index optimization.


After some benchmarking we determined that Solr’s HTTP replication was only allowing us to transfer between 2 MB and 8 MB of data per second. We tried various tweaks to HTTP replication, adjusting compression and chunk size, but nothing helped. This problem was only going to get worse as we scaled search. When deploying a new slave server we saw similar issues: pulling all of our indexes at once at only 8 MB per second could take over 4 hours, with our 3 large indexes consuming most of the transfer time.


Our 4 GB optimized listings index was taking over an hour to replicate to 11 search slaves. Even if we made HTTP replication go faster, we were still bound by our server’s network interface card. We tested netcat from the master to a slave server and the results were as expected: the network interface was flooded. The problem had to be related to Solr’s HTTP replication.

The fundamental limitation with HTTP replication is that replication time increases linearly with the number of slaves. The master must talk to each slave separately, instead of all at once. If 10 boxes take 4 hours, scaling to 40 boxes would take over half a day!
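
As a back-of-the-envelope check using the figures above (one 4 GB index, roughly 8 MB/s per stream, and slaves served one after another; the strictly sequential transfer is an assumption for the sake of the arithmetic):

$$T \approx N_{\mathrm{slaves}} \times \frac{S_{\mathrm{index}}}{r} = 11 \times \frac{4096\ \mathrm{MB}}{8\ \mathrm{MB/s}} \approx 5600\ \mathrm{s} \approx 94\ \mathrm{min}$$

and scaling the observed 4 hours for 10 boxes linearly gives 4 × 4 h = 16 h for 40 boxes.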


We started looking around for a better way to get bits across our network.

Multicast Rsync?

If we need to get the same bits to all of the boxes, why not send the index via multicast to the slaves? It sure would be nice to only send the data once. We found an implementation of rsync which used multicast UDP to transfer the bits. The mrsync tool looked very promising: we could transfer the entire index in our development environment in under 3 minutes. So we thought we would give it a shot in production.

 [15:25]  <gio> patrick: i'm gonna test multi-rsyncing some indexes
          from host1 to host2 and host3 in prod. I'll be watching the
          graphs and what not, but let me know if you see anything
          funky with the network
 [15:26]  <patrick> ok
 ....
 [15:31]  <keyur> is the site down?

Multicast rsync caused an epic failure for our network, killing the entire site for several minutes. The multicast traffic saturated the CPU on our core switches causing all of Etsy to be unreachable.

BitTorrent?


For those folks who have never heard of BitTorrent, it’s a peer-to-peer file sharing protocol used for transferring data across the Internet. BitTorrent is a very popular protocol for transferring large files. It’s been estimated that 43% to 70% of all Internet traffic is BitTorrent peer-to-peer sharing.

Our Ops team started experimenting with herd, a BitTorrent package that sits on top of BitTornado. Using herd they transferred our largest search index in 15 minutes. They then spent 8 hours tweaking all the variables and making the transfer faster and faster. Using pigz for compression and herd for transfer, they cut the replication time for the biggest index from 60 minutes to just 6 minutes!


Our Ops experiments were great for the one time each day when we need to get the index out to all the slave servers, but this approach would also require coordination with Solr’s HTTP replication: we would need to stop replication, stop indexing, and run an external process to push the index out to the boxes.

BitTorrent and Solr Together

By integrating the BitTorrent protocol into Solr we could replace HTTP replication entirely. BitTorrent supports updating and continuation of downloads, which works well for incremental index updates. When we use BitTorrent for replication, all of the slave servers seed index files, allowing us to bring up new slaves (or update stale slaves) very quickly.

Selecting a BitTorrent Library

We looked into various Java implementations of the BitTorrent protocol and unfortunately none of these fit our needs:

  • The BitTorrent component of Vuze was very hard to extract from their code base
  • torrent4j was largely incomplete and not usable
  • Snark is old, and unfortunately unstable
  • bitext was also unstable, and extremely slow

Eventually we came upon ttorrent which fit most of the requirements that we had for integrating BitTorrent into the Solr stack.

We needed to make a few changes to ttorrent to handle Solr indexes. We added support for multi-file torrents, which allowed us to hash and replicate the index files in place. We also fixed some issues with large file (> 2 GB) support. All of these changes can be found in our fork of the ttorrent code; most of them have already been merged back to the main project.

How it Works

BitTorrent replication relies on Lucene to give us the names of the files that need to be replicated.

When a commit occurs the steps taken on the master server are as follows:

  • All index files are hashed, a Torrent file is created and written to disk.
  • The Torrent is loaded into the BitTorrent tracker on the master Solr server.
  • Any other Torrents being tracked are stopped to ensure that we only replicate the latest version of the index.
  • All of the slaves are then notified that a new version of the index is available.
  • The master server then launches a BitTorrent client locally which seeds the index.
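
To make those steps concrete, here is a rough master-side sketch using the ttorrent library we forked. The class and method names follow ttorrent’s documented API, but exact signatures vary between versions, and the index path, tracker port, and announce URL below are placeholders rather than our production values.

import com.turn.ttorrent.client.Client;
import com.turn.ttorrent.client.SharedTorrent;
import com.turn.ttorrent.common.Torrent;
import com.turn.ttorrent.tracker.TrackedTorrent;
import com.turn.ttorrent.tracker.Tracker;

import java.io.File;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Arrays;

public class MasterIndexSeeder {
    public static void main(String[] args) throws Exception {
        File indexDir = new File("/var/solr/data/index");               // placeholder path
        URI announce = URI.create("http://solr-master:6969/announce");  // placeholder tracker URL

        // Hash every current index file in place and build one multi-file torrent.
        Torrent torrent = Torrent.create(
                indexDir, Arrays.asList(indexDir.listFiles()), announce, "solr-master");

        // Track only this latest index version; older torrents would be stopped here.
        Tracker tracker = new Tracker(new InetSocketAddress(6969));
        tracker.announce(new TrackedTorrent(torrent));
        tracker.start();

        // Seed the index locally so slaves always have at least one complete source.
        // In the real integration this client stays alive for the life of the index version.
        Client seeder = new Client(
                InetAddress.getLocalHost(), new SharedTorrent(torrent, indexDir));
        seeder.share();
    }
}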

Once a slave server has been notified of a new version of the index, or the slave polls the master server and finds a newer version of the index, the steps taken on the slave servers are as follows:

  • The slave server requests the latest version number from the master server.
  • The Torrent file for the latest index is downloaded from master over HTTP.
  • All of the current index files are hash verified based on the contents of the Torrent file.
  • The missing parts of the index are downloaded using the BitTorrent protocol.
  • The slave server then issues a commit to bring the new index online.
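
And a similarly hedged sketch of the slave side, again using ttorrent’s client API with placeholder paths; in the real integration the torrent location and index version come from talking to the master, not from hard-coded values.

import com.turn.ttorrent.client.Client;
import com.turn.ttorrent.client.SharedTorrent;

import java.io.File;
import java.net.InetAddress;

public class SlaveIndexFetcher {
    public static void main(String[] args) throws Exception {
        // The .torrent for the latest index version has already been fetched from
        // the master over HTTP (step 2 above); both paths are placeholders.
        File torrentFile = new File("/var/solr/data/index-latest.torrent");
        File indexDir = new File("/var/solr/data/index");

        // Hash-verify whatever is already on disk, then fetch only the missing pieces
        // from the master and from any other slaves that already have them.
        SharedTorrent torrent = SharedTorrent.fromFile(torrentFile, indexDir);
        Client client = new Client(InetAddress.getLocalHost(), torrent);

        client.share();              // download missing pieces, then stay around as a seed
        client.waitForCompletion();  // block until our copy of the index is complete

        // At this point the real integration issues a Solr commit to bring the new index online.
    }
}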

When new files need to be downloaded, partial (“.part”) files are created. This allows us to continue downloading if replication gets interrupted. After downloading is completed the slave server continues to seed the index via BitTorrent. This is great for bringing on new servers, or updating servers that have been offline for a period of time.

HTTP replication doesn’t allow for the transfer of older versions of a given index. This causes issues with some of our static indexes. When we bring up new slaves, Solr creates a blank index whose version is greater than the static index. We either have to optimize the static indexes or force a commit before replication will take place.

With BitTorrent replication all index files are hash verified, ensuring slave indexes are consistent with the master index. It also ensures the index version on the slave servers matches the master server’s, fixing the static index issue.

User Interface

The HTTP replication UI is very clunky: you must visit each slave to understand which version of the index it has. Its transfer progress display is pretty simple, and towards the end of the transfer it is misleading, because the index is actually being warmed while the transfer rate keeps changing. Wouldn’t it be nice to look in one place and understand what’s happening with replication?


With BitTorrent replication the master server keeps a list of slaves in memory. The list of slaves is populated by the slaves polling master for the index version. By keeping this list we can create an overview of replication across all of the slaves. Not to mention the juicy BitTorrent transfer details and a fancy progress bar to keep you occupied while waiting for bits to flow through the network.

The Results

Pictures are worth a few thousand words. Let’s look again at the picture from the start of this post, where we had 11 slave servers pull 4 GB of index.


Today we have 23 slave servers pulling 9 GB of indexes.


You can see it no longer takes over an hour to get the index out to the slaves despite more than doubling the number of slaves and the index size. The second largest triangle on the graph represents our incremental indexer playing catch up after the index optimization.


This shows the slaves are helping to share the index as well. The last few red blobs are indexes that haven’t been switched to BitTorrent replication.

Drawbacks

One of BitTorrent’s features is hash verification of the bits on disk. This creates a side effect when dealing with large indexes. The master server must hash all of the index files to generate the Torrent file. Once the Torrent file is generated, all of the slave servers must compare the hashes to the current set of index files. Hashing 9 GB of index takes roughly 60 seconds of SHA1 calculations. Java’s SHA1 implementation is not thread safe, which made it impossible to parallelize this step out of the box. This means there is a 2 minute lag before the BitTorrent transfer begins.

To get around this issue we created a thread safe version of SHA1 and a DigestPool interface to allow for parallel hashing. This allows us to tune the lag time before the transfer begins, at the expense of increased CPU usage. It’s possible to hash the entire 9 GB in 16 seconds when running in parallel, making the lag to transfer around 32 seconds total.
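
The DigestPool interface itself isn’t shown here, but the underlying idea, hashing torrent pieces in parallel with one SHA1 digest per worker instead of a single shared instance, can be sketched with standard JDK classes. The piece size, thread count, and file path below are assumptions, not our production settings.

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPieceHasher {
    private static final int PIECE_SIZE = 512 * 1024;  // assumed piece size

    public static List<byte[]> hashPieces(Path file, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            List<Future<byte[]>> futures = new ArrayList<>();

            for (long off = 0; off < size; off += PIECE_SIZE) {
                final long start = off;
                final int len = (int) Math.min(PIECE_SIZE, size - off);
                futures.add(pool.submit(() -> {
                    // MessageDigest instances are not thread safe, so each task uses its own.
                    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
                    ByteBuffer piece = ByteBuffer.allocate(len);
                    while (piece.hasRemaining()) {
                        channel.read(piece, start + piece.position());  // positional reads are safe across threads
                    }
                    piece.flip();
                    sha1.update(piece);
                    return sha1.digest();
                }));
            }

            List<byte[]> pieceHashes = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                pieceHashes.add(f.get());  // collect in order, as the torrent's pieces field requires
            }
            return pieceHashes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> hashes = hashPieces(Paths.get("/var/solr/data/index/_42.fdt"), 8);
        System.out.println(hashes.size() + " pieces hashed");
    }
}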

Improvements

To better deal with the transfer lag we are looking at creating a Torrent file per index segment. Lucene indexes are made up of various segments. Each commit creates an index segment. By creating a new Torrent file per segment we can reduce the lag before transfer to milliseconds, because new segments are generally small.

We are also going to be adding support for transfer of arbitrary files via replication. We use external file fields and custom index time stamp files for keeping track of incremental indexing. It makes sense to have Solr manage replication of these files. We will follow HTTP replication’s lead on confFiles, adding dataFiles and indexFiles to handle the rest of the index related files.

Conclusion

Our search infrastructure is mission critical at Etsy. Integrating BitTorrent into Solr allows us to scale search without adding lag, keeping our sellers happy!


The Product Hacking Ecosystem

Posted by Andrew S. Morrison | Filed under engineering, people, philosophy

Most product ideas are shitty, yet we spend the majority of our lives working on them.

As a product hacker, you’ll be working on a constant stream of ideas that excite you to the point of obsession; staying up late writing code, thinking about it every waking and non-waking minute. We’ve all admitted that only a minority of our ideas will turn into something that will have the impact we dream of, but we don’t let that truth prevent us from being excited that this next thing might be the one. Some have admitted this and accepted that they’re a junkie who’s only going to get that fix from a great feature once in a long while. Although I admit that I’m a junkie, I haven’t yet become a fatalist.

Web Operations people speak about measuring their work by the Mean Time Between Failures (MTBF). For product hackers, we should be thinking in terms of minimizing Mean Time Between Wins (MTBW). Because it’s difficult to know which ideas are going to blossom into that great feature, a nice proxy for MTBW is Mean Time to Bad Idea Detection (MTTBID).

By building out an ecosystem for you and your team that allows bad ideas to be detected quickly, you can spend your time iterating on the great ideas and shipping your wins quickly while the shitty ideas die a meaningless death somewhere in a pile of other shitty ideas.

The best hackers I know are impatient. As soon as you get an exciting result, you’re going to be talking about it with whoever will listen. An ecosystem of tools that are just there, providing a source of truth that everyone can understand and agree with, is like having a posse of hardened thugs at your back at all times. Instead of excitement going sour when people who haven’t seen the light start doubting you, you can all agree on what’s actually going on. If the numbers you care about are getting better, then great. If your product isn’t something that can be measured easily, or is a long-term bet, you can show that the numbers you care about aren’t getting worse and that it’s safe to push on into the wilderness.

Here are some things we’ve learned about how to build that ecosystem.

Make Tools for Failing Fast

Ideas can fail at any level of scrutiny. Some ideas don’t pan out when looked at under a microscope. Others don’t work out when you talk about them over a drink. If an idea survives to the point of being shown to users, it can fail when you’re looking at it through a telescope and you’re just not seeing the response you hoped for. We spent some time trying to improve the quality and performance of our relevance sorting algorithm for search results before we made relevance-ordering the site-wide default. During the four-month period where we did this work, we completed thirty experiments. Of those, eleven were real wins that made it into the final product.

At Etsy, the birth of every idea is the simplest possible implementation that permits experimentation. To give ourselves immediate feedback on the effects of search algorithm changes, we created a tool that let us see the new ranking and all of the information we need to understand why a listing is ranked the way it is. The tool showed us the new ranking the moment our search server finished compiling, allowing for rapid iteration on tricky edge cases and letting us quickly detect and kill bad components.

We created a tool that runs a sample of popular and long-tail queries through a new algorithm and displays as much information as can be determined without real people being involved: an estimated percentage of changed search results over the universe of all queries, a list of the most strongly affected queries, a list of the most strongly affected Etsy shops, and so on.

We created tooling for running side-by-side studies where real users were asked to rate which set of search results they preferred for a given query. When a feature was ready to be launched as an A-B test, we were able to see a set of visualizations explaining how our change was performing relative to the standard algorithm.

[Screenshots: What a Search AB Test Looks Like; What a site-wide AB test looks like]

The best part is that we don’t think about these tools while building new products and running experiments. We come up with ideas, implement them, and if they do well we ship them. Our conversations are about the product, the code we write is for the product and our shitty ideas are executed on the spot and sloppily buried in shallow graves, as they deserve and as is our wont.

Make Tools that Make Process Disappear

Edward Tufte introduced the concept of “chart junk”: the distracting stuff on a visualization that isn’t saying anything about the data. Marshall McLuhan made a compelling case that “the medium is the message,” implying that the vehicle through which you perceive something impacts your understanding of it. Just because your paying clients won’t see your internal tooling doesn’t give you license to slap together an ill-considered tool. The medium is the message, and your tools are your medium. Working memory is limited and people are busy. Decisions are worse when getting the answer to a question about your product requires that you lose track of what you asked or why it’s important. Decisions are even worse if you never get a chance to ask questions and get answers. Products designed with fewer poor decisions are less shitty than products designed with more poor decisions. GNU wouldn’t exist without GDB.

[Screenshots: Our Non-Shitty Search Query Analysis Tool (left); Solr’s Shitty Query Analysis Tool (right)]

It’s really important to our business that we return great results when people are doing searches on Etsy. It turns out we’re super lazy and if there are any barriers in the way of us asking “why is this item showing up for this query”, we’re just not going to ask the question and it’s not going to get fixed. Our query analysis tool (pictured on the left) helps reduce that barrier to getting an answer.

The best information about your product is going to come from real users. Unfortunately, it’s often painful to get your products out into the real world. Having completed an iteration of a product, you’re filled with excitement and fear. You’re hoping you got it all right, but if you didn’t, you’re ready to fix it because you know every intimate detail of your new creation. This state of excitement and readiness is the last thing you want to let go of. Continuous deployment, the practice of pushing your code live the moment it’s ready, is absolutely essential for product hackers.


If you need to wait any non-trivial amount of time between completing something and seeing how well it’s performing, you’re not going to be working on that project by the time you get your answer. When you do get your answer, you’re not only going to have to refresh your memory on what you had been working on, but you’re going to have to do the same on whatever else you had started working on. Asking your team to work with patience and discipline has never worked and never will work. Build an ecosystem where doing the right thing is the easiest thing. Build an ecosystem where making great decisions is the easiest thing. Build an ecosystem where the lazy, excitable and impatient really shine.


Translation Memory

Posted by dalonsoa | Filed under data, engineering, internationalization

By: Diego Alonso

As we mentioned in Teaching Etsy to Speak a Second Language, developers need to tag English content so it can be extracted and then translated. Since we are a company with a continuous deployment development process, we do this on a daily basis and as a result get a significant number of new messages to be translated, along with changes or deletions of existing ones that have already been translated. Therefore we needed some kind of recollection system to easily reuse or follow the style of existing translations.

A translation memory is an organized collection of text extracted from a source language with one or more matching translations. A translation memory system stores this data and makes it easily accessible to human translators in order to assist with their tasks. There’s a variety of translation memory systems and related standards in the language industry. Yet, the nature of our extracted messages (containing relevant PHP, Smarty, and JavaScript placeholders) and our desire to maintain a translation style curated by a human language manager made us develop an in-house solution.

In short, we needed a system to suggest translations for the extracted messages. Etsy’s Search Team has integrated Lucene/Solr into our deployment infrastructure, allowing Solr configuration, Java-based indexers, and query parsing logic to go to production in minutes. We decided to take advantage of Lucene’s MoreLikeThis functionality to find “similar” documents, in this case similar English messages with existing translations. The process turned out to be pretty straightforward: we send the requested English message as a ContentStream to the MoreLikeThisHandler and get back similar messages with scores. This is done through our Translator’s UI via Thrift. After getting similar English messages from the query results, we filter them by score against a threshold so that we only surface relevant translations.

It’s worth mentioning that we need to use a ContentStream to send the source message because most of the time we’ll be requesting translation suggestions for new messages. In other words, messages without translations are not present in our index to match as documents. When sending a ContentStream to the MoreLikeThisHandler, it will extract the “interesting” terms to perform the similarity search.

Here’s a simple diagram of the main parts of this process:


We could easily test and optimize our results in the search environment through Solr queries before wiring the service into the Translator’s UI. As you can see in the following query, we send the content (stream.body) of the English message, play with the minimum document frequency (mindf) and term frequency (mintf) of the terms, and even filter the query (fq) for translations in a certain language.

localhost:8393/solr/translationmemory/mlt?stream.body=Join%20Now
&mlt.fl=content&mlt.mindf=1&mlt.mintf=1&mlt.interestingTerms=list
&fl=id,md5,content,type,score&fq=language:de

And since we know you love to read some code, here’s how we defined our translation memory data types and service interface in Thrift:

struct TranslationMemoryResult {
 1: string md5
 2: double score
}

struct TranslationMemorySearchResults {
 1: i32 count,
 2: list<TranslationMemoryResult> matchedMessages
}

service TranslationMemorySearch extends fb303.FacebookService {
 /**
  * Search for translation memory
  *
  * @param content of the message to match
  * @param language code of the existing translations
  * @return a TranslationMemorySearchResults instance - never "null"
  */
 TranslationMemorySearchResults search(1: string content,
                                       2: i32 type,
                                       3: string language)
}
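
As a purely illustrative example of what calling this service looks like from Java, here is a minimal client built on the Thrift-generated stubs; the package name, host, port, and the type argument value are all made up for the sketch.

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Hypothetical package for the stubs generated from the IDL above.
import com.etsy.thrift.TranslationMemoryResult;
import com.etsy.thrift.TranslationMemorySearch;
import com.etsy.thrift.TranslationMemorySearchResults;

public class TranslationMemoryClientExample {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9090);  // placeholder host and port
        transport.open();
        try {
            TranslationMemorySearch.Client client =
                    new TranslationMemorySearch.Client(new TBinaryProtocol(transport));

            // Ask for suggestions in German similar to the message "Join Now",
            // mirroring the Solr query shown above; the type value is a placeholder.
            TranslationMemorySearchResults results = client.search("Join Now", 0, "de");
            for (TranslationMemoryResult match : results.matchedMessages) {
                System.out.println(match.md5 + " score=" + match.score);
            }
        } finally {
            transport.close();
        }
    }
}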

Let’s look at some common use cases where translation memory comes in handy.

It’s pretty common that a new feature is released where we want to attract new members by adding some kind o
