Making it Virtually Easy to Deploy on Day One

Posted by John Goulah | Filed under engineering, infrastructure

At Etsy we have one hard and fast rule for new engineers on their first day: deploy to production. We’ve talked a lot in the past about our deployment, metrics, and testing processes. But how does the development environment let someone come in on day one and contribute, taking them through the steps of committing code, running it through our tests, and deploying it with Deployinator?

A new engineer’s first task is to snap a photo using our in-house photo booth (handmade, of course) and upload it to the about page. Everyone gets a shiny new virtual machine with a working development version of the site, along with their LDAP credentials, GitHub write access, and a laptop. We use an internal cloud system for the VMs, mostly because it was the most fun thing to build, but it also gives us the advantage of our fast internal network and dedicated hardware. The goal is a consistent environment that mirrors production as closely as possible. So what is the simplest way to build something like this in house?

We went with a KVM/QEMU based solution, which allows for native virtualization. As an example of how you might go about building an internal cloud, here’s a little bit about our hardware setup. The hypervisors run on HP DL380 G7 servers that give us a total of 72GB of RAM and 24 cores per machine. We provision 11 guests per server, which allows each VM 2 CPU cores, 5GB of RAM, and a 40GB hard drive. Libvirt supports live migration across non-shared storage (in QEMU 0.12.2+) with zero downtime, which makes it easy to allocate and rebalance VMs across hosts if adjustments need to be made throughout the pool.
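For a sense of what that looks like in practice, a zero-downtime migration across non-shared storage can be driven through virsh. This is only a sketch; the guest and hypervisor names below are placeholders, not our real naming scheme:

    # Move a running guest to another hypervisor, copying its local disk
    # image along with it (no shared storage required).
    # "devvm-jdoe" and the destination hostname are hypothetical.
    virsh migrate --live --persistent --copy-storage-all \
        devvm-jdoe qemu+ssh://hypervisor02.example.com/system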

We create CentOS-based VMs from a disk template that is maintained via OpenStack Glance, a tool that provides services for discovering, registering, and retrieving virtual machine images. The most recent version of the disk image is kept in sync via Glance and exists locally on each server for use in the creation of a new VM. This is faster than trying to pull the image over the network on creation, or building it from scratch with Kickstart the way we do in production. The image itself may have been kickstarted to match our production baseline, and we template a few key files such as the network and hosts information, which are substituted on creation, but in the end the template is just a disk image file that we copy and reuse.
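As a rough sketch of keeping a local copy of the newest template on each hypervisor (the Glance CLI has changed across OpenStack releases, and the image name and local path here are hypothetical), this amounts to something like:

    # List the images registered with Glance, then pull the current
    # development template down to the hypervisor's local template directory.
    # The image ID and destination path are placeholders.
    glance image-list
    glance image-download --file /var/lib/libvirt/templates/centos-dev.qcow2 \
        <image-id>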

The VM creation process involves pushing a button on an internal web page that executes a series of steps. As with our one-button deployment system, this allows us to iterate on the underlying system without disrupting the overall process. The web form only requires a username, which must be valid in LDAP so that the user can later log in. From there the process is logged in a way that provides real-time feedback to the browser via websockets. The first thing that happens is we find a free IP in the subnet range and use nsupdate to add the DNS information for the VM. We then make a copy of the disk template, which serves as the new VM image, and use virt-install to provision the new machine. Knife bootstrap is then kicked off, which does the rest of the VM initialization using Chef. Chef is responsible for getting the machine into a working state, configuring it so that it runs the same versions of libraries and services as the other VMs, and getting a checkout of the running website.
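Stripped of the web UI and the realtime logging, the underlying steps map onto a handful of standard commands. This is only a sketch under assumed names; the hostnames, zone, key file, bridge, and role are placeholders, and the real tooling wraps equivalents of these calls:

    # 1. Publish a DNS record for the new VM (dynamic update against the dev zone).
    nsupdate -k /etc/dev-dns.key <<EOF
    update add devvm-jdoe.dev.example.com 300 A 10.10.20.42
    send
    EOF

    # 2. Copy the locally synced disk template to become the new guest's image.
    cp /var/lib/libvirt/templates/centos-dev.qcow2 \
       /var/lib/libvirt/images/devvm-jdoe.qcow2

    # 3. Define and boot the guest from the existing disk image.
    virt-install --name devvm-jdoe --vcpus 2 --ram 5120 \
        --disk path=/var/lib/libvirt/images/devvm-jdoe.qcow2,format=qcow2 \
        --network bridge=br0 --import --noautoconsole

    # 4. Bootstrap the node with Chef so it converges to a dev-VM run list.
    knife bootstrap devvm-jdoe.dev.example.com -x root -N devvm-jdoe \
        -r 'role[dev-vm]'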

Chef is a really important part of managing all of the systems at Etsy, and we use Chef environments to keep cookbooks consistent between development and production. It is extremely important that development does not drift from production in its configuration, and environments also make it much easier to roll out new module dependencies or software version updates. The development environment automatically stays in sync with the code, which is one of the main ways we avoid strange bugs when moving changes from development to production. It strikes a good balance between keeping things centralized, controlled, and in a known state, and giving developers flexibility over what they need to do.
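To illustrate the idea (this is a hypothetical environment file, not our actual cookbook list or version pins), a Chef environment that pins development VMs to known cookbook versions might be created and uploaded like so:

    # Write a minimal Chef environment definition and upload it to the Chef server.
    # Cookbook names, versions, and attributes below are made up for the example.
    cat > environments/development.rb <<'EOF'
    name "development"
    description "Developer virtual machines"
    cookbook_versions "php" => "= 1.4.2", "apache2" => "= 2.0.0"
    default_attributes "etsyweb" => { "checkout" => "/var/www/etsyweb" }
    EOF
    knife environment from file environments/development.rb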

At this point the virtual machine is functional, and the website on it can be loaded using the DNS hostname we just created. Our various tools can immediately be run from the new VM, such as the try server, a cluster of around 60 LXC-based instances that run tests in parallel against your upcoming patch. Given this ability to modify and test the code easily, the only thing left is to overcome any fear of deployment by hopping in line and releasing those changes to the world. Engineers can be productive from day one because we can quickly create a consistent environment for them to write code in.

Category: engineering, infrastructure
Tags: chef, deployment, first day, KVM, libvirt, QEMU, virtualization

Responses to Making it Virtually Easy to Deploy on Day One

  • jespada says:
    March 13, 2012 at 9:47 pm

    Awesome post, thanks for sharing. Btw, how do you guys provision the LXC instances inside the VMs, also with OpenStack?

    • John Goulah says:
      March 13, 2012 at 9:50 pm

      We’ll have another post on our LXC setup; we don’t currently use OpenStack for that.

  • jespada says:
    March 13, 2012 at 9:57 pm

    Cool, looking forward to it. I wrote a basic Ohai plugin for LXC and pushed it upstream; it still needs some spec/unit tests.

  • pwelch says:
    March 20, 2012 at 10:43 am

    Do you store the KVM guests on the local hard drive of each hypervisor host? If so, do you notice any performance issues? We currently have a similar setup (same server hardware, KVM, and Chef) but are using multipath I/O and having issues. Considering putting the guest images locally on the KVM host.

    • John Goulah says:
      March 20, 2012 at 10:47 am

      Yes, we store them locally, and no, we don’t have any performance issues.

  • Daniel Williams says:
    March 20, 2012 at 2:30 pm

    The posts are excellent and insightful! Thank you.

    Most of the content seems to be directed toward the client-facing tier (web/PHP). Do you have services/daemons that run in the middle tier used by multiple front-ends?

    How do you handle database changes (sprocs, data, structure) that may contain breaking changes? Are there unit tests for the DB changes?

    • John Goulah says:
      March 20, 2012 at 4:24 pm

      Our architecture is mostly non-service oriented, with the exception of search. For the most part we run a monolithic PHP app, and that’s how this is set up for us. But services can be handled in much the same way.

      Our database changes are made in such a way that code works before and after the changes (no deletion of fields or field name changes). We roll them out behind config flags and ramp them up. We do write DbUnit tests, or tests that mock the ORM, to cover these types of changes.

  • Tarko says:
    April 2, 2012 at 11:30 am

    What about developers getting their modified code onto the VMs? Presumably they are not working with vim on the server, so there must be some better, easy-to-use way that doesn’t involve millions of commits.

    • John Goulah says:
      April 2, 2012 at 12:16 pm

      A lot of people do use vim or emacs on the server, but using a graphical IDE locally is also supported. We have a simple script that people can use to rsync code over or they can mount the filesystem locally. We try to support any method people are most comfortable with, and generally this isn’t a problem.

  • ericbblake says:
    April 10, 2012 at 5:13 pm

    I noticed your mention of libvirt as a foundation for your VM setups. Would you like to add an entry to libvirt.org/apps.html that best describes how you use libvirt?

    • John Goulah says:
      April 10, 2012 at 6:18 pm

      I’d be happy to add to this; however, it looks like all of these are actual open-sourced apps, and we haven’t released any of our stuff (yet). Let me know if I’m missing something!

  • ericbblake says:
    April 10, 2012 at 5:55 pm

    I was informed of your usage of libvirt from this email:
    https://www.redhat.com/archives/libvir-list/2012-April/msg00317.html

    Are you interested in being listed on libvirt.org/apps.html as a client of libvirt? If so, could you please provide a summary to be included there?

  • Nick says:
    May 1, 2012 at 8:19 pm

    If a developer has mounted the filesystem locally (over SMB?), how does that work with git? We’ve just migrated to git (from CVS), but we’re having major problems with git on Windows accessing dev servers over SMB.

    • John Goulah says:
      May 1, 2012 at 8:24 pm

      I can’t really say; nobody at Etsy is using Windows, but people have had success with sshfs.

  • msuriar says:
    May 12, 2012 at 1:32 pm

    Very interesting post; thanks for sharing.

    Out of interest, what OS are you running on your hypervisors, and how is the networking set up? Are you bridging the guests onto the hypervisor’s physical NIC, or are you using NAT?

    The reason I ask is that I’ve been attempting to set up multiple guests on a hypervisor running on Debian, and am running into issues relating to multicast and IPv6. Specifically, I’d like my guests to have static IPv6 addresses configured. Unfortunately, it appears that bridged interfaces on Linux don’t implement multicast correctly; multicast frames transmitted from a bridge are forwarded to all members of the bridge, including the originator. This causes IPv6 DAD to fail, so the statically configured IPv6 addresses are not used by the guests. This happens whether I use VirtualBox or QEMU/KVM as the hypervisor. Running the same VirtualBox guests on a Windows 7 host results in correct behaviour.

    Most frustrating. There is a QEMU bug filed: https://bugs.launchpad.net/qemu/+bug/761469

    But since this happens with both VirtualBox and QEMU/KVM, I suspect this is a lower-level problem, perhaps with bridge-utils.

    • John Goulah says:
      May 12, 2012 at 6:38 pm

      We use bridged networking, a Red Hat based OS, and IPv4 only so far.

  • Patrik says:
    August 7, 2012 at 4:57 am

    Hey John,
    great article, thanks for sharing!
    I was wondering how you handle the database side of things. Do developers work on the same database, or does each developer have their own smaller dataset running in their VM?

    Thanks again for sharing
    Patrik

    • John Goulah says:
      August 7, 2012 at 10:52 am

      We use a shared database.
