
Load Balancing

NOTE – Winter 2008: We are now running Phusion Passenger / Apache instead of Thin. Everything else remains the same.

oops

I’ll get the embarrassing part out of the way first. So… Ravelry had over an hour’s worth of hiccups and short periods of downtime yesterday. The cause of the problem was just a stupid full disk – I missed rotating a log, I was too lazy to set up Nagios monitoring for all my disks, and I didn’t notice the steadily climbing graph. However, I changed a whole bunch of big stuff just before the problem hit and ended up wasting time looking in all of the wrong places before this caught my eye:

[graph of the disk steadily filling up]

I just wanted you to know that the new stuff that I’m about to gush about wasn’t the cause.

6 million Rails requests per day

Ravelry does just over 2 million page views each day. Once you add in all of the AJAX hits, RSS feeds, API calls, and a few other things it adds up to 6 million requests that actually hit the Rails app. (I just grepped one day’s master syslog for the “Completed in” lines)
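
For the curious, that count came from nothing fancier than something along these lines (the log path is made up – point it at wherever your Rails output actually lands):

  # count the "Completed in" lines that Rails writes once per completed request
  grep -c "Completed in" /var/log/rails/production.log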

We currently have 70 Thin instances that run the Rails application (previously using Mongrel – keep reading). Each can handle a single request at a time, and I guess you can call them “app servers” for lack of a better term. Requests come in to the web server; anything that isn’t a static file (an image, for example) is passed along to Rails, and the response goes back to the client.

Our biggest challenges at this layer have been 1) making sure that the perceived speed of the site isn’t affected if one or some of the instances go crazy or go down and 2) deploying new versions of the site without interruption.

HTTP server to Rails – the progression

We are now on our 3rd real “version” of our setup when it comes to the front end web server and its connection to the Rails instances on the backend. There are a lot of people thinking (and coding) about better ways to run and deploy Rails apps, and I’m sure that we’ll continue to change how we do things.

Version 0: Apache + FastCGI

Used this combo in an early development version. Kind of a crappy setup, but it was all I had since I was still on a shared host. If you’ve tried playing with Ruby on a shared host with this setup and you hate it, don’t worry. There is another way.

Version 1: Apache → mod_proxy_balancer → Mongrel

This is probably the simplest setup because most people already have a standard Apache installation. Start up your Mongrels (however many you want), configure your Apache, and Apache proxies the requests to your Mongrels using a simple balancing algorithm.
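
For reference, the Apache side of that looks roughly like this (a minimal sketch with made-up ports – it assumes mod_proxy, mod_proxy_http, and mod_proxy_balancer are loaded):

  <Proxy balancer://mongrels>
    BalancerMember http://127.0.0.1:8001
    BalancerMember http://127.0.0.1:8002
  </Proxy>

  # hand everything off to the Mongrel cluster
  ProxyPass / balancer://mongrels/
  ProxyPassReverse / balancer://mongrels/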

Our biggest problems? Both hot deploys and recovering problem Mongrels were made difficult by mod_proxy_balancer’s behavior. Restarting part or all of the cluster often resulted in Apache returning 500 errors whenever the affected servers were hit, even once they were back up and running properly. I often had to reload or restart Apache (a slow process when Apache was getting hit with lots of traffic) to bring things back to life.

Version 2: nginx → Mongrel

I originally set up an Apache alternative because we were moving (finally) from a single server to our 3 machine, multiple virtual machine setup. I wanted to try nginx and see what all of the fuss was about. nginx uses less CPU than Apache, *far* less memory, and is simpler to configure. nginx is great – I highly recommend it. The biggest benefits that we got from this setup were more free memory, lighter load, and no more 500 errors.
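
The basic shape of that configuration looked something like this (a sketch, not our exact config – ports and paths are made up):

  upstream mongrels {
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
  }

  server {
    listen 80;
    root /var/www/app/public;            # static files served straight from disk

    location / {
      proxy_set_header  X-Real-IP  $remote_addr;
      proxy_set_header  Host       $host;

      # anything that isn't an existing static file goes to a Mongrel
      if (!-f $request_filename) {
        proxy_pass http://mongrels;
      }
    }
  }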

I used Evented Mongrel for part of this time, due to the promise of extra stability under load.

Version 3: nginx + upstream_fair → Mongrel

We moved to the fair proxy balancer for nginx shortly after it came out – it was a great improvement. The fair module solves a major problem with connecting Mongrels to nginx the usual way. Because a Rails app running on Mongrel can only handle one request at a time, nginx’s round robin sharing of the load would often leave requests waiting in line for a busy Mongrel when there were other free instances that could have been used. Switching to this setup greatly improved the perceived speed because users were much less likely to be affected by a slow-running request initiated by someone else. If it weren’t for this module, I would have had to move some API calls, some searches, and administrative functions to background tasks or alternate clusters. It came out at just the right time.

If you are starting something small and you aren’t dead set on using Apache, I recommend this configuration as a great and simple setup. nginx is excellent, upstream_fair makes the proxying smarter.
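
If you have the module compiled in, turning it on is just one extra line in the upstream block – a rough sketch (ports are made up):

  upstream mongrels {
    fair;                                # provided by the upstream_fair module
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
  }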

We ran this way for the last 4 months. I finally made some changes yesterday because my feature wish list had grown too large. There were 2 major things that I was hoping to correct. First, requests were still handed to the single-connection Rails instances without regard to how busy they already were. 2000 impatient concurrent users could easily turn a little slowness into a big pile-up by clicking madly and hitting refresh (as people tend to do if a site is unresponsive). This made it really hard to do releases without putting up a maintenance page, and since we were often doing daily releases, I *really* wanted to avoid taking the site down for maintenance all the time. Second, things would sometimes go funny with the balancing. I’d notice (by looking at my graphs) that an entire server’s worth of Mongrels was being ignored, and I’d have to reload nginx to wake it up. I’m pretty sure that this only happened when I added and removed servers from the cluster during deploys, but I’m not positive. In any case, the behavior was a little disconcerting.

Version 4: nginx → HAProxy (running now) → Thin

Disclaimer – I’ve only been running this for 2 days, and I’ll explain Thin afterwards – the fact that I replaced Mongrel with Thin isn’t that important to my story.

HAProxy is a software load balancer. I love it. Instead of making nginx proxy requests directly to our set of 70 Mongrels (okay, Thins), nginx now sends everything to HAProxy, which does a much nicer job of balancing the load. HAProxy can handle HTTP and TCP traffic with some caveats – if you have *any* load balancing needs, I highly recommend that you take a peek at it.

Why I like HAProxy:

  • Understands per-server connection limits and configurable request queuing
  • Watches servers for up/down-ness (or single slow running requests, in the case of Rails/Mongrel) and routes requests appropriately
  • “abortonclose” tosses out useless aborted requests. From the HAProxy manual: “In presence of very high loads, the servers will take some time to respond… When clients will wait for more than a few seconds, they will often hit the “STOP” button on their browser, leaving a useless request in the queue, and slowing down other users, and the servers as well, because the request will eventually be served, then aborted at the first error encountered while delivering the response.”
  • Can lay off the back end and provide users with some sort of feedback (even if it is a 503 Service Unavailable error) if things are going badly.
  • Great logging. It is so nice to be able to see (and analyze) how the balancing/proxying is going.
  • Fast and light! Low memory footprint, low CPU impact. nginx doesn’t put it to shame, which is saying something.
  • Cool and useful statistics page that shows up/down servers, session counts, request queue, server status and downtime, etc. Plenty of good stuff for my Nagios monitoring and Munin graphs.
  • This one isn’t scientific. I can do my hot deploys. By its very nature, HAProxy deals really well with a rolling restart of all servers.

Like I said, it has only been 2 days, but I am loving this setup so far. It is *so nice* to have lots of configurability and lots of visibility when it comes to the connection between my HTTP server and the application servers.

The HAProxy stats page:

[screenshot of the HAProxy stats page]

nginx, HAProxy, and Mongrel/Thin

I didn’t see many (any) examples on the web, so I just want to share a few configuration snippets…

nginx config

  upstream haproxy {
    server 127.0.0.1:8000;
  }

  server {
    # same server block that you would use with the usual Mongrel setup,
    # except that dynamic requests get proxied to the single HAProxy upstream
    location / {
      proxy_set_header  X-Real-IP  $remote_addr;
      proxy_set_header  Host       $host;
      proxy_pass http://haproxy;
    }
  }

haproxy config

This should get you started – check out the haproxy manual for all you ever wanted to know. Make sure to look into the important options here – maxconn, abortonclose, and the per-server check flags in particular.

  global
    log 127.0.0.1 local0 warning
    daemon
    # and uid, gid, daemon, maxconn...
    
  defaults
    mode            http
    retries         3
    option          abortonclose
    option          redispatch

    maxconn         4096

    timeout connect 5000
    timeout client  50000
    timeout server  50000

  frontend rails *:8000
    default_backend mongrels

  backend mongrels
    option httpchk            # actively health-check each backend over HTTP
    balance roundrobin        # with maxconn 1 on the servers below, a busy instance never
                              # gets a second request – extras wait in HAProxy's queue instead
    server server-1 127.0.0.1:8001 maxconn 1 check
    server server-2 127.0.0.1:8002 maxconn 1 check
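
The stats page mentioned above comes from an extra listener in the same config – something along these lines should do it (the port and the login are placeholders):

  listen stats 0.0.0.0:8100
    mode http
    stats enable
    stats uri /haproxy
    stats auth admin:changeme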

Thin

Earlier, I mentioned that I switched from Mongrel to Thin. Thin is basically a drop-in replacement for Mongrel – you don’t have to do anything special (other than “gem install thin”) to run your Rails app under Thin.
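
Getting a handful of instances running looks something like this (ports and counts are just for illustration):

  gem install thin
  # start 4 daemonized instances on ports 8001-8004 in production mode
  # (run from the application root, or add --chdir /path/to/app)
  thin start --servers 4 --port 8001 --environment production --daemonize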

Why? My reasons for liking Thin are pretty simple: development is very active, it is nicely packaged (it includes examples, rake tasks, and scripts for controlling the service), and Rack makes things more flexible. I’m happy with the change, but I don’t think that people need to switch their Rails apps from Mongrel (yet).

Tomorrow is Monday

Since we are still adding 800ish people to the site a day and Monday is our busiest day, we break a new traffic record every Monday. We’ll see how things go on day 3 of my new setup.


Comments (12)

  1. Datagoddess wrote:

    I wondered what happened Saturday when I started getting the 500 errors – I figured you were probably tearing your hair out.

    Nagios rocks so hard – we deployed it at a former workplace and it was fabulous until the higher-ups decided we couldn’t have any “unsupported” open source apps, and brought in NetIQ and MOM. gag

    Monday, March 10, 2008 at 2:29 am #
  2. Nikolas Coukouma wrote:

    Just a friendly poke about maybe trying Ebb. The first “real release” was about a month ago and it’s already gone from 0.0.1 to 0.0.4 (Ry is conservative with version numbers). I’m sure he’d appreciate (more) feedback.

    Monday, March 10, 2008 at 2:44 am #
  3. Nikolas Coukouma wrote:

    Er, I should also mention that Ebb now has a page with graphs, currently showing 1.5-2x throughput increases over Thin. ebb.rubyforge.org/

    Monday, March 10, 2008 at 2:45 am #
  4. Christopher wrote:

    Nice post, thanks for the info!

    Monday, March 10, 2008 at 11:25 am #
  5. Christina wrote:

    Hi Casey,

    I’m web developer by day and a knitter by night (and lunchtime), and I love reading about how Ravelry is put together. I’m just starting to develop with RoR, so I find this whole thing very educational. Thanks for letting us in on this stuff!

    Best, Christina

    Thursday, March 20, 2008 at 9:10 am #
  6. Jesse Andrews wrote:

    Thanks for the detailed posting.

    I have been using nginx talking directly to the mongrels – and while it was working great, I didn’t have much visibility into what was happening.

    I can see what is happening to requests and have improved my site with the info (adding a few more mongrels – since there were times when all mongrels were busy, but the box wasn’t 100% full)

    I’ve not moved to thin yet – mainly because I’ve not got the perfect monit+thin setup – getting closer but I still get errors when I try. Someday!

    Saturday, March 22, 2008 at 8:54 pm #
  7. Jesse Andrews wrote:

    ps. my haproxy stats are public – userscripts.org/haproxy

    thanks again

    Saturday, March 22, 2008 at 8:55 pm #
  8. Tom H wrote:

    Casey — glad you’re one step (ok, perhaps several steps) ahead of me. Having these posts from a trusted source really makes choosing less risky.

    BTW — RimuHosting seems really solid; it’s such a rarity today to get support from people who actually seem to know what they are doing, and care. With great relish, I dropped my old host last week and am running my blog and my new site there. Who knows, maybe virtual hosting is just the ticket for DigitalAdvisor, too?

    Sunday, March 23, 2008 at 9:35 pm #
  9. Robbie wrote:

    Nice post! How is Version 4 working for you after a month of production use? Do you still prefer HAProxy over Fair Proxy Balancer? Still prefer Thin over Mongrel?

    Tuesday, April 15, 2008 at 12:02 pm #
  10. asim wrote:

    How are you finding thin for reliability and stability compared with mongrel?

    Sunday, June 8, 2008 at 9:53 am #
  11. Spes wrote:

    Interesting, thanks. Why do you put nginx before haproxy in ver. 4 configuration?

    Wednesday, June 18, 2008 at 3:19 am #
  12. Joseph Hsu wrote:

    Hey thanks for sharing this I’m currently looking to improve my vps setup (nginx + mongrels with no load balancing) and HAProxy looks like it would fit.

    good write up.

    Friday, October 17, 2008 at 1:02 pm #

