This was posted by casey on Monday, March 10th, 2008 at 12:51 am. Bookmark the permalink.
Load Balancing
NOTE – Winter 2008: We are now running Phusion Passenger / Apache instead of Thin. Everything else remains the same.
oops
I’ll get the embarrassing part out of the way first. So… Ravelry had over an hour’s worth of hiccups and short periods of downtime yesterday. The cause of the problem was just a stupid full disk – I missed rotating a log, I was too lazy to set up Nagios monitoring for all my disks, and I didn’t notice the steadily climbing graph. However, I changed a whole bunch of big stuff just before the problem hit and ended up wasting time looking in all of the wrong places before the disk usage graph finally caught my eye.
I just wanted you to know that the new stuff that I’m about to gush about wasn’t the cause.
6 million Rails requests per day
Ravelry does just over 2 million page views each day. Once you add in all of the AJAX hits, RSS feeds, API calls, and a few other things it adds up to 6 million requests that actually hit the Rails app. (I just grepped one day’s master syslog for the “Completed in” lines)
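That count is just a grep away. Here’s a minimal sketch of the idea – the log path and line format below are made-up sample data, not our actual syslog:

```shell
# Sample lines standing in for a day's master syslog (illustrative only):
cat > /tmp/sample-syslog <<'EOF'
Mar 10 00:00:01 app-1 rails: Completed in 120ms (View: 80, DB: 30) | 200 OK
Mar 10 00:00:02 web-1 nginx: GET /images/logo.png
Mar 10 00:00:03 app-2 rails: Completed in 45ms (View: 20, DB: 10) | 200 OK
EOF

# Count only the requests that actually hit the Rails app:
grep -c 'Completed in' /tmp/sample-syslog   # -> 2
```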
We currently have 70 Thin instances that run the Rails application (previously using Mongrel – keep reading). Each can handle a single request at a time and I guess you can call them “app servers” for lack of a better term. Requests come in to the web server, things that aren’t static files like images are passed along to Rails, the response goes back to the client.
Our biggest challenges at this layer have been 1) making sure that the perceived speed of the site isn’t affected if one or some of the instances go crazy or go down and 2) deploying new versions of the site without interruption.
HTTP server to Rails – the progression
We are now on our 3rd real “version” of our setup when it comes to the front end web server and its connection to the Rails instances on the backend. There are a lot of people thinking (and coding) about better ways to run and deploy Rails apps, and I’m sure that we’ll continue to change how we do things.
Version 0: Apache + FastCGI
Used this combo in an early development version. Kind of a crappy setup, but it was all I had since I was still on a shared host. If you’ve tried playing with Ruby on a shared host with this setup and you hate it, don’t worry. There is another way.
Version 1: Apache → mod_proxy_balancer → Mongrel
This is probably the simplest setup because most people already have a standard Apache installation. Start up your Mongrels (however many you want), configure your Apache, and Apache proxies the requests to your Mongrels using a simple balancing algorithm.
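For reference, the Version 1 wiring looks something like this – a sketch, with illustrative ports and pool name rather than our actual config:

```apache
# mod_proxy and mod_proxy_balancer must be loaded.
# Define a balancer pool pointing at the running Mongrels:
<Proxy balancer://mongrel_cluster>
    BalancerMember http://127.0.0.1:8001
    BalancerMember http://127.0.0.1:8002
</Proxy>

# Proxy dynamic requests to the pool:
ProxyPass / balancer://mongrel_cluster/
ProxyPassReverse / balancer://mongrel_cluster/
```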
Our biggest problems? Both hot deploys and recovering problem Mongrels were made difficult by mod_proxy_balancer’s behavior. Restarting part or all of the cluster often resulted in Apache returning 500 errors anytime the affected servers were hit, even once they were back up and running properly. I often had to reload or restart Apache (a slow process when Apache was getting hit with lots of traffic) to bring things back to life.
Version 2: nginx → Mongrel
I originally set up an Apache alternative because we were moving (finally) from a single server to our 3 machine, multiple virtual machine setup. I wanted to try nginx and see what all of the fuss was about. nginx uses less CPU than Apache, *far* less memory, and is simpler to configure. nginx is great – I highly recommend it. The biggest benefits that we got from this setup were more free memory, lighter load, and no more 500 errors.
I used Evented Mongrel for part of this time, due to the promise of extra stability under load.
Version 3: nginx + upstream_fair → Mongrel
We moved to the fair proxy balancer for nginx shortly after it came out – it was a great improvement. The fair module solves a major problem with connecting Mongrels to nginx the usual way. Because a Rails app running on Mongrel can only handle one request at a time, nginx’s round robin sharing of the load would often leave requests waiting in line for a busy Mongrel when other free instances could have been used. Switching to this setup greatly improved the perceived speed of the site because users were much less likely to be affected by slow-running requests initiated by others. If it weren’t for this module, I would have had to move some API calls, some searches, and administrative functions to background tasks or alternate clusters. It came out at just the right time.
If you are starting something small and you aren’t dead set on using Apache, I recommend this configuration as a great and simple setup. nginx is excellent, upstream_fair makes the proxying smarter.
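As a sketch, the fair balancer is just one extra directive in the upstream block – the ports here are illustrative:

```nginx
upstream mongrels {
    fair;                    # provided by the upstream_fair module
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://mongrels;
    }
}
```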
We ran this way for the last 4 months. I finally made some changes yesterday because my feature wish list had grown too large. There were 2 major things that I was hoping to correct.

First, requests were still loaded onto the single-connection Rails instances without regard to their current state. 2000 impatient concurrent users could easily turn a little slowness into a big pile-up by clicking madly and hitting refresh (as people tend to do if a site is unresponsive). This made it really hard to do releases without putting up a maintenance page, and since we were often doing daily releases, I *really* wanted to avoid taking the site down for maintenance all the time.

Second, things would sometimes go funny with the balancing. I’d notice (by looking at my graphs) that an entire server’s worth of Mongrels was being ignored and I’d have to reload nginx to wake it up. I’m pretty sure that this only happened when I added and removed servers from the cluster during deploys, but I’m not positive. In any case, the behavior was a little disconcerting.
Version 4: nginx → HAProxy → Thin (running now)
Disclaimer – I’ve only been running this for 2 days, and I’ll explain Thin afterwards. The fact that I replaced Mongrel with Thin isn’t that important to my story.
HAProxy is a software load balancer. I love it. Instead of making nginx proxy requests to our set of 70 Mongrels (okay, Thins) it sends everything to HAProxy which does a much nicer job of balancing the load. HAProxy can handle HTTP and TCP traffic with some caveats – if you have *any* load balancing needs, I highly recommend that you take a peek at it.
Why I like HAProxy:
- Understands per-server connection limits and configurable request queuing
- Watches servers for up/down-ness (or single slow running requests, in the case of Rails/Mongrel) and routes requests appropriately
- “abortonclose” tosses out useless aborted requests. From the HAProxy manual: “In presence of very high loads, the servers will take some time to respond… When clients will wait for more than a few seconds, they will often hit the “STOP” button on their browser, leaving a useless request in the queue, and slowing down other users, and the servers as well, because the request will eventually be served, then aborted at the first error encountered while delivering the response.”
- Can lay off the back end and provide users with some sort of feedback (even if it is a 503 Service Unavailable error) if things are going badly.
- Great logging. It is so nice to be able to see (and analyze) how the balancing/proxying is going.
- Fast and light! Low memory footprint, low CPU impact. nginx doesn’t put it to shame, which is saying something.
- Cool and useful statistics page that shows up/down servers, session counts, request queue, server status and downtime, etc. Plenty of good stuff for my Nagios monitoring and Munin graphs.
- This one isn’t scientific. I can do my hot deploys. By its very nature, HAProxy deals really well with a rolling restart of all servers.
Like I said, it has only been 2 days, but I am loving this setup so far. It is *so nice* to have lots of configurability and lots of visibility when it comes to the connection between my HTTP server and the application servers.
(Screenshot: the HAProxy stats page.)
nginx, HAProxy, and Mongrel/Thin
I didn’t see many (any) examples on the web, so I just want to share a few configuration snippets…
nginx config
upstream haproxy {
    server 127.0.0.1:8000;
}

server {
    # blah blah – do the same thing that you would with the usual Mongrel setup
}
haproxy config

This should get you started – check out the haproxy manual for all you ever wanted to know. Make sure to look into options like maxconn, abortonclose, and redispatch.
global
    log 127.0.0.1 local0 warning
    daemon
    # and uid, gid, daemon, maxconn...

defaults
    mode http
    retries 3
    option abortonclose
    option redispatch
    maxconn 4096
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend rails *:8000
    default_backend mongrels

backend mongrels
    option httpchk
    balance roundrobin
    server server-1 127.0.0.1:8001 maxconn 1 check
    server server-2 127.0.0.1:8002 maxconn 1 check
Thin
Earlier, I mentioned that I switched from Mongrel to Thin. Thin is basically a drop-in replacement for Mongrel – you don’t have to do anything special (other than “gem install thin”) to run your Rails app under Thin.
Why? The reasons why I like Thin are pretty simple. Development is very active, it is nicely packaged and includes examples, rake tasks, and scripts for controlling the service, and Rack makes things more flexible. I’m happy with the change, but I don’t think that people need to switch their Rails apps from Mongrel (yet).
Tomorrow is Monday
Since we are still adding 800ish people to the site a day and Monday is our busiest day, we break a new traffic record every Monday. We’ll see how things go on day 3 of my new setup.
Comments

I wondered what happened Saturday when I started getting the 500 errors – I figured you were probably tearing your hair out
Nagios rocks so hard – we deployed it at a former workplace and it was fabulous until the higher-ups decided we couldn’t have any “unsupported” open source apps, and brought in NetIQ and MOM. gag
Just a friendly poke about maybe trying Ebb. The first “real release” was about a month ago and it’s already gone from 0.0.1 to 0.0.4 (Ry is conservative with version numbers). I’m sure he’d appreciate (more) feedback.
Er, I should also mention that Ebb now has a page with graphs, currently showing 1.5-2x throughput increases over Thin: ebb.rubyforge.org/
Nice post, thanks for the info!
Hi Casey,
I’m web developer by day and a knitter by night (and lunchtime), and I love reading about how Ravelry is put together. I’m just starting to develop with RoR, so I find this whole thing very educational. Thanks for letting us in on this stuff!
Best, Christina
Thanks for the detailed posting.
I have been using nginx talking directly to the mongrels – and while it was working great, I didn’t have much visibility into what was happening.
I can see what is happening to requests and have improved my site with the info (adding a few more mongrels – since there were times when all mongrels were busy, but the box wasn’t 100% full)
I’ve not moved to thin yet – mainly because I’ve not got the perfect monit+thin setup – getting closer but I still get errors when I try. Someday!
ps. my haproxy stats are public – userscripts.org/haproxy
thanks again
Casey — glad you’re one step (ok, perhaps several steps) ahead of me. Having these posts from a trusted source really makes choosing less risky.
BTW — RimuHosting seems really solid; it’s such a rarity today to get support from people who actually seem to know what they are doing, and care. With great relish, I dropped my old host last week and am running my blog and my new site there. Who knows, maybe virtual hosting is just the ticket for DigitalAdvisor, too?
Nice post! How is Version 4 working for you after a month of production use? Do you still prefer HAProxy over Fair Proxy Balancer? Still prefer Thin over Mongrel?
How are you finding thin for reliability and stability compared with mongrel?
Interesting, thanks. Why do you put nginx before haproxy in ver. 4 configuration?
Hey thanks for sharing this I’m currently looking to improve my vps setup (nginx + mongrels with no load balancing) and HAProxy looks like it would fit.
good write up.