The art of scaling
Posted by Mikko at 1 February 2012
Category: BiliBid, Programming
When we started BiliBid, it ran on an Apache web server. Our server's system load ranged from 0.50 to 2.00 depending on the number of users online. For the first few days everything ran well, but eventually we reached the point where Apache was hogging too much RAM.
Just to give you an idea, here are the HTTP requests the BiliBid server is constantly receiving:
- The pages. These are the requests for the front page, winners page, single bid page, redeem PIN page, etc. Each is requested once per pageview.
- Balance requests. When a user is logged in, the user's browser requests the bid balance once every two seconds.
- Timer requests. When a user is on the front page or a single bid page, the user's browser requests the timers of the items on that page once every second.
- Bid requests. Every time a user places a bid, a request is sent to the server.
Multiply that by the number of users online!
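To put rough numbers on that multiplication, here is a small back-of-the-envelope sketch. It counts only the two polling loops described above (balance every two seconds, timers every second) and ignores page views and bids. The function name and the user counts in the example are hypothetical, chosen for illustration, not BiliBid's real traffic.

```python
# Back-of-the-envelope estimate of polling traffic (page views and bids excluded).
# Intervals come from the post; user counts below are hypothetical examples.

BALANCE_INTERVAL_S = 2.0  # logged-in users poll their bid balance every 2 s
TIMER_INTERVAL_S = 1.0    # users on the front page / a single bid page poll timers every 1 s


def polling_rps(logged_in: int, on_auction_pages: int) -> float:
    """Estimate steady-state polling requests per second.

    logged_in: users who are logged in (they poll the balance).
    on_auction_pages: users viewing the front page or a single bid page
    (they poll the timers), whether or not they are logged in.
    """
    balance_rps = logged_in / BALANCE_INTERVAL_S
    timer_rps = on_auction_pages / TIMER_INTERVAL_S
    return balance_rps + timer_rps


# Example: 500 logged-in users, all watching auction pages.
rps = polling_rps(500, 500)
print(f"{rps:.0f} requests/s, about {rps * 60:,.0f} requests/minute")
```

Even this modest hypothetical audience generates hundreds of requests per second before a single page view or bid is counted, which is why per-request memory cost matters so much.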
Switch from Apache to nginx + php-fpm
One day in early September, I started researching nginx (engine-x), since I had heard that it has a smaller and more predictable memory (RAM) footprint. I also read up on php-fpm. I weighed the trade-offs, since switching meant losing some Apache functionality. In particular, there would be no more .htaccess for me, but I could convert my .htaccess rules into something nginx understands. Fortunately, that turned out not to be hard.
The same day, I installed nginx on the server. Before killing Apache and starting nginx, I prepared all my configuration files. Then came the moment of truth: I killed Apache and started nginx. Everything went so smoothly that I could hardly believe BiliBid was already running on nginx.
In late October, people started reporting high latency on BiliBid. I think this was also around the time I decided to add the Connection Quality indicator (before I did, I knew of no other penny auction site with that feature, until a bidder tipped me off that a penny auction site in the Philippines [starts with B and ends with A] had implemented it, too). I hadn't made any major changes to BiliBid before the problem appeared, so I concluded that there might be routing problems between some of our bidders and our server.
So I decided to move our server to a nearer location to reduce latency and, hopefully, solve the routing problem. The move had trade-offs: it was more cost-efficient, but at the expense of usable processor threads. Our previous server had 16 processor threads, while the new one had only four (4).
I reviewed most of the SQL queries I was making and optimized them, cutting the execution time of some of my complex queries by up to 50%.
Everything went well until we started our Facebook campaigns, when the number of concurrent online users doubled. Our server load shot up to 20.00, and once it went as high as 70.00 during the auction of a big item. We had to postpone one of the big items to spread the load over time. I experimented with different configurations of nginx, php-fpm and MySQL, but to no avail.
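To see why those numbers hurt on a four-thread machine, it helps to divide the load average by the number of hardware threads. On Linux, the load average roughly counts tasks that are running or waiting to run (plus tasks in uninterruptible I/O wait), so a ratio well above 1.0 means most tasks are queuing for a CPU. This sketch is a rule-of-thumb calculation, not a precise model; `load_per_thread` is a hypothetical helper, while `os.getloadavg()` (Unix only) and `os.cpu_count()` are standard Python calls.

```python
import os


def load_per_thread(load_average: float, threads: int) -> float:
    """Rough saturation ratio: above 1.0 means more runnable tasks than threads."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return load_average / threads


# Load peaks from the post, on the 4-thread server.
for load in (20.0, 70.0):
    print(f"load {load:5.2f} on 4 threads: {load_per_thread(load, 4):.2f} per thread")

# For comparison: the same 20.00 load spread over 16 threads.
print(f"load 20.00 on 16 threads: {load_per_thread(20.0, 16):.2f} per thread")

# On a live Unix machine, check the current 1-minute load average.
if hasattr(os, "getloadavg"):
    one_minute, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    print(f"current: {load_per_thread(one_minute, cpus):.2f} per thread")
```

A load of 20.00 on four threads works out to about five runnable tasks per thread, while the same load on 16 threads would be closer to one, which lines up with the load problem disappearing after the move back to the 16-thread server.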
Going back to our old server
This time, I decided to go back to our old server (16 usable processor threads), and the server load problem disappeared. Everything was fine until recently, when we received an overwhelming number of reports pointing to latency problems. I checked them one by one: around half were valid, while I consider the rest invalid, just people riding with the flow (and I hate it).
The most recent server switch, adding another server, MariaDB, etc
So I decided to move back to the nearer server with four (4) processor threads. Before switching, I read up on ways to tune the server to prevent high loads. I also decided to switch the SQL server from MySQL to MariaDB; I think I first heard of MariaDB during a DevCon at UP. I experimented with the different storage engines MariaDB offers until I settled on Aria.
Then came the switch. Latency dropped significantly (by more than 50% for most users). The only problem was the higher-than-usual server load (compared with our old 16-thread server), which I feared would slow down SQL queries during peak hours. That is bad news for a realtime app like a penny auction site.
Fearing that slowdown, I got another server in the same location. I configured the second machine as the SQL server and dedicated the first one solely to web serving. Since both are in the same location, I set up a private network so that the two machines share one logical network and one LAN.
With the new set-up (one dedicated machine for SQL and another for web serving), I was able to reduce the load on the web server. The SQL server also ran very well; in fact, based on my tests, the execution time of my most complex query dropped by as much as 40%.
Other stuff I did
Since BiliBid is a time-sensitive web application, there is no room for slowdown. I had to:
- Optimize my cron job that runs every minute. I tweaked it to update only the information that needs updating and ignore the rest. Previously, it processed all my entities, which I found inefficient since the number of entities grows over time (new signups daily, new auctions, etc.).
- Remove all system cron jobs. A fresh install of most Linux distros automatically adds cron jobs for maintenance tasks. I removed them because a maintenance task could slow down BiliBid, which would be a disaster if it ran during peak hours or during an active auction. Instead, I run these maintenance tasks manually during off-peak hours.
- Set up RRDtool graphs of the latency to make monitoring easier.
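The cron job change in the first item boils down to replacing "process every entity" with a query that selects only rows that actually need work. The post doesn't describe what the job does, so this sketch assumes a hypothetical task (closing auctions whose end time has passed) and uses Python's sqlite3 as a stand-in for the real database; the `auctions` table, its columns, and the index name are all hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the real BiliBid schema is not described in the post.
    conn.execute(
        "CREATE TABLE auctions (id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL, ends_at REAL NOT NULL)"
    )
    # Lets the UPDATE find candidate rows without scanning the whole table.
    conn.execute("CREATE INDEX idx_auctions_status_end ON auctions (status, ends_at)")


def close_expired_auctions(conn: sqlite3.Connection, now: float) -> int:
    """Touch only auctions that are still active and past their end time.

    Finished auctions (and any other entities) are left alone, so the work
    per run depends on how many rows changed, not on the table's total size.
    Returns the number of auctions closed.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE auctions SET status = 'closed'"
            " WHERE status = 'active' AND ends_at <= ?",
            (now,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO auctions (status, ends_at) VALUES (?, ?)",
    [("active", 100.0), ("active", 500.0), ("closed", 50.0)],
)
closed = close_expired_auctions(conn, now=200.0)
print("closed this run:", closed)
```

A second run at the same timestamp finds nothing to do, which is the point: each minute's run costs roughly as much as the number of auctions that actually expired.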
I really hope these recent changes will work well. I'm still experimenting with different configurations to find the most efficient one.