What is database sharding?

Database Sharding





Database Sharding is a popular technique for scaling databases. Instead of storing all of the data in one database, you distribute the application data across multiple databases. Why would you want to do that? Simply put, Database Sharding is one of the most effective ways to scale a database: it is used by many SaaS companies, the most popular web sites, and even large enterprises. Because implementing sharding yourself is such a pain, we at ScaleBase have developed a transparent database sharding solution. We call it a Database Load Balancer, and it can let your MySQL scale like never before.

What Is Sharding?

Wikipedia describes sharding in terms of horizontal partitioning:

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than splitting by columns (as for normalization). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

But I think the best explanation uses an example, so let’s take the following table:

[Table: a small customers table containing 7 rows]

This is a small table containing a list of customers, and any database can handle it. But what happens if, instead of 7 rows, the table has to store 7 million rows? In theory, that should still be supported. In practice, though, a table that large usually sees many read and write operations every second. And “At Scale, Everything breaks” (as Google VP of Engineering Urs Hölzle has put it). So our nice customer table will probably become a bottleneck. Why? Because it no longer fits in the database server's cache, because of the overhead of managing transaction isolation, and for other reasons, all of which make the database crawl under load.

Welcome to Sharding. If we take the customers table and split it across 4 different databases, each database will contain 1.75 million rows. That’s still a lot, but far fewer than 7 million. The result is improved database performance: in ScaleBase tests, some standard performance benchmarks showed response times improving by about 75%.

The following diagram shows how such a table can be split:

[Figure: the customers table split across 4 databases, each holding a subset of the rows]

Every database will get some of the rows, and it becomes the developer’s responsibility to know which row is located in which database.
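A minimal sketch of what "knowing which row is located in which database" means in application code, assuming customers are sharded by a numeric `customer_id` with simple modulo placement (the original does not specify a placement rule). Four in-memory SQLite databases stand in for four separate database servers; the table layout, the sample names, and the helper functions `shard_for`, `insert_customer`, and `get_customer_name` are all hypothetical.

```python
import sqlite3
from typing import Optional

NUM_SHARDS = 4

# Four in-memory SQLite databases stand in for four separate database servers.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for conn in shards:
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")


def shard_for(customer_id: int) -> sqlite3.Connection:
    """Return the shard that owns this customer (hypothetical modulo placement)."""
    return shards[customer_id % NUM_SHARDS]


def insert_customer(customer_id: int, name: str) -> None:
    conn = shard_for(customer_id)
    conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))
    conn.commit()


def get_customer_name(customer_id: int) -> Optional[str]:
    # Only the owning shard is queried; the other shards are never touched.
    row = shard_for(customer_id).execute(
        "SELECT name FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


# Seven hypothetical customers, mirroring the 7-row example table.
for cid, name in enumerate(["Ann", "Bob", "Carla", "Dan", "Eve", "Finn", "Gil"], start=1):
    insert_customer(cid, name)

print(get_customer_name(6))  # → Finn (stored on shard 6 % 4 = 2)
print([c.execute("SELECT COUNT(*) FROM customers").fetchone()[0] for c in shards])  # → [1, 2, 2, 2]
```

Every query path in the application has to go through a routing function like `shard_for`. A query that does not know the `customer_id` cannot be routed this way and has to ask every shard.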

So, what can you get out of sharding your database?

  1. Response time improvement. If your database is big (over 50GB) or handles many requests per second (anything above a few hundred per second), sharding can probably boost your database performance.
  2. Scaling. If you run a big database, you probably keep your fingers crossed, because databases break at scale and failure can come at any time. With Sharding, you scale by spreading the load over more databases; just make sure you have enough shards for your workload.

Sharding Complexities

Sharding is great. However, writing sharding code is difficult. It typically requires rewriting most of your Data Access Layer from scratch. That is hard enough when you write your own SQL, and even harder when you use O/R mapping tools, since most of them are not designed with sharding in mind.

But even after the initial sharding code is written, you might run into issues. A common problem occurs when scaling requires adding more shards. Home-grown sharding code usually supports a fixed number of shards, so adding shards requires massive code rewrites, plus major downtime while data is moved from one shard to another.
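The cost of changing a fixed shard count can be quantified for simple modulo placement (`key % num_shards`), which is assumed here as a typical home-grown scheme; the original does not say which scheme it has in mind. The hypothetical helper below counts how many keys land on a different shard after the shard count changes, since every such row would have to be moved:

```python
def fraction_moved(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of keys 0..num_keys-1 whose shard changes under key % num_shards placement."""
    moved = sum(1 for key in range(num_keys) if key % old_shards != key % new_shards)
    return moved / num_keys


print(fraction_moved(1_000_000, 4, 5))  # → 0.8 (adding one shard moves 80% of the rows)
print(fraction_moved(1_000_000, 4, 8))  # → 0.5 (even doubling moves half of the rows)
```

Going from 4 to 5 shards keeps a key in place only when `key % 4 == key % 5`, which holds for 4 out of every 20 consecutive keys. Placement schemes such as a lookup directory or consistent hashing are common ways to reduce this movement; they are named here only for contrast.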

Other parts of the infrastructure also change when you use a sharded database. For example, the reporting application must now be aware of the sharding logic, since it has to collect data from multiple databases rather than just one. And if the reporting application is an off-the-shelf product, you’re out of luck: you’ll have to write the reporting application from scratch.
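A minimal sketch of why reporting becomes shard-aware: a report such as "customers per country" can no longer be a single SQL query. It has to run on every shard, and the partial results have to be merged. The three in-memory SQLite shards, the `country` column, the sample rows, and the `customers_per_country` helper are hypothetical.

```python
import sqlite3
from collections import Counter

# Three in-memory SQLite databases stand in for three shards.
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for conn in shards:
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT)")

# Hypothetical sample data, placed on shards with customer_id % 3.
rows = [(1, "US"), (2, "UK"), (3, "US"), (4, "DE"), (5, "US"), (6, "UK"), (7, "DE")]
for customer_id, country in rows:
    shards[customer_id % len(shards)].execute(
        "INSERT INTO customers VALUES (?, ?)", (customer_id, country)
    )


def customers_per_country() -> dict:
    """Scatter the same GROUP BY query to every shard, then gather and merge the counts."""
    totals = Counter()
    for conn in shards:
        for country, count in conn.execute(
            "SELECT country, COUNT(*) FROM customers GROUP BY country"
        ):
            totals[country] += count
    return dict(totals)


print(sorted(customers_per_country().items()))  # → [('DE', 2), ('UK', 2), ('US', 3)]
```

Counts and sums merge this easily. Averages (which need per-shard sums and counts), `ORDER BY ... LIMIT` queries, and joins across shards need more care, which is why a reporting tool that expects a single database rarely works unchanged.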

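Backup is an issue, since every shard has to be backed up and restored on its own. Database Administration is an issue, since tasks such as schema changes have to be repeated on every shard. And more complexities just continue to pop up.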

So Database Sharding is a great solution for database scaling, but it’s complex and costly. Unfortunately, most of the costs are hidden and only surface after the initial sharding is done.

ScaleBase Transparent Sharding

Luckily, there’s ScaleBase. Our unique ScaleBase software solution handles all of the Database Sharding heavy lifting for you, without changing a single line of your code. And since it’s not embedded inside your application, your BI tools, DBA team, and backup tools can use it too, with no infrastructure change.

All you have to do to scale with Database Sharding is download the ScaleBase solution. Install it, and off you go. Our ScaleBase Analyzer will automatically suggest the best sharding policy for you, and our management console makes sharding configuration a snap.

And there’s more: ScaleBase can also scale your database using read/write splitting, meaning that copies (replicas) of the database serve read operations, while a single database handles all writes. Who said that database replicas are useless machines only used for high availability? Now they can be used for scaling as well.
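The read/write splitting idea can be sketched as a router that sends writes to a single primary and spreads reads across replicas in round-robin order. This is a generic illustration, not ScaleBase's implementation; the `ReadWriteRouter` class and the server names are hypothetical. Classifying statements by their first keyword is a deliberate simplification: it would wrongly send `SELECT ... FOR UPDATE` to a replica, and it ignores replication lag (a read right after a write may not yet see that write on a replica).

```python
import itertools


class ReadWriteRouter:
    """Route reads to replicas (round-robin) and all other statements to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Simplification: a statement is a read if its first word is SELECT.
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        return next(self._replicas) if is_read else self.primary


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
for stmt in [
    "SELECT * FROM customers WHERE customer_id = 6",      # → db-replica-1
    "UPDATE customers SET name = 'Finn' WHERE customer_id = 6",  # → db-primary
    "select count(*) from customers",                     # → db-replica-2
    "SELECT name FROM customers",                         # → db-replica-1
]:
    print(router.route(stmt), "<-", stmt)
```

Round-robin keeps the sketch short; a production router would also track replica health and lag before choosing a target.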

Your architecture will look something like this (please note that several deployment options exist):

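[Figure: ScaleBase deployment architecture providing high availability, read/write splitting, and sharding]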

In this architecture you get database high availability, read/write splitting, and sharding, all in a fully redundant environment that also boosts your performance.

What’s to think about?

