
Pure Storage Corporate Blog

11.01.2012

Happy Pure Halloween!

It was a great Halloween at Pure Storage this year…we closed another quarter marked with terrific growth and increasingly large customer deployments, and we took a little time to celebrate Halloween together Pure-style…with an impromptu office costume contest!

Do you recognize any of the Pure family below?


Choosing winners for the costume contest was hard, but the unanimous winner was Marcus the “PureBallerina,” impressing everyone with his beauty and grace:


Runners-up were Ben the “PureFlasher” and yours truly, the “PureSamurai.” Also, a special honorable mention goes to Burr, who dressed up as our co-founder Coz; can you even tell who is who?


Happy Halloween to all, and our hearts in particular go out to our customers, partners, and friends digging-out and putting their businesses and personal lives back in order after Hurricane Sandy.

Posted by Matt Kixmoeller
comments (1)
10.19.2012

Oracle on Flash: The Case of the 4K Redo Log Block Size

Introduction: Redo Block Size Myths

Recently I presented a webinar about Oracle on flash, and demonstrated that many of the traditional storage considerations and compromises facing DBAs and system administrators are irrelevant on a Pure Storage FlashArray. In particular, you no longer need to worry about RAID levels, stripe sizes, block sizes and so forth. Nor do you need to make any fundamental changes to your current database configuration when you migrate to a Pure Storage array.

In this post we’ll examine the impact of redo log block size on performance in our array. You may have come across blogs recommending a 4K redo log block size for redo logs on flash. This option is new in Oracle 11gR2 and is designed to take advantage of Advanced Format drives, which use a 4K sector size instead of the standard 512 byte sector size. The cited advantage of the 4K redo block size is that it minimizes block misalignment problems, and hence improves performance. There is no question that redo log block size can have a significant impact on performance on certain types of SSDs. Guy Harrison, for example, observed a redo write time improvement of over 3x using 4K redo logs. Note that the 4K block size significantly increases redo wastage (redo blocks written to disk before they are full), but usually this is not a big performance concern.
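If you want to quantify redo wastage on your own system, one option (a sketch of our own, not something covered in the webinar, so verify the statistic names on your release) is to sample the cumulative redo counters in v$sysstat before and after a test run and diff the values:

-- Hedged sketch: sample these cumulative counters before and after a test run;
-- 'redo wastage' is the space written out in partially filled redo blocks.
select name, value
  from v$sysstat
 where name in ('redo size', 'redo wastage', 'redo writes');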

How to Change Redo Block Size

To create redo logs with a non-default block size (512 bytes on most Linux platforms), you must specify the “blocksize” clause when you create the logfile group. Your choices are 512, 1024, and 4096. For example:

13:28:59 system@nduasm.oracle1 SQL> alter database add logfile group 5 blocksize 4096
 13:29:09 2 /
Database altered.

If you see an error such as:

alter database add logfile group 5 size 2g blocksize 4096
*
ERROR at line 1:
ORA-01378: The logical block size (4096) of file +ORARECO is not compatible
with the disk sector size (media sector size is 512 and host sector size is 512)

you need to set the _disk_sector_size_override parameter to TRUE:

13:17:21 system@nduasm.oracle1 SQL> alter system set "_disk_sector_size_override"=TRUE scope=both;
System altered.
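Once the new groups are in place, it is worth confirming the block size each group actually picked up. On 11gR2 and later the v$log view exposes a blocksize column; this verification step is our suggestion rather than part of the transcript above:

-- Assumes Oracle 11gR2 or later, where v$log includes a BLOCKSIZE column.
select group#, thread#, blocksize, bytes/1024/1024 as size_mb, status
  from v$log
 order by group#;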

Pure Storage Lab Testing

For our load test, we ran the hammerora TPC-C workload. We used 20 logfile groups sized at 2 gigabytes, first using a 512 byte block size, and then a 4K block size. The redo logs rolled roughly once a minute for all tests (i.e. we generated about 2 gigabytes of redo per minute); a quick way to confirm the switch rate is sketched after the list below. We ran the comparisons in three environments:

  1. ASM in a virtual machine
  2. EXT4 file system on a physical machine
  3. ASM on a physical machine
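
As promised above, here is one hedged way to confirm the log switch rate during a run; it simply buckets v$log_history by minute (and assumes the instance is dedicated to the test):

-- Sketch: count redo log switches per minute over the last hour.
select to_char(first_time, 'YYYY-MM-DD HH24:MI') as minute,
       count(*) as log_switches
  from v$log_history
 where first_time > sysdate - 1/24
 group by to_char(first_time, 'YYYY-MM-DD HH24:MI')
 order by 1;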

Although the performance characteristics varied from one test bed to another, there was very little difference within any single environment. The VMware results were typical: test durations differed by about 2%, and redo wastage was nearly 10x greater with the larger block size:

[Chart: test duration and redo wastage in the VMware environment]

At a macro level (Top Activity in Enterprise Manager), the load profiles are nearly identical as well:

[Chart: Enterprise Manager Top Activity, 512 byte redo logs]

 

[Chart: Enterprise Manager Top Activity, 4K block redo logs]

 

The transaction rate was also essentially identical in both configurations:

[Chart: transaction rate with 512 byte redo block size]

 

[Chart: transaction rate with 4K redo block size]

 

 

In Test 2, we mounted the EXT4 file system with the “noatime,discard” options. The discard flag is specific to SSD devices and thinly provisioned LUNs; it makes the file system issue TRIM commands to the block device when blocks are freed. As with the VMware environment, we see virtually no difference in performance between the two redo log block sizes:

[Chart: test duration and redo wastage on a physical machine with an EXT4 file system]

 

Perhaps most significantly, redo write time for these tests was also virtually identical.  This chart illustrates the metric for a test on a physical machine running ASM:

[Chart: redo write time on a physical machine running ASM]
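
If you want to compare redo write latency on your own system without pulling a full AWR report, one hedged shortcut is to derive an average from the cumulative counters in v$sysstat; treat it as a rough sketch rather than a substitute for the wait-event histograms:

-- Sketch: average redo write time in milliseconds since instance startup.
-- 'redo write time' is recorded in centiseconds, hence the * 10 conversion.
select round(wt.value * 10 / nullif(wr.value, 0), 3) as avg_redo_write_ms
  from (select value from v$sysstat where name = 'redo write time') wt,
       (select value from v$sysstat where name = 'redo writes') wr;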

 

Conclusion

What Gives?

It’s true that some SSD products do indeed benefit from a 4K redo log block size. That is because they are architected with a fixed-size RAID geometry. The notion of a sector (traditionally a pie slice of spinning disk) really has no meaning or context in a Pure Storage FlashArray; why would it?


You can think of our Purity Operating Environment as using a variable sector size, with the smallest being 512 bytes. Thus we have neither block misalignment issues nor performance compromises. The Purity Operating Environment can certainly understand and process I/O requests that are presented in the context of sectors, but by the time that data makes it to the flash, sectors have been abstracted away. Before we actually write data to the array, we write it to NVRAM, where we perform deduplication and compression. The actual bits that are written to the underlying array represent the original I/O, but do not necessarily resemble the original I/O. In addition, since we also perform inline RAID, these bits are not necessarily written to a single physical SSD.

Obviously the Pure Storage array has nothing in common with spinning disks. But it might not be so obvious that it has little in common with other flash arrays out there either. We strive to make your life simple by leveraging flash’s unique capabilities rather than hindering it by mimicking the idiosyncrasies of disks. While there are Oracle and OS settings that take advantage of flash, your existing configuration will work as-is. Besides redo log block size, things like database block size, ASM vs. file system, and LUN “spindle” count have no bearing on the performance of a Pure Storage array, which means you can deploy your database on Pure Storage without modifications, and you can continue to adhere to whatever operational policies you may have in place.

Posted by Chas. Dye
comments (2)
10.11.2012

Introducing Charles Dye – Pure Storage Database Solutions Architect

I’m excited to announce a new addition to the Pure Storage team – Charles “Chas” Dye joined us a few months ago as our Database Solutions Architect. Chas has literally decades of experience as a bona fide DBA, optimizing Oracle physical and logical deployments from both engineering and operations perspectives at places like Silver Spring Networks, Yahoo!, Opsware / Loudcloud, and eXcite, focusing on both OLTP and OLAP workloads. Chas has authored several books on Oracle deployment and optimization, and is also both an aspiring amateur photographer and Hawaiian shirt enthusiast, as evidenced in the photo below.

[Photo: Chas Dye]

Why does Pure Storage need a database guru? Pure Storage focuses on three key use cases: databases, virtual servers, and virtual desktops, and often the first two intersect as people are increasingly virtualizing their database infrastructure. On the database side of things, adoption of Pure Storage typically follows a three-stage process:

  • Step 1: Swap spinning disk for 100% flash and see what happens. Usually this results in a clear and immediate benefit for applications that are spindle-bound (just this week I spoke with a customer who took a call center analytics application from 24-hour job processing to 2-hour processing – no optimization required). But in many cases adding flash just makes evident OS and application bottlenecks further “upstream” from storage that prevent realizing the full potential of flash.
  • Step 2: Optimize OS and database tuning for flash. There are myriad adjustments that one can make to the IO layers of the OS, FS, and database to tweak how block storage is accessed…and many of the “best practices” that DBAs have learned over decades of disk are simply wrong for flash. Quick tweaks at these layers can make a large impact on end performance.
  • Step 3: Optimize application / query logic. Finally, when your applications can expect consistent <1ms latency, you can design them differently. Queries are possible that weren’t possible before, and often simplification can lead to improved performance.

Chas’ job here at Pure Storage is to act as a partner to our customers in this journey – to help them understand how to walk down the paths of steps 2 and 3 to optimize their applications for flash, and to generalize these learnings and publish them to the community in the form of best practices, whitepapers, benchmarks, and reference architectures.

You’ll see frequent posts on this blog from Chas as he begins publishing his findings, and you can have your first chance at interacting with him this Friday on his inaugural webcast – “Optimizing Oracle for Flash” – register here.

Posted by Matt Kixmoeller
comments (0)
10.09.2012

The Risk of Over Promising and Under Delivering with Hybrid Storage Arrays

As Pure embarks for Europe this week (VMworld Barcelona and Structure Amsterdam, hope to see you there!), an analogy for the inherent risk in hybrid storage occurred to me.

First, imagine that you’re expecting to travel internationally by ship, but then find that you’ve been upgraded to a flight, and will get there in a fraction of the time. You’re likely ecstatic (modulo the crowding in coach), contemplating extra work and fun or an earlier return home thanks to the savings in transit time. Hybrid storage (that intermixes flash memory and mechanical disk within an appliance) is similar: When your applications are designed for disk latencies and instead get a 10+X acceleration with flash, your users are thrilled. So incorporating a flash cache to help a disk-centric array go faster is classic under promise and over deliver.

But now imagine that you showed up to board your international flight, and they put you on a ship instead. As soon as you start expecting the higher performance of air travel, your view of the situation reverses. Relative to airplanes, ships actually have excellent throughput (items transferred per unit time—think IOPS and bandwidth). They just suck on latency. And latency matters to your business: Just as you cannot plan a business trip without knowing whether you were going by air or by sea, your applications cannot be designed to take advantage of solid-state performance unless they know they are going to get it. When your users are expecting flash latencies but instead find themselves waiting on disk, you’ve over promised and under delivered.

I’m afraid I shouldn’t get much credit for the above analogy. It was inspired by Peter Burns of Google, who contributed the following characterization on his blog that helps put compute performance into perspective (text borrowed with permission, original post here):

Let’s talk time scales real quick. Your computer’s CPU lives by the nanosecond: most CPUs can get a few things done in each nanosecond – mostly simple math and comparisons. To make this easier to grasp, suppose you’re the CPU and instead of nanoseconds, you live and work second by second. For clarity I’ll keep this metaphor to a single core of a single processor.

You can hold a few things in your head (register). Not more than a dozen or two in your active memory, but you can recall any of them pretty much instantly. Information that’s important to you, you’ll often keep close by, either on sheets of loose-leaf paper on your working desk (L1 cache) a couple seconds away, or in one of a handful of books in your place (L2 and up cache) which is so well organized that no individual piece of information is more than a dozen or so seconds away.

If you can’t find what you’re looking for there, you’ll have to make a quick stop at the library down the street (RAM, i.e. main memory). Fortunately, it’s close enough that you can go down and grab a book and get back to work in only ~8 and a half minutes, and it’s enormous; some are thousands of times the size of a typical strip-mall book store. A little inconvenient, until you remember that this library has a free delivery service, so it’s really no bother at all so long as you can still find things to work on while you wait.

But the local library mostly just stocks things on demand (which is fair, your bookcases, worksheets, and even the dozen or two facts you hold in your head are mostly the same way). The problem is that when you need something that’s not there, it can take a while to get it. How long? Think Amazon.com in the age of exploration. They send out an old wooden boat and it could be a week, could be a month, and it’s not unusual to wait 3 years before you hear a response.

Welcome to the world of hard disk storage, where your information is retrieved by making plates of metal spin really fast. Many metric tons of sweat have been spent making this as fast as possible, but it’s hard to keep up with electrons flowing through wires.

So when someone says that Solid State Disks are awesome, it’s because they’re able to turn that slow, unpredictable old sailing ship into a streamlined steam-powered vessel. A good SSD can often make the voyage in less than a week, sometimes in little more than a day. It can also make many thousands more quests for information per year.

Love it. So the problem with hybrid arrays is now obvious: It’s simply very hard to design your business applications not to care whether each underlying operation is sub-millisecond by flash or tens of milliseconds by disk. When your customer or employee is waiting for the result in real time, being able to offer flash performance 50%, 75%, or even 95% of the time doesn’t let you raise the bar, since the application and user have to expect the ship even if they may get the airplane. In order to raise the bar without over promising and under delivering, you need to provision enough flash memory that you can service virtually all I/Os from it even as your workloads evolve over time. Hence the appeal of flash-centric, rather than disk-centric, storage. Disparate latencies were one of the things that doomed Hierarchical Storage Management (HSM) and Virtual Tape Libraries (VTLs), which were once a $1B+ business. Sound familiar?

From the perspective of a CPU doing the random I/O demanded by virtualization and many database operations, disk today is slower than tape was 15 years ago. This is the reason that leading consumer websites like Google and Facebook have been systematically eliminating mechanical disk from the latency path of their performance intensive applications. With vendors like Pure delivering flash in a form factor that’s plug-compatible with traditional disk arrays and that is generally more cost effective, flash really is poised to be the new disk. Why should the large consumer sites have all of the fun with flash, when the benefits are just as material to your business?

 

 

Posted by Scott Dietzen
comments (2)
10.02.2012

More Bang for Your Storage Buck

We just interviewed a candidate from a fellow upstart storage company. This gentleman took a logo’ed coffee mug on every sales call. He told the customer to just display the mug prominently when meeting with their incumbent storage vendors to get a substantially better deal. With Pure facing a competitive brouhaha with EMC over all-flash storage arrays (our thoughts on the XtremIO acquisition here), we quite liked this idea.

But a c
