Friday, October 30, 2009

On Virtualizing your Storage

Way back, I went over what's virtualization on Unix, and what's not. Well now it's time to hit storage over the head with the debunking hammer!

So, let's start with defining virtualization in the storage context. Storage virtualization is taking a homogeneous or heterogeneous set of storage resources and distributing data over multiple arrays or controllers. That's the simplified version. There's a complicated set of requirements to actually qualify as storage virtualization. I'll break it down in a list format.
  • It must support more than one controller and more than one storage subsystem.
  • It must support more than one vendor's storage subsystem(s).
  • It must support more than one model of storage subsystem.
  • It must support at least one protocol of: Fibre Channel, iSCSI, or NAS.
  • It may or may not include its own disk storage controllers.
  • Data must be capable of being spanned across two or more attached storage subsystems while being presented to hosts as a single LUN.
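The checklist above can be sketched as a small Python check. This is only an illustration of the list, not anybody's product database: the Product fields, the example products and the vendor names are all made up, and the "may or may not include its own controllers" item is left out because it can't disqualify anything.

```python
from dataclasses import dataclass, field

# Front-end protocols named in the list above.
HOST_PROTOCOLS = {"FC", "iSCSI", "NAS"}


@dataclass
class Product:
    """Hypothetical product description; every field name is illustrative."""
    name: str
    max_controllers: int                    # controllers it can front
    max_subsystems: int                     # storage subsystems it can front
    supported_vendors: set = field(default_factory=set)
    supported_models: set = field(default_factory=set)
    protocols: set = field(default_factory=set)
    spans_lun_across_subsystems: bool = False


def failed_requirements(p):
    """Return the requirements the product fails; an empty list means it qualifies."""
    checks = [
        ("more than one controller and subsystem",
         p.max_controllers > 1 and p.max_subsystems > 1),
        ("more than one vendor", len(p.supported_vendors) > 1),
        ("more than one model", len(p.supported_models) > 1),
        ("at least one of FC, iSCSI, NAS", bool(p.protocols & HOST_PROTOCOLS)),
        ("single LUN spanned across subsystems", p.spans_lun_across_subsystems),
    ]
    return [label for label, ok in checks if not ok]


# Two made-up products: one locked to a single vendor, one that is not.
single_vendor = Product("ExampleArray", 8, 8, {"VendorA"},
                        {"A100", "A200"}, {"FC"}, True)
multi_vendor = Product("ExampleEngine", 8, 8, {"VendorA", "VendorB"},
                       {"A100", "B200"}, {"FC", "iSCSI"}, True)

print(failed_requirements(single_vendor))
print(failed_requirements(multi_vendor))
```

The single-vendor product fails on exactly one line of the list, which is all it takes.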

So we've got a good list of what is required to qualify, in my world, as storage virtualization. So, let's do the list! What IS Storage Virtualization? (In reverse alphabetical order, and not a complete list.)

Actually Virtualization

  • NetApp V-Series
  • LSI StoreAge SVM
  • IBM System Storage SAN Volume Controller
  • Hitachi Data Systems USP-V
  • Hitachi Data Systems USP-VM

Definitely Not Virtualization

  • HP LeftHand P4000 - scale-out is not virtualization!
  • EMC V-Max - does not attach to ANY other vendor or controller.
  • EMC Invista - does not support ANY vendor except EMC.
  • Coraid ATA-over-Ethernet products - single vendor chassis with storage built in!

Don't Know Enough, So Might Be!

  • Incipient Network Storage Platform
    They hide all their documentation and technical specifications, so I can't tell if it's just a tool for mirroring and copying between different storage subsystems or it's actually virtualizing.

All that said, there's the argument that could be made that if the product hides the storage behind it and presents a single point of management for your storage, then it's virtualization. But, it's not. It's a gateway.
The EMC people will whine about V-Max being defined as "Not Virtualization." TOUGH LUCK. IT ISN'T. The V-Max is a storage subsystem, which spans data across multiple arrays and multiple controllers within itself. The V-Max does NOT support any externally attached arrays from any other vendors. The people who want to whine about Invista? TOUGH LUCK. IT STILL ISN'T. The Invista slid into supported EMC array cabinets and only worked with EMC.
It's NOT virtualization if it only works with one storage vendor. Period. The point of storage virtualization is to enable heterogeneous environments. Be it tiering by applications, saving money by using multiple vendors, or increasing performance by using multiple arrays and subsystems.

So why would you virtualize your storage? There are dozens of good reasons. The one I hear the most frequently is the IBM SVC owning SPC benchmarks. People want that level of performance out of their storage, and they think virtualization is a magic wand. It isn't. Virtualization is, like all things, a piece of the puzzle. No more, no less. Can you rock your world with SPC record breaking performance by going to virtualization? Sure, if you pay for it. Just like everything else.
Virtualization is still a front end to storage subsystems. That means that just because the virtualization engine can do 9,500MB/s random, you're still limited by the arrays behind it. The counteracting component is the use of multiple controllers and arrays. One array does 250MB/s random, but with virtualization you can span the LUN across two, which gets you to 500MB/s random in theory. In reality, it'd likely be closer to 400MB/s, but that's still way up from 250MB/s. Need more performance? Add more controllers. It doesn't hold 100% true, and there's a scaling point where it stops helping, but that's the theory.
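The arithmetic in that paragraph can be written down as a rough Python model. Treat it as a sketch: the function name is made up, the 0.8 efficiency factor is just the 400-out-of-500 figure from the example above, and applying one flat factor to any number of arrays is an assumption, since real scaling flattens out.

```python
def spanned_throughput(array_mb_s, engine_cap_mb_s=9500.0, efficiency=0.8):
    """Estimate random throughput (MB/s) for one LUN spanned across arrays.

    array_mb_s      -- per-array random throughput figures
    engine_cap_mb_s -- ceiling of the virtualization engine itself
    efficiency      -- assumed fraction of the theoretical sum actually
                       delivered once a LUN spans more than one array
    """
    raw = float(sum(array_mb_s))
    # A single array pays no spanning penalty in this simple model.
    scaled = raw if len(array_mb_s) == 1 else raw * efficiency
    # The engine in front can never deliver more than its own ceiling.
    return min(scaled, engine_cap_mb_s)


print(f"{spanned_throughput([250]):.0f} MB/s")       # one array on its own
print(f"{spanned_throughput([250, 250]):.0f} MB/s")  # spanned across two
print(f"{spanned_throughput([250] * 60):.0f} MB/s")  # engine becomes the limit
```

The last call is the other side of the coin: pile enough arrays behind the engine and the engine's own ceiling is what you hit.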
From the administration side, virtualization is very appealing. This is also how many gateways claim to be virtualization. A key tenet of storage virtualization is that it must provide a single point of management for your storage, regardless of what's behind it. When you virtualize, all your disk to host provisioning is done in the virtualization engine. You no longer need to slice things out on each controller after looking at loads, what space you have, etcetera. It consolidates the vast majority of your storage management into a single pane of glass.
So far, we've established that virtualization can turn multiple low-performance arrays into solid performing LUNs for your hosts, and your administration nightmares can be drastically reduced.

Now I'm going to tell you why both of those are bad too. First, administration nightmares only go down if you're capable of configuring things that way. You have to be ready to let go of individual arrays for individual applications on your storage subsystem. You create a bunch of large, high performance arrays and give them to the storage virtualization engine to manage. It writes blocks across those arrays. Stop managing spindles, start managing performance. Group arrays by performance and by capacity in your storage subsystem, not by application. Group by application in your storage virtualization engine.

Low performance arrays are STILL low performance arrays. If your issue is seek performance, virtualization will not help you. Seek performance is dependent on the arrays behind the virtualization layer, and the virtualization adds seek penalty - anywhere from 40us to 5ms depending. Putting two SATA arrays behind an SVC will not get you FC. It will get you 1.5 times SATA. Virtualization is not a replacement or workaround for baseline array performance. It's a way to enhance performance.
And the most common configuration error? People tier their storage by controller. Tier 1 is this storage subsystem, tier 2 is that storage subsystem, and tier 3 is yet another. You won't realize significant performance increases from this configuration. You'll dramatically increase spindle count, but you become hard limited by controller performance, and you unbalance controller loading. The optimal configuration is to span tiers across multiple controllers. Three controllers, each with 10 arrays of 5 SAS 15k disks, 10 arrays of 5 SAS 10k disks, and 10 arrays of 5 SATA disks, will typically perform better than three controllers that each hold 30 arrays of a single disk type. The load becomes more balanced across all three controllers, rather than having one controller sitting idle while another is begging for mercy.
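To see why tiering by controller hurts, here's a toy Python model of the two layouts. Every number in it is invented for illustration (the per-tier demand and the per-controller ceiling are not measurements of anything), and it assumes load splits evenly when a tier is spanned.

```python
# Hypothetical host demand per tier, in MB/s. Invented numbers.
TIER_DEMAND = {"SAS15k": 900.0, "SAS10k": 450.0, "SATA": 150.0}

# Hypothetical ceiling of a single controller, in MB/s.
CONTROLLER_LIMIT = 600.0


def tier_per_controller(demand):
    """One controller per tier: each controller carries one tier's whole load."""
    return list(demand.values())


def tiers_spanned(demand, controllers=3):
    """Every tier spread evenly over all controllers (assumed even split)."""
    share = sum(demand.values()) / controllers
    return [share] * controllers


def served(loads, limit=CONTROLLER_LIMIT):
    """Total load actually served once each controller is capped at its limit."""
    return sum(min(load, limit) for load in loads)


by_tier = tier_per_controller(TIER_DEMAND)
spanned = tiers_spanned(TIER_DEMAND)

print("per-controller load, tier per controller:", by_tier)
print("per-controller load, tiers spanned:      ", spanned)
print("served, tier per controller:", served(by_tier))
print("served, tiers spanned:      ", served(spanned))
```

With these made-up figures the tier 1 controller is pinned at its ceiling while the SATA controller loafs, and the spanned layout serves the whole demand on the same hardware.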

So, how do you determine if virtualization is right for you? If you said "I have more than two storage subsystems and I need better performance and ease of management," I want that gold star back right now. First and foremost, storage virtualization is not for everyone, it's not always appropriate, and it's not a cheap solution to an expensive problem. Storage virtualization allows you to consolidate multiple heterogeneous resources into a single point of management and allocation. That's it. Performance benefits should not be at the top of your list of reasons. Does that mean it can't be? No, but it shouldn't be the primary reason you're taking your first looks at storage virtualization, or even your second looks. Many environments can realize performance gains for far less money by analyzing and optimizing their storage configuration to suit their environment. And no, I don't mean best practices. Best practices are a starting point - not an end point. Reconfiguration of existing storage might buy a lot more than you expect.

Storage Virtualization fits most easily in rapidly or frequently changing environments, medium to large environments looking for increased scalability or flexibility on existing or new hardware, or environments with large amounts of administrative overhead. I'll get shot for saying it, but I'm going to anyways - in some environments, storage virtualization can change the staffing requirement from 5 to just 2 or even 1 Storage Admin. That doesn't mean it isn't suited to other environments, just that these environments are the most likely to receive immediate benefit from storage virtualization.
Unlike server virtualization and partitioning, storage virtualization is still rather immature. If a vendor starts telling you they solve everything including the kitchen sink, be wary. Every business should look at the costs and benefits of storage virtualization on a case by case basis, with detailed analysis not just from the vendor, but from internal staff as well. Can you use your existing storage subsystems? Can you expand with new subsystems? Will you have to forklift upgrade the storage behind it to expand further? Does it support your preferred vendors? Who's using it for what application with what results?

Storage virtualization has amazing potential in many, many environments but it also has the capability to burn you just as badly.

Come back later, when I openly mock every VTL on the market for being the utter crap they are, courtesy of an obscure company with an unpronounceable name and complacency on the part of manufacturers!
