Friday, October 30, 2009

Storage - What IS non-disruptive?

This grows out of a discussion going on over here:
http://wikibon.org/blog/migrating-data-within-federated-storage/

The two main systems up for discussion here are the IBM SVC 2145 and the Hitachi Data Systems USP-V class. So, let's start with some background on the systems.
The IBM SVC 2145 is, like most IBM products, an amalgam of acronyms. The official name is the IBM System Storage SAN Volume Controller. The SVC is a Storage Virtualization Engine, presenting hosts with a unified front-end, SAN Admins with a single point of management, and utilizing any number of disk controllers behind it. It is also the current (and probably forever) record holder of just about every SPC benchmark and a large number of TPC benchmarks. The currently available (and not the announced) hardware offers from 8 to 32 4Gbit FC ports per cluster, and as many disks as you can fit into 4,096 MDisks as of 4.2.1 - using RAID5 in 4+P, that's 16,384 data spindles (20,480 drives once you count parity)! Maximum actual spanning is limited to 512 spindles for a single VDisk, which is still jaw-dropping.
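For anyone who wants to check the spindle math, here's a quick back-of-the-envelope sketch. It assumes every one of the 4,096 MDisks is a single RAID5 4+P array, which is my simplification and not something real deployments are held to; the names are just for illustration. Note that 16,384 counts only the data drives, while the physical drive count includes the parity drive in each array.

```python
# Back-of-the-envelope spindle count for a fully populated SVC cluster.
# Assumption (illustrative only): every MDisk is one RAID5 4+P array.
MAX_MDISKS = 4096            # per-cluster MDisk limit cited in the post (as of 4.2.1)
DATA_DRIVES_PER_ARRAY = 4    # the "4" in 4+P
PARITY_DRIVES_PER_ARRAY = 1  # the "P" in 4+P


def spindle_counts(mdisks=MAX_MDISKS):
    """Return (data_spindles, physical_drives) for `mdisks` 4+P arrays."""
    data = mdisks * DATA_DRIVES_PER_ARRAY
    total = mdisks * (DATA_DRIVES_PER_ARRAY + PARITY_DRIVES_PER_ARRAY)
    return data, total


data, total = spindle_counts()
print(f"data spindles: {data:,}, physical drives: {total:,}")
# prints "data spindles: 16,384, physical drives: 20,480"
```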
The Hitachi USP-V is a beast. A lovely, lovely beast I want for my very own at home (except for the power bills). It doesn't hold a lot of records because not enough people show it love, in my opinion. The latest generation offers up to 1,152 drives in 1-4 expansion frames with up to 128 flash drives, 512GB of cache, and up to 224 Fibre Channel ports. Everyone else, eat your hearts out. The USP-V also includes all of Hitachi's fancy software offerings, and the ability to assume control of and pseudo-virtualize external storage arrays like IBM DS4000s, HDS AMS-series, and so on.

If you just went "wait, these sound like very different products," that's because they are. You see, the IBM SVC is a Storage Virtualization Engine and the HDS USP-V has a Storage Virtualization Engine. But the SVC offers no storage disks of its own, whereas on the HDS USP-V, being disk storage is the core functionality. In other words, we're comparing something with no disks to something that's built entirely around disks. Vendors are getting very good at blurring these lines very badly, or explaining their products very vaguely, resulting in some pretty bad customer confusion.

So which is which and what is what? Let's start with the IBM SVC. The IBM SVC is two to eight 1U systems in an appliance form, which you stick in front of supported disk controllers to provide extent-based virtualization (or not) and/or provide you with a single point of management and presentation for a variety of hosts. Connections to hosts and storage are via standard Fibre Channel, and cluster interconnect uses standard Gigabit Ethernet. The SVC also offers the advantage of using IBM's SDD multipath driver, which is available for every major OS out there in driver or pluggable software module form.
This brings us to the HDS USP-V. The HDS USP-V is a high end enterprise storage system boasting some of the most impressive specs you can find. It's highly configurable and customizable. It offers gobs of high speed SAS disks as well as support for SATA disks using HDS' RAID1+, RAID5 and RAID6 algorithms. Like the IBM DS8000s, disks go in 4 at a time at the minimum. The internals are connected via Hitachi's proprietary (not in a bad way) Universal Star Network. External storage is attached via Fibre Channel.
(As a note; I'm not including iSCSI because it's only an announced feature on the SVC, and NAS is the realm of the USP, not the USP-V. Plus we're not comparing features, dangit!)

Now if you read the original post on Wikibon that started us down this road, you'll notice that it was about what constitutes non-disruptive block level virtualization. (Or extents, if you want. Pick your poison.) Some folks have said that only the USP-V does it. But that's not true. You see, the SVC also does everything the USP-V does. So what's going on here? Well, there are two problems.
One, the USP-V is storage with virtualization while the SVC is just virtualization. Two, the definition of non-disruptive is... ambiguous at best, tenuous at other times, and just crazy at still others.
Let's start with my definition of non-disruptive. My definition of non-disruptive is being able to perform hardware repairs, software upgrades and hardware upgrades without impacting the production environment beyond performance. Most folks will tell you that my definition is pretty darn reasonable.
HDS and IBM like to redefine "non-disruptive" on a per-product basis, to suit their needs. It's marketing, don't pretend to be surprised, okay? This is what they are paid to do.
So, if we go by my definition, why do both of these systems offer you the potential of non-disruptive maintenance and upgrades? Because they both do. And the SVC may even slightly edge out the USP-V in this regard because of its appliance nature. (I'll explain, I promise!)

The caveat that both systems are subject to is the actual storage subsystem and attached hosts. Yes. The disks and the end consumers of LUNs. The USP-V can legitimately claim to be non-disruptive because without adding external storage, it's still a USP, offering non-disruptive firmware upgrades and hardware replacement within practical limits.
The SVC can also legitimately claim to be non-disruptive block-level virtualization. Why? Because the SVC itself can do everything HDS claims to do non-disruptively as well. Data migration between arrays, between nodes (IO Groups, in fact), and even between clusters. All that it requires is that the storage subsystems behind it be able to do firmware and hardware maintenance non-disruptively.

This is also where their claims fall apart. For our example, we'll be using the venerable LSI designed-and-built IBM DS4800 storage controller and EXP810 shelves. For the record, I hate the DS4k/DS5k because it is absolute crap for enterprise storage. But it's cheap so it's everywhere. It's still crap.
Given the DS4800 as an External Storage Array on the USP-V and the SVC, both solutions fail the non-disruptive claims. Even the ones their own manufacturers make. Why? Because it's a DS4800, and they're both dependent on it. Firmware upgrades on the DS4800 are fraught with terror and most decidedly disruptive, requiring all IO be stopped to the DS4k. That means any arrays with data on the DS4k must have IO stopped at the virtualizing layer before maintenance can be performed. Which means stopping them at the USP-V or the SVC. Which means shutting down production environments using those LUNs. Was that a "whoops" I just heard?
But by the same token, if we take a USP-V with an AMS2500 behind it, and an SVC with an AMS2500 behind it, both pass because maintenance on the AMS2500 is non-disruptive. See how it works? Now you can dislike marketing crap just as much as I do!
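If it helps to see the logic written down, here's a minimal sketch of the "weakest link" rule from the last few paragraphs: a virtualized stack is only non-disruptive if every layer in it is. The `Layer` class, the function names, and the true/false flags are all mine for illustration; the flags just encode the claims made in this post, not anything out of a vendor spec sheet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """One layer of the storage stack (hypothetical model for illustration)."""
    name: str
    nondisruptive_maintenance: bool  # can it take firmware/hardware work with IO running?


def stack_is_nondisruptive(layers: List[Layer]) -> bool:
    """The whole stack passes only if every single layer passes."""
    return all(layer.nondisruptive_maintenance for layer in layers)


def weakest_links(layers: List[Layer]) -> List[str]:
    """Name the layers that force an outage during maintenance."""
    return [layer.name for layer in layers if not layer.nondisruptive_maintenance]


# Flags below reflect the claims in this post, nothing more.
svc = Layer("SVC", True)
usp_v = Layer("USP-V", True)
ds4800 = Layer("DS4800", False)   # firmware upgrades need all IO stopped
ams2500 = Layer("AMS2500", True)

for front_end in (svc, usp_v):
    for back_end in (ds4800, ams2500):
        stack = [front_end, back_end]
        verdict = "pass" if stack_is_nondisruptive(stack) else "fail"
        print(f"{front_end.name} + {back_end.name}: {verdict} {weakest_links(stack)}")
```

Swap the virtualization layer all you like; the answer only changes when the back-end array does.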
However, within the isolated products themselves - that being only the USP-V and only the SVC (with no storage) - all tasks are non-disruptive. As of 4.3.x you can take an SVC from 2145-4F2 hardware to 2145-8G4 hardware in the middle of the day with no impact to your production environment beyond performance, with proper planning.
So why does the SVC slightly edge out the USP-V in non-disruptive? Because the SVC is an appliance you put in a standard rack. If you put each IO Group (two nodes) in separate racks, which you should, the SVC can continue to operate normally through physical moves - excepting when managed storage is shut down for moves. The USP-V can't, because the controller is in a single dedicated frame. Yep. That's the entirety of it.

What about host side? HDS claims to be unique in their ability to migrate a disk non-disruptively between USP-V frames. Well, one, the SVC can do this with Metro mirroring (not to be confused with Global mirroring! Global is the one for long distances!) or within a cluster using VDisk mirroring or FlashCopy (which HDS has equivalents for in-frame, predictably!) Two, apply brakes when the reality of hosts hits.
No, Virginia, there isn't any such thing as a free lunch and migration between frames or between clusters will always, always be disruptive at the host level. You're changing WWNNs and WWPNs on the controllers, even if the rest isn't changing. And you think a host will just smile and eat it? Boy, don't I wish - that would save me SO much trouble, both past and present! No, no. The hosts will get very, very upset with you. So what's the procedure?
Well, I can't speak to the USP-V's since I haven't done it. But I imagine it's somewhat like the SVC's with obvious key differences. The USP-V is doing a migration from Frame A to Frame B. The SVC is mirroring data between Cluster A and Cluster B. To complete the migration on SVC? You unmount the disks at the host. Reverse the mirror direction. Rediscover disks on the host. Mount the disks from Cluster B. Verify everything is happy, and break the mirror. 5-15 minutes of downtime, typically. My guess is that the USP-V is similar in needing to stop IO and unmount, start migration, rediscover disks to pick up the new frame's ownership, and remount while migration is in progress. In theory, this could be fixed in software, but it's a very difficult problem to fix.
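The SVC cutover described above can be sketched as an ordered runbook. To be clear, this is a toy model, not SVC or host tooling: the step list comes from the procedure in this post, while the `execute` callback, the function names, and the "is host IO available afterwards" flags are my own assumptions for illustration. It shows two things: which steps fall inside the outage window, and why you stop dead if verification fails, since the mirror is your way back.

```python
from typing import Callable, List, Tuple

# (step description, is host IO available once this step completes?)
CUTOVER_STEPS: List[Tuple[str, bool]] = [
    ("unmount the disks at the host", False),
    ("reverse the mirror direction", False),
    ("rediscover disks on the host", False),
    ("mount the disks from Cluster B", True),
    ("verify everything is happy", True),
    ("break the mirror", True),
]


def outage_steps(steps: List[Tuple[str, bool]]) -> List[str]:
    """Return the steps that start, run during, or end the host outage."""
    io_up = True  # host IO is running before the cutover begins
    in_window = []
    for description, io_up_after in steps:
        if not io_up or not io_up_after:
            in_window.append(description)
        io_up = io_up_after
    return in_window


def run_cutover(steps: List[Tuple[str, bool]],
                execute: Callable[[str], bool]) -> List[str]:
    """Run steps in order via a caller-supplied `execute` callback.

    Stops at the first failure, so the mirror is never broken
    after a failed verification.
    """
    done = []
    for description, _io_up_after in steps:
        if not execute(description):
            break
        done.append(description)
    return done


print("Steps inside the outage window:")
for step in outage_steps(CUTOVER_STEPS):
    print(" -", step)
```

In this model the outage runs from the unmount through the remount from Cluster B; verification and breaking the mirror happen with the host already back in business.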

So, what've we learned today? One, don't take the vendor's definition of non-disruptive at face value. Ever. Two, do your own homework and don't just settle for "magic quadrants" and glossies. Insist on tech demos that don't simply consist of the vendor demonstrating the feature on a cherry-picked array. Insist on hands-on time. Insist on talking to real customers.
This post is a great example of exactly why you should. I learned about things the USP-V can do inter-frame that I didn't know before. And hopefully people learned about things the SVC can do that IBM marketing didn't tell them. (Seriously, IBM marketing sucks.)

And I can hope everybody learned a bit about the practical limitations of any storage virtualization solution, be it HDS, IBM, EMC or Joe's Computer Shack.

Aaaaaand the disclaimer!
I don't work for IBM, Hitachi, any subsidiaries, VARs, or BFFs. Despite my lust for the combination of SVC + 2x AMS-2500 in my home, both IBM and HDS have failed to ship me either, or even so much as a cheap mug! I hope you guys are listening, 'cause I could use a new mug after my Sun Customer Appreciation mug broke. ;)

2 comments:

  1. Send an e-mail to dco@uk.ibm.com with your address and I'll send you a mug;) Jon of IBM

  2. Wow, somehow I had comment notification turned off! Hopefully that offer still stands. ;)
