Sunday, December 13, 2009

IBM SVC - upgrading the hardware to 2145-CF8 (for people who know SVC.)

Hey! Hey you! There's an update at the bottom!

Let's start this one off with a disclaimer:
If you aren't comfortable doing medium to major maintenance on your IBM SVC, this post is so not for you. Seriously. I'm skipping a lot of steps that fall under "things you just do as part of repairs." Instead, you should start over here:
Then consult here:
IBM SVC Reference: Replacing nodes nondisruptively

Okay. Disclaimer done. Let's get down to business. You FINALLY got approval to upgrade your SVC hardware, and it's six kinds of awesome. I cannot stop raving about the improvements in the 2145-CF8 / 2145-8A4 hardware. Seriously. If you can upgrade only one part of your environment in a quarter, this is it. And because it's SVC, you can spread the upgrade out if you have to.

First, let's talk caveats. Like all SVC clusters, hardware intermix is supported within limits. Intermix with the original hardware (2145-4F2) is not supported long term: 2145-4F2 clusters can be upgraded to 2145-CF8/8A4 hardware non-disruptively, but prolonged intermix is not supported. All other models can mix with the CF8/8A4 hardware in the same cluster on a short-term basis.
If you have 2145-4F2 nodes, you need to contact IBM support prior to upgrading. The documentation for non-disruptive 2145-4F2 to 2145-8A4 upgrades is impossible to locate (again). The procedure is different!
Customers with x305m and x306m Master Consoles (lift the SVC label to check your MT-M) will need to go to the SVC support website and read the advisory regarding system boot problems with these systems. This only affects the Master Console, not the SVC. Frankly, I recommend buying a pair of small xSeries boxes and making the Master Console an MS cluster. The default MC, as shipped, sucks.

First, the prerequisites:
  • You should already be on 4.3.1.10 or better, with the latest PTF.
  • All I/O groups have two node members.
  • Normal cluster maintenance has been done and all errors are cleared as "FIXED."
  • No vdisks or mdisks are offline or degraded.
  • You have the latest software upgrade test utility and have run it with no errors returned.
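
If you want the CLI version of that sanity check, here's roughly what I run first (a sketch, assuming the standard svcinfo/svctask CLI; field names and the upgrade test utility invocation can vary a bit by code level, and the cluster name is a placeholder):
  # Cluster code level (the detailed view shows code_level)
  svcinfo lscluster <cluster_name>
  # Two online nodes per I/O group, and note which one is the config node
  svcinfo lsnode -delim :
  # Nothing offline or degraded
  svcinfo lsmdisk -filtervalue status=offline
  svcinfo lsvdisk -filtervalue status=degraded
  # Latest software upgrade test utility (invocation varies by utility version)
  svcupgradetest -v 5.1.0.0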

Now let's talk about the 8A4 versus the CF8. I'll make it super, super simple for everyone to save you a ton of headaches.
8A4 is virtually identical to the 8F4, except that it uses SATA simple-swap disks. 8GB of memory, a single 3.0GHz dual-core CPU with 6MB cache, a 4-port PCI Express FC HBA, and that's about it.
CF8 is the one you want. Seriously. CF8 steps up the hardware big time by packing in the latest-generation 2.4GHz quad-core Xeon with 12MB cache. Then it cranks it up with an LSI SAS RAID controller and 6 SAS bays taking both SAS and solid state disks for cache. Adding to the awesome are dual redundant power supplies.
Breaking it down: all you care about is SSD or no SSD. Even if you don't use SSD, you want the CF8. Seriously, it's a monster. (I don't recommend blindly using SSDs; I'll get to that later on. Buy them anyways. Just read on further, okay?)

So you've got your new hardware. You've got your existing nodes upgraded to 5.1. Are we ready to rock? You betcha! (There's also a consolidated CLI sketch after the list.)
  1. Install and cable your new CF8 nodes. Don't connect the ethernet or FC cables yet!
  2. Did I say turn your CF8 nodes on? No! Don't! 
  3. Go to your existing cluster, and locate your configuration node.
  4. After you've found your configuration node comes the ugly part. You need a list of dependent vdisks via "svcinfo lsnodedependentvdisks" - oh, and any dependent quorum disks.
  5. IBM says you should stop I/O to dependent vdisks. If you aren't using SDD, you probably should. If you are using SDD at the hosts, just be extremely careful. Test with non-production first, obviously. ESX users, sorry, you're probably boned here if you aren't on 4.
  6. Quorum disks! This is beyond important. Relocate quorum disks BEFORE shutting down a node. I mean it.
  7. Actually write down the WWPN and WWNN (and iSCSI name) of the node you're about to replace.
  8. Ready? NOW you can stop the node!
  9. Node's stopped? Remove it from the cluster!
  10. Power on the removed node, and change its WWPN and WWNN to all F's. How? While the panel is displaying "Node WWNN:", press and hold the down button, press select, then release down. You should see "Edit WWNN" on line 1 and the WWNN on line 2. Use the up, down, left, and right buttons to change it to all F's. Press select to save your changes.
  11. IBM says this is when you install and cable. NO. You should have already done that! Makes things go quicker, trust me.
  12. Power on your glorious shiny new node! (From the UPS, dangit! Not the front panel of the node!)
  13. Hey, did you actually connect those FC and Ethernet cables? No? Good.
  14. Write down the WWNN and/or iSCSI name of your shiny new CF8. You won't need it unless you're reusing nodes, but write it down anyways.
  15. Remember how we made the other node FFFFF in step 10? We're going to do those same steps, except we're going to give our replacement node the WWNN of the node we replaced.
  16. Wait about a minute. The new node panel should display "Cluster:" - if it doesn't, call IBM support. If it does, you're ready to add it to your cluster.
  17. Very carefully take your lovingly labeled (you DO label, right?) cables from your old node, and relocate them to your new node. Every port must match exactly by the Q names.
  18. Use lsnodecandidate to verify that you applied the WWNN correctly. If it's not there in lsnodecandidate, fix the WWNN.
  19. Use "svctask addnode -wwnodename WWNN -iogrp IOGroupName" to add your new node into the cluster as a replacement. If the node is behind on software or a different version, it may take up to 20 minutes. Relax. Grab a coffee. You're just about done anyways.
  20. Verify the new node is online in the cluster. Verify your hosts see the node as restored; if they see a new path rather than an existing path having come back online, something went wrong and you should probably call IBM.
  21. Lather, rinse, repeat for all other nodes. Remember to do the Configuration Node last.
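
And here's the consolidated CLI sketch I promised, for a single node swap. Names, WWNNs, and the exact dependent-vdisk command vary by code level, so treat this as a crib sheet rather than gospel:
  # Find the config node (config_node field) and this node's dependencies
  svcinfo lsnode -delim :
  svcinfo lsnodedependentvdisks <node_name>    # may be lsdependentvdisks on newer code
  svcinfo lsquorum                             # if your code level has it - check quorum placement
  # Record the old node's identity, then pull it from the cluster
  svcinfo lsnode <node_name>                   # note the WWNN, WWPNs, iSCSI name, and I/O group
  svctask rmnode <node_name>
  # ...the front panel work from steps 10 through 17 happens here...
  # Verify the replacement shows up as a candidate with the WWNN you set
  svcinfo lsnodecandidate
  # Add it back into the same I/O group under the old node's WWNN
  svctask addnode -wwnodename <WWNN> -iogrp <IOGroupName>
  # Confirm it comes online before moving on to the next node
  svcinfo lsnode -delim :
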
Congratulations. You have successfully replaced that grody old PCI-X DDR hardware with shiny new PCI-Express DDR3 oh-wow-that's-fast.

Now the "aw crap" part of our program: things you aren't going to like about 5.1 and the CF8. The CF8 has a VERY nasty caveat if you use SSD and internal RAID functions. In any CF8 node failure, you must move the RAID controller, cables, and disks to the replacement node. Failure to do so will result in data loss on the SSD array. You must do this for any and every CF8 using SSD and RAID. I don't recommend using SSD in RAID, but I do recommend putting a pair of 146GB SSDs in any CF8 you order. You'll find some way to make use of them sooner rather than later.

UPDATE: The wonderful Barry Whyte of IBM posted his own blog entry about the CF8 with some information on the node upgrades. This resulted in a discussion on Twitter (big surprise there) wherein he corrected me on SSD behaviors and the actual RAID card. (I had to go off photos, so cut me a bit of slack, please.) ;)

One, the SSDs can mirror between nodes. So your worst case is data loss isolated to a single node, requiring a recopy of SSD data between nodes. I should have known IBM wouldn't miss that little point! Minus one point for me, definitely. So yes, you also do need to match your SSDs between nodes within each IO Group. That said, you still need to take your standard data protection precautions - RAID is not a backup, tapes are your friend, and so on.

Two, there are two SAS controllers installed in the new CF8 nodes. One is an LSI as I mentioned, but the LSI is actually the boot disk controller, not the SSD controller. The SSD controller is a custom-designed solution from the fine folks at IBM Hursley and is just an HBA. Actual SSD configuration is stored within the cluster, meaning SSD configuration won't be lost in a node failure.
There are still some technical caveats in node repair situations, of course, but it's going to depend on the recommended course of action from support and the CE. (e.g. a system planar replacement will require moving the LSI over as normal, but there may be some trouble when the LSI itself needs replacing.) As ever, failures and repairs depend primarily on field experience, so time will give us all better knowledge there.

Three, he pointed out a use case that I like enough to share here. You can use the SSDs as a VDisk RAID1 mirror of MDisk RAID5's to get some screaming read performance. I can definitely get behind this little trick, especially since it's being done at the VDisk level - meaning you don't have to play match-the-array, and you can do it for individual VDisks that need more read performance. Write performance is mostly unchanged, because writes are still gated by the slowest disk in the pairing.
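
For the curious, a rough CLI sketch of that trick. It assumes you already have an SSD managed disk group (SSD_GRP here, a made-up name) alongside your spinning-disk group, and the exact mirroring parameters are worth double-checking against the 5.1 docs:
  # Add an SSD copy to a vdisk that currently lives on the RAID5 mdisk group
  svctask addvdiskcopy -mdiskgrp SSD_GRP <vdisk_name>
  # Wait for the copies to sync
  svcinfo lsvdiskcopy <vdisk_name>
  # Then make the SSD copy the preferred (read) copy
  svctask chvdisk -primary <ssd_copy_id> <vdisk_name>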

Ultimately, there's not a single reason to not put SSDs in the new SVC nodes. Like I said; even if you don't use them immediately, you will find a use case for them that works for you. Don't be afraid to experiment with them to find what works for your situation. Just be aware of the restrictions and limitations, the same as ever. SSD is not a cure-all for performance problems - it's another tool to add to your arsenal.

Sunday, November 29, 2009

Flat Tiering - it doesn't suck, it's just your configuration that sucks.

I hear so much anti-flat-tiering noise lately, it's past time I spit out my thoughts on it.
First, what is flat tiering? That's any storage system where you have one or two types of disk, and that's it. Some examples would be Compellent, Pillar Data Systems, DataDirect Networks, and the 3par T and F series. All of these systems stick to one or two disk types. Why is this a good thing? One, it reduces servicing complexity - you don't have to identify which type of disk or controller needs replacing, because they're all the same. Two, it goes back to the subject above - configured right, there's no reason to mechanically tier everything.
Flat tiering is not one-size-fits-all. It is not right for every configuration. It is not right for every application mix. But it fits most setups well - when it's configured correctly.

Certainly, this is far from specific to flat tiering. Do not ask how many badly configured mechanically tiered setups I've seen. Bad configurations are more common than people want to admit. Usually it's holdover configurations from initial deployments done in a rush, where there's too much load or not enough spare resources to perform a reconfiguration. Either way, bad configurations are bad configurations, wherever they are.

Here's the thing with flat tiering though - when your configuration sucks, you REALLY know it. It's not like a DS5k where your bad configuration only pushes your response times up to around 20ms, or an SVC where you reduce theoretical peak throughput by 50% (which still gives you about 4.5GB/sec.) A bad configuration on a flat tiered system can push your response times over 50ms and drop your IOPS through the floor. And it's much easier to badly configure a flat tiered system than a mechanically tiered system.

The first thing you have to do - and I really do mean have to do - when you work with a flat tiering system is throw away everything you know about configuring mechanically tiered systems. Forget all of it. It's only going to wreck your configuration. The second thing you have to do is document and test more, up front. I don't mean figure out what needs high priority and how many terabytes it needs - I mean really document and test. What's your typical IOPS for this application? What's your maximum acceptable response time? Is this random disk seeking or sequential loading?
Let's look at mechanical tiering. Basically, it goes something like this: if it has to be fast, put it on fast disk. If it can be slow, put it on slower disk. If it can be really slow and it's mostly sequential, put it on SATA. Flat tiering does away with all of that. Every last bit. There is no fast and slow disk, there is just disk.
This is where the biggest mistakes happen. People assume that this means they simply shift their entire design philosophy to capacity and nothing else. Capacity becomes the commodity and all performance is equal.

Can I get a show of hands from 3par, Compellent, DDN, and Xiotech who agree that flat tiering means capacity is the sole concern of their product, and all performance for all LUNs presented to hosts will always be 100% equal at all times?
Huh. I didn't see any hands there. Maybe I didn't ask loud enough? Or maybe that's because it's just not true. While performance across all LUNs on most of these systems will tend towards equal, that does not mean it is equal, or should be equal, on all LUNs.

Think about it. Do you want your PeopleSoft database to have the same access priority as your customer service file shares? Of course not. That's just silly. You need that PeopleSoft database on the fastest disk you can get. But you're on a flat tiered system, so you don't have faster disk. And there's only one product where 'flat disk' means 'flat performance' - and I won't even mention which, because it's just a bad storage system, period. Ask any of the vendors I've mentioned above if flat disk means flat performance, and they'll tell you bluntly "no." It does tend toward flatter performance, but that isn't the same thing.

This comes back to the point of better documentation and understanding of storage requirements. If you attempt to treat all storage as equal, you are much more likely to get burned. A performance hit to your Windows shares becomes a performance hit to your ERP systems. Obviously that's the last thing you want to happen in your production environments. Storage complexity doesn't go away when you flatten the tiers - it just moves. That's why there needs to be a greater focus on the storage, and more attention to detail than you're used to.
Nor can you just apply traditional best practices. They don't apply here. With different RAID modes, different technologies, and different methodologies of achieving storage performance, simply going with what you're used to isn't an option. You can't just throw together 4+1 RAID5's and call it done. You need to look at how each system does RAID, and how it applies to performance from the host perspective. You can't just throw more spindles at a slow database - it may even further hurt performance. You may need to increase controller priority, or adjust how the arrays are caching.

The other thing I see is a stiff resistance to SATA in traditionally FC spaces. This is right and wrong. Whenever any vendor is trying to sell you SATA instead of FC, you should be extra critical. That doesn't mean throw it out. That means you test, test, and test again. You test on real hardware with loads as close to real as you can get. You don't buy a thing until they prove every single claim. The fact is that SATA sucks for random, period. Every single vendor I've named knows and acknowledges this - that's why they all offer FC or SAS drives and SATA drives. If the salesperson is trying to push you to go ALL SATA, chances are they don't know what the hell they're talking about, or they see something in your workloads that you missed. Understanding your workloads is something you need to do before you even start talking to sales.

Flat doesn't necessarily mean really flat. Sometimes it means replacing 6 controllers and 32 arrays of FC disks with one or two controllers with far less arrays, achieving equal or greater performance. It does not need to mean limiting yourself to a single disk type or speed.
And let's go back to making them prove their claims. 3par can brag about some of the best SPC-1 results around: the F400 pulled off 93K IOPS with a respectable response time. That's a great demonstration of what their technology can do, using 146GB 15K FC disks. It is not a demonstration of what it will do in the configuration they're selling you. Your workloads might push their controllers harder than they expect. Your workloads may be poorly suited to the disks they're suggesting. Test, test, and test again. Make them back up their promises in writing. That's true of all storage, sure, but doubly so in this space.

The thing is, I know for an absolute fact that at least two vendors I've named in here (no, I can't say who) have backed up their claims of offering better performance and reliability in writing, in full. I know this because one, I was involved directly or peripherally in it, and two, they made good on those promises. I seriously can't say who, because it is NDA'd and proprietary information regarding customer agreements.
But I will tell you right here, right now: if you ask any one of the vendors listed above to back up their performance and reliability claims in writing, with a guarantee to cover the costs of removing them or going back to your existing storage, there are two who can and will look you right in the eye, say they can do it, and put it down in writing. And the requirement that you test, and work with support to achieve it? If that isn't standard operating procedure for you on traditional arrays, you need to reexamine your SOPs. (Again, test, test, test! You cannot test enough!) Anybody who won't back up their claims in writing either shouldn't be making claims, or should be axed from your shopping list fast.

Oh, and that flat array I won't mention? IBM XIV. My recommendation? Don't even touch that thing. It is dangerous, and not ONE of the claims I have heard from sales in the past is true. IBM's own presentations show that XIV can barely beat the DS3400 in random IOPS and can't even match the DS3400's response times. XIV is complete crap for anything that isn't mostly sequential read, with <40% write and <20% random. If that isn't true, why hasn't IBM backed their claims on XIV with SPC results? The DS3400 has 'em, the DS5300 has 'em, and XIV still doesn't over two years after its introduction.

Monday, November 23, 2009

My FTC Mandated Excessive Disclosure

So, I figure I should put this out here. That way nobody can say I'm partying with some company's reps in Las Vegas during lulls.

First and foremost, this blog represents my opinions. People are free to agree with them, but they're still my opinions. Not any employer's, or any other person or thing. Mine. Nothing I say here represents the opinions of any other person or company, and I never speak for anyone else.
I also may post or publish in other places and other blogs, one way or another. That doesn't change the fact that my words are my own, my opinions are my own, and they don't represent anyone other than me as a person. If that ever changes, I'll be sure and mention it clearly.

I never have and never will accept payment in cash, hardware, or anything else in return for favorable opinions or statements. While I may do consulting, my opinions are not for sale. Sorry.

If you've got something neat or cool you want me to look over, then let me know! I may or may not talk about it here. And if I forgot something you know about in my writing, well, hey. I'm an IT guy, but I don't know every product or technology on the market. Sorry for the omission, but I can't write about products I don't know, and I won't write at length about products I'm not comfortable working with.
If you really, really want me to write on something, then my time is for sale. (Hey, I do consulting. Got to pay the bills somehow.) However, just because you're paying me for my time doesn't mean you get to pick what I say. You get to pick the topic and the deadline, and even where it gets published; you just don't get to pick the outcome.

I have or have had access to information which is covered under NDA from several companies, as part of my prior employment. This information is frequently incorporated in my thought process and writing, without being disclosed. Because of the nature of the NDAs, I cannot disclose who they are with or what they cover. Some of these agreements did or do incorporate cross-marketing or case study agreements, but they never came to me for quotes (because I would say something sucked if it sucked, and they don't like that for some reason.)

I'm typically pretty blunt in my writing here. Please don't confuse this for who I am, or how I usually write. This is my blog. That means when I have a bad day courtesy of some vendor making a stupid design decision, it just may end up here. (Actually, it probably will end up here.) By the same token, if something makes my day, I'll probably write about it here. I'm going to point again to the first statement - these are my opinions. Insert obligatory 'opinions are like' here.

I lurk around in various places, and generally I don't advertise that I'm watching or listening. I may also post or comment under aliases. All this means is that one, I value my privacy, and two, I prefer not to be harassed for having an opinion of my own. (Yes, I have had people harass me at length. I have better things to do than put up with it.) People who know, know. Those who don't, don't expect me to change that any time in this lifetime.

Hm, yep. That covers everything.

Thursday, November 12, 2009

The Protocol Wars Continue.

http://viewyonder.com/2009/11/12/the-end-is-nigh-for-protocol-passionistas/
Hey guys. My turn.

A lot of the Protocol Passionistas are arguing for all the wrong reasons, absolutely agreed. Some folks probably lump me in there, because I am resolutely against FCoE at this point in time. But not because I like having to run MTP and pinch my fingers in high-density cassettes. And not because I hate Ethernet. Oh, and not because I'm against FCoE as a technology, either.

The problem I have with FCoE is the problem I had with NDMP, with iSCSI, with NFS, with any number of protocols. People like to jump the gun and don't pay enough attention to best practices or the limitations of the protocol. FCoE has great potential for good, but far more potential for bad. Think about it for a minute. Can you honestly say you have never seen a poorly configured SAN or Ethernet switch, not once in your entire career? If you think you've never seen one, you probably haven't been looking. I actively seek out those problems, to solve them before they become real problems.

Now let's apply this to converged networking, where we now have FC (disk and tape), iSCSI (slower disk), and Ethernet (networking and NAS) all carried on the same fiber through the same adapter. The Reliable Ethernet part of the stack was only very recently declared technically stable, and it is without question the most critical element for FCoE. Buying in before you know you won't have to forklift for the final standard? Not wise. It's important to point out that none of the protocols involved are spring chickens. Even iSCSI is years old. And compared to its age, real stability is very recent.
And now, we're shifting all of these onto a single converged point. What this means for the business is the elephant in the room that nobody, and I mean absolutely nobody, wants to admit to: networking failures and problems just became everything problems. This isn't a Protocol Passionista point - it's a point, period. And a very good and valid one. Putting all your eggs in one basket, even with redundancy, can be very dangerous. Remember what I said about configuration problems before?

Jumping into the new challenge of converged networking without meeting the challenge of current networking is such a bad idea, I can't recommend it. And the technology is still relatively young, which gives me additional pause. There is tremendous risk of making mistakes, and mistakes cost real money. Especially when a failed Ethernet port now means failed disk as well.

No question, converged is the future. Like it or not though, the reason isn't what people want it to be. It's not because it's technically superior, it's not because the old division of protocols is a bad idea.
Converged is the future because it's cheaper.
Yes, that's really what's going to drive adoption. The folks controlling budgets saying "why do you need two $1200 HBAs and two $400 multiport GigE cards when the salesman says we can do it with a single $1100 converged adapter?" Ask any budget decision maker - will they take a proven reliable method when there's something significantly cheaper that's 'good enough'?

And once again, it will fall to us to turn 'good enough' into "we can't ever be down unexpectedly ever."

Friday, October 30, 2009

On Virtualizing your Storage

Way back, I went over what's virtualization on Unix, and what's not. Well now it's time to hit storage over the head with the debunking hammer!

So, let's start with defining virtualization in the storage context. Storage virtualization is taking a homogeneous or heterogeneous set of storage resources, and distributing data over multiple arrays or controllers. That's the simplified version. There's a complicated set of requirements to actually qualify as storage virtualization. I'll break it down in list format.
  • It must support more than one controller and more than one storage subsystem.
  • It must support more than one vendor's storage subsystem(s).
  • It must support more than one model of storage subsystem.
  • It must support at least one protocol of: Fiber Channel, iSCSI, or NAS.
  • It may or may not include its own disk storage controllers.
  • Data must be capable of being spanned across two or more attached storage subsystems while being presented to hosts as a single LUN.

So we've got a good list of what is required to qualify, in my world, as storage virtualization. So, let's do the list! What IS Storage Virtualization? (In reverse alphabetical order, and not a complete list.)

Actually Virtualization

  • NetApp V-Series
  • LSI StoreAge SVM
  • IBM System Storage SAN Volume Controller
  • Hitachi Data Systems USP-V
  • Hitachi Data Systems USP-VM

Definitely Not Virtualization

  • HP LeftHand P4000 - scale-out is not virtualization!
  • EMC V-Max - does not attach to ANY other vendor's controllers.
  • EMC Invista - does not support ANY vendor except EMC.
  • Coraid ATA-over-Ethernet products - single vendor chassis with storage built in!

Don't Know Enough, So Might Be!

  • Incipient Network Storage Platform
    They hide all their documentation and technical specifications, so I can't tell if it's just a tool for mirroring and copying between different storage subsystems or if it's actually virtualizing.

All that said, there's the argument that could be made that if the product hides the storage behind it and presents a single point of management for your storage, then it's virtualization. But, it's not. It's a gateway.
The EMC people will whine about V-Max being defined as "Not Virtualization." TOUGH LUCK. IT ISN'T. The V-Max is a storage subsystem, which spans data across multiple arrays and multiple controllers within itself. The V-Max does NOT support any externally attached arrays from any other vendors. The people who want to whine about Invista? TOUGH LUCK. IT STILL ISN'T. The Invista slid into supported EMC array cabinets and only worked with EMC.
It's NOT virtualization if it only works with one storage vendor. Period. The point of storage virtualization is to enable heterogenous environments. Be it tiering by applications, saving money by using multiple vendors, or increasing performance by using multiple arrays and subsystems.

So why would you virtualize your storage? There's dozens of good reasons. The one I hear the most frequently is the IBM SVC owning SPC benchmarks. They want that level of performance out of their storage, and they think virtualization is a magic wand. It isn't. Virtualization is, like all things, a piece of the puzzle. No more, no less. Can you rock your world with SPC record breaking performance by going to virtualization? Sure, if you pay for it. Just like everything else.
Virtualization is still a front end to storage subsystems. That means that just because the virtualization engine can do 9,500MB/s random, you're still limited by the arrays behind it. The counteracting component is the use of multiple controllers and arrays. One array does 250MB/s random, but with virtualization you can span the LUN across two, which gets you to 500MB/s random in theory. In reality, it'd likely be closer to 400MB/s, but that's still way up from 250MB/s. Need more performance? Add more controllers. It doesn't hold 100% true, and there's a scaling point where it stops helping, but that's the theory.
From the administration side, virtualization is very appealing. This is also how many gateways claim to be virtualization. A key tenet of storage virtualization is that it must provide a single point of management for your storage, regardless of what's behind it. When you virtualize, all your disk to host provisioning is done in the virtualization engine. You no longer need to slice things out on each controller after looking at loads, what space you have, etcetera. It consolidates the vast majority of your storage management into a single pane of glass.
So far, we've established that virtualization can turn multiple low-performance arrays into solid performing LUNs for your hosts, and your administration nightmares can be drastically reduced.

Now I'm going to tell you why both of those are bad too. First, administration nightmares only go down if you're capable of configuring things that way. You have to be ready to let go of individual arrays for individual applications on your storage subsystem. You create a bunch of large, high performance arrays and give them to the storage virtualization engine to manage. It writes blocks across those arrays. Stop managing spindles, start managing performance. Group arrays by performance and by capacity in your storage subsystem, not by application. Group by application in your storage virtualization engine.
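
In SVC terms (the same principle applies to the other engines), that means managed disk groups built around disk class rather than application. A quick sketch with invented names, extent sizes, and mdisk lists:
  # Disk groups organized by performance class, not by application
  svctask mkmdiskgrp -name TIER1_15K -ext 256 -mdisk mdisk0:mdisk1:mdisk2:mdisk3
  svctask mkmdiskgrp -name TIER2_10K -ext 256 -mdisk mdisk4:mdisk5:mdisk6:mdisk7
  svctask mkmdiskgrp -name TIER3_SATA -ext 256 -mdisk mdisk8:mdisk9
  # The application grouping happens at vdisk creation time instead
  svctask mkvdisk -name erp_db01 -iogrp 0 -mdiskgrp TIER1_15K -size 500 -unit gb
  svctask mkvdisk -name fileshare01 -iogrp 0 -mdiskgrp TIER3_SATA -size 2 -unit tb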

Low performance arrays are STILL low performance arrays. If your issue is seek performance, virtualization will not help you. Seek performance is dependent on the arrays behind the virtualization layer, and the virtualization adds its own seek penalty - anywhere from 40us to 5ms, depending. Putting two SATA arrays behind an SVC will not get you FC. It will get you 1.5 times SATA. Virtualization is not a replacement or workaround for baseline array performance; it's a way to enhance it.

And the most common configuration error? People tier their storage by controller. Tier 1 is this storage subsystem, tier 2 is that storage subsystem, and tier 3 is yet another. You won't realize significant performance increases from this configuration. You'll dramatically increase spindle count, but you become hard limited by controller performance, and you unbalance controller loading. The optimal configuration is to span tiers across multiple controllers. Three controllers with 10 arrays of 5 SAS 15k disks, 10 arrays of 5 SAS 10k disks, and 10 arrays of 5 SATA disks will typically perform better than one controller with 30 arrays of each type of disk. The load becomes balanced across all three controllers, rather than having one controller sitting idle while another is begging for mercy.

So, how do you determine if virtualization is right for you? If you said "I have more than two storage subsystems and I need better performance and ease of management," I want that gold star back right now. First and foremost, storage virtualization is not for everyone, it's not always appropriate, and it's not a cheap solution to an expensive problem. Storage virtualization allows you to consolidate multiple heterogenous resources into a single point of management and allocation. That's it. Performance benefits should not be at the top of your list of reasons. Does that mean it can't be? No, but it shouldn't be the primary reason you're taking your first looks at storage virtualization, or even your second looks. Many environments can realize performance gains for far less money by analyzing and optimizing their storage configuration to suit their environment. And no, I don't mean best practices. Best practices are a starting point - not an end point. Reconfiguration of existing storage might buy a lot more than you expect.

Storage Virtualization fits most easily in rapidly or frequently changing environments, medium to large environments looking for increased scalability or flexibility on existing or new hardware, or environments with large amounts of administrative overhead. I'll get shot for saying it, but I'm going to anyways - in some environments, storage virtualization can change the staffing requirement from 5 to just 2 or even 1 Storage Admin. That doesn't mean it isn't suited to other environments, just that these environments are the most likely to receive immediate benefit from storage virtualization.
Unlike server virtualization and partitioning, storage virtualization is still rather immature. If a vendor starts telling you they solve everything including the kitchen sink, be wary. Every business should look at the costs and benefits of storage virtualization on a case by case basis, with detailed analysis not just from the vendor, but from internal staff as well. Can you use your existing storage subsystems? Can you expand with new subsystems? Will you have to forklift upgrade the storage behind it to expand further? Does it support your preferred vendors? Who's using it for what application with what results?

Storage virtualization has amazing potential in many, many environments but it also has the capability to burn you just as badly.

Come back later, when I openly mock every VTL on the market for being the utter crap they are, courtesy of an obscure company with an unpronounceable name and complacency on the part of manufacturers!

Storage - What IS non-disruptive?

This grows out of a discussion going on over here:
http://wikibon.org/blog/migrating-data-within-federated-storage/

The two main systems up for discussion here, are the IBM SVC 2145 and the Hitachi Data Systems USP-V class. So, let's start with some background on the systems.
The IBM SVC 2145 is, like most IBM products, an amalgam of acronyms. The official name is the IBM System Storage SAN Volume Controller. The SVC is a Storage Virtualization Engine, presenting hosts with a unified front-end, SAN Admins with a single point of management, and utilizing any number of disk controllers behind it. It is also the current (and probably forever) record holder of just about every SPC benchmark and a large number of TPC benchmarks. The currently available (and not the announced) hardware offers from 8 to 32 4Gbit FC ports per cluster, and as many disks as you can fit into 4,096 MDisks as of 4.2.1 - using RAID5 in 4+P, that's 16,384 data spindles! Maximum actual spanning is limited to 512 spindles for a single VDisk, which is still jaw-dropping.
The Hitachi USP-V is a beast. A lovely, lovely beast I want for my very own at home (except for the power bills.) It doesn't hold a lot of records because not enough people show it love, in my opinion. The latest generation offers up to 1,152 drives in 1-4 expansion frames with up to 128 flash drives, 512GB of cache, and up to 224 Fiber Channel ports. Everyone else, eat your hearts out. The USP-V also includes all of Hitachi's fancy software offerings, and the ability to assume control of and pseudo-virtualize external storage arrays like IBM DS4000's, HDS AMS-series, and so on.

If you just went "wait, these sound like very different products," that's because they are. You see, the IBM SVC is a Storage Virtualization Engine, and the HDS USP-V has a Storage Virtualization Engine. The SVC offers no storage disks of its own, whereas for the HDS USP-V, being disk storage is the core functionality. In other words, we're comparing something with no disks to something that's built entirely around disks. Vendors are getting very good at blurring these lines very badly, or explaining their products very vaguely, resulting in some pretty bad customer confusion.

So which is which and what is what? Let's start with the IBM SVC. The IBM SVC is two to eight 1U systems in an appliance form, which you stick in front of supported disk controllers to provide extent-based virtualization (or not) and/or provide you with a single point of management and presentation for a variety of hosts. Connections to hosts and storage are via standard Fiber Channel, and cluster interconnect uses standard Gigabit Ethernet. The SVC also offers the advantage of using IBM's SDD multipath driver which is available for every major OS out there in driver or pluggable software module form.
This brings us to the HDS USP-V. The HDS USP-V is a high end enterprise storage system boasting some of the most impressive specs you can find. It's highly configurable and customizable. It offers gobs of high speed SAS disks as well as support for SATA disks using HDS' RAID1+, RAID5 and RAID6 algorithms. Like the IBM DS8000's, disks go in 4 at a time at the minimum. The internals are connected via Hitachi's proprietary (not in a bad way) Universal Star Network. External storage is attached via Fiber Channel.
(As a note; I'm not including iSCSI because it's only an announced feature on the SVC, and NAS is the realm of the USP, not the USP-V. Plus we're not comparing features, dangit!)

Now if you read the original post on Wikibon that started us down this road, you'll notice that it was about what constitutes non-disruptive block level virtualization. (Or extents, if you want. Pick your poison.) Some folks have said that only the USP-V does it. But, that's not true. You see, the SVC also does everything the USP-V does. So what's going on here? Well, there's two problems.
One, the USP-V is storage with virtualization while the SVC is just virtualization. Two, the definition of non-disruptive is... ambiguous at best, tenuous at other times, and just crazy at still others.
Let's start with my definition of non-disruptive. My definition of non-disruptive is being able to perform hardware repairs, software upgrades and hardware upgrades without impacting the production environment beyond performance. Most folks will tell you that my definition is pretty darn reasonable.
HDS and IBM like to redefine "non-disruptive" on a per-product basis, to suit their needs. It's marketing, don't pretend to be surprised, okay? This is what they are paid to do.
So, if we go by my definition, why do both of these systems offer you the potential of non-disruptive maintenance and upgrades? Because they both do. And the SVC may even slightly edge out the USP-V in this regard because of its appliance nature. (I'll explain, I promise!)

The caveat that both systems are subject to is the actual storage subsystem and attached hosts. Yes. The disks and the end consumers of LUNs. The USP-V can legitimately make that claim because without adding external storage, it's still a USP, offering non-disruptive firmware upgrades and hardware replacement within practical limits.
The SVC can also legitimately claim to be non-disruptive block-level virtualization. Why? Because the SVC itself can do everything HDS claims to do non-disruptively as well. Data migration between arrays, between nodes (IO Groups, in fact), and even between clusters. All it requires is that the storage subsystems behind it be able to do firmware and hardware maintenance non-disruptively.
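
For the between-arrays case, here's the short version from the CLI (the mdisk group name is made up; watch the migration rather than firing and forgetting):
  # Move a vdisk's extents to an mdisk group on a different back-end array, live
  svctask migratevdisk -mdiskgrp NEW_ARRAY_GRP -vdisk <vdisk_name>
  # Hosts keep running against the same vdisk while this grinds along
  svcinfo lsmigrate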

This is also where their claims fall apart. For our example, we'll be using the venerable LSI-designed-and-built IBM DS4800 storage controller and EXP810 shelves. For the record, I hate the DS4k/DS5k because it is absolute crap for enterprise storage. But it's cheap, so it's everywhere. It's still crap.
Given the DS4800 as an External Storage Array on the USP-V and the SVC, both solutions fail the non-disruptive claims. Even the ones their own manufacturers make. Why? Because it's a DS4800, and they're both dependent on it. Firmware upgrades on the DS4800 are fraught with terror and most decidedly disruptive, requiring all IO be stopped to the DS4k. That means any arrays with data on the DS4k must have IO stopped at the virtualizing layer before maintenance can be performed. Which means stopping them at the USP-V or the SVC. Which means shutting down production environments using those LUNs. Was that a "whoops" I just heard?
But by the same token, if we take a USP-V with an AMS2500 behind it, and an SVC with an AMS2500 behind it, both pass because maintenance on the AMS2500 is non-disruptive. See how it works? Now you can dislike marketing crap just as much as I do!
However, within the isolated products themselves - that being the USP-V and only the SVC (with no storage,) all tasks are non-disruptive. As of 4.3.x you can take an SVC from 2145-4F2 hardware to 2145-8G4 hardware in the middle of the day with no impact to your production environment beyond performance, with proper planning.
So why does the SVC slightly edge out the USP-V in non-disruptive? Because the SVC is an appliance you put in a standard rack. If you put each IO Group (two nodes) in separate racks, which you should, the SVC can continue to operate normally through physical moves - excepting when managed storage is shut down for moves. The USP-V can't, because the controller is in a single dedicated frame. Yep. That's the entirety of it.

What about the host side? HDS claims to be unique in their ability to migrate a disk non-disruptively between USP-V frames. Well, one, the SVC can do this with Metro mirroring (not to be confused with Global mirroring! Global is the one for long distances!) or within a cluster using VDisk mirroring or FlashCopy (which HDS has in-frame equivalents for, predictably!). Two, apply the brakes when the reality of hosts hits.
No, Virginia, there isn't any such thing as a free lunch, and migration between frames or between clusters will always, always be disruptive at the host level. You're changing WWNNs and WWPNs on the controllers, even if nothing else is changing. And you think a host will just smile and eat it? Boy, don't I wish - that would save me SO much trouble, both past and present! No, no. The hosts will get very, very upset with you. So what's the procedure?
Well, I can't speak to the USP-V's since I haven't done it. But I imagine it's somewhat like the SVC's with obvious key differences. The USP-V is doing a migration from Frame A to Frame B. The SVC is mirroring data between Cluster A and Cluster B. To complete the migration on SVC? You unmount the disks at the host. Reverse the mirror direction. Rediscover disks on the host. Mount the disks from Cluster B. Verify everything is happy, and break the mirror. 5-15 minutes of downtime, typically. My guess is that the USP-V is similar in needing to stop IO and unmount, start migration, rediscover disks to pick up the new frame's ownership, and remount while migration is in progress. In theory, this could be fixed in software, but it's a very difficult problem to fix.
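
For reference, the SVC side of that dance looks roughly like this. It assumes the cluster partnership already exists, leaves the host-side unmount/rediscover/remount to your OS of choice, and uses made-up relationship and vdisk names - check the Copy Services docs for the exact switch semantics before trying this in anger:
  # On Cluster A: create and start a Metro Mirror relationship to Cluster B
  svctask mkrcrelationship -master <vdisk_A> -aux <vdisk_B> -cluster <ClusterB> -name mig_rel01
  svctask startrcrelationship mig_rel01
  svcinfo lsrcrelationship mig_rel01          # wait for consistent_synchronized
  # Host work: stop I/O and unmount the Cluster A disks
  # Flip the direction so Cluster B's copy becomes the primary
  svctask switchrcrelationship -primary aux mig_rel01
  # Host work: rediscover and mount from Cluster B, verify, then clean up
  svctask rmrcrelationship mig_rel01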

So, what've we learned today? One, don't take the vendor's definition of non-disruptive at face value. Ever. Two, do your own homework and don't just settle for "magic quadrants" and glossies. Insist on tech demos that don't simply consist of the vendor demonstrating the feature on a cherry picked array. Insist on hands on time. Insist on talking to real customers.
This post is a great example of exactly why you should. I learned about things the USP-V can do inter-frame that I didn't know before. And hopefully people learned about things the SVC can do that IBM marketing didn't tell them. (Seriously, IBM marketing sucks.)

And I can hope everybody learned a bit about the practical limitations of any storage virtualization solution, be it HDS, IBM, EMC or Joe's Computer Shack.

Aaaaaand the disclaimer!
I don't work for IBM, Hitachi, any subsidiaries, VARs, or BFFs. Despite my lust for the combination of SVC + 2x AMS-2500 in my home, both IBM and HDS have failed to ship me either, or even so much as a cheap mug! I hope you guys are listening, 'cause I could use a new mug after my Sun Customer Appreciation mug broke. ;)

Monday, June 8, 2009

On Virtualizing Your World

I'm not big on buzzwords. They rub me the wrong way. If you need to use buzzwords, then chances are pretty good that I can dismantle your supposed "technology" in a paragraph or less. It's what I do. Virtualization is an interesting thing, in that it's a buzzword where half the time I can dismantle the crap in two sentences, and the other half is split between wondering why people are mislabeling and wondering why people are so rabid about it.

That said, let's get down to the bare metal here. What is virtualization in reality? The no marketing, no lying version.
Virtualization is taking a fixed set of resources, and using them to create an entire "false" hardware environment. In other words, VMWare ESX is virtualization. However, POWER LPARs are not virtualization while WPARs are virtualization. Let's break it down in a list format.


Actually Virtualization
  • VMWare ESX, ESXi, Server, and Workstation
  • IBM AIX WPARs
  • IBM POWER series Virtual I/O Servers
  • HP-UX 11iv3 Secure Resource Partitions (SRP)
  • HP-UX 11iv3 vPars
  • Solaris 10 Containers
  • Microsoft Hyper-V
  • FreeBSD jails

Actually NOT Virtualization

  • IBM POWER series LPAR, DLPAR and microLPAR.
  • HP-UX nPars
  • Solaris Domains and LDOMs

This obviously presents the question of "where do we draw the line?" The simplest way to explain it is that for something to actually be virtualization, there needs to be a full operating environment interposed between the guest environment and the hardware. In other words, the "virtualized" environments have no direct access to hardware. This is why LPARs are not truly virtualization; an LPAR requires specific hardware resources which are allocated exclusively to it, and which it has direct access to. Even when sharing CPUs and memory, you have "minimum" values, which are dedicated hardware allocations. That's why Logical Partitions are Partitions and not virtualization, but Workload Partitions are actually Virtualization and not Partitions. Why is a Virtual I/O Server (VIOS) virtualization? Because it provides hardware interdiction for specific resources - network and storage - for other environments.
In a nutshell; it's not virtualization if the environment has direct access to hardware.

So now that we've established what is and isn't virtualization, let's talk about the obsession with virtualizing everything. I mean absolutely everything. I have seen shops asking if they can virtualize their physical network switches. (No. You can't.) What virtualization ultimately comes down to is a single core principle: turning computing hardware into a flexible resource.
In a traditional data center environment, this server runs this application, and that's all it does. The fact is that this model is expensive by every metric. You then need to buy roughly everything twice, for fault tolerance. An example in a traditional buildout would be a Sun V1280 for your mission critical Oracle databases, and a second V1280 for failover. This means buying two systems, paying for the same support contract twice, licensing Veritas Foundation Suite and Cluster for two systems, etcetera. (I will get into various implementations as we go along, so be patient.)

At its core, virtualization proposes to solve this problem by changing multiple divergent systems into a pool of computing resources and increasing utilization. Instead of two systems with 24 cores and 96GB, you have 48 cores and 192GB. The problem is, that's not how it works. The cold hard reality is that you still have two systems with 24 cores and 96GB, period. The difference is that virtualization lets you say that system A can now run multiple independent, isolated operating environments running different applications which might have different software requirements. In our V1280 example, we might use virtualization to run Oracle and VCS on 12 cores with prctl, then run Apache on 4 cores, then share the remainder between old Solaris 9 applications and environments. So instead of having the system sitting 50% idle, you have the system at 20% idle or less.
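
To make that concrete on the Solaris side (a sketch only, using Containers with CPU resources rather than the prctl/project route; zone names and paths are invented, and plenty of resource-control knobs are glossed over):
  # Oracle + VCS zone pinned to 12 dedicated cores
  zonecfg -z oradb01 'create; set zonepath=/zones/oradb01; add dedicated-cpu; set ncpus=12; end; commit'
  # Apache zone capped at 4 CPUs worth of cycles
  zonecfg -z web01 'create; set zonepath=/zones/web01; add capped-cpu; set ncpus=4; end; commit'
  # Everything else shares whatever is left; sanity check the layout
  zoneadm list -cv
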
This is not a replacement for a situation that people continually attempt to apply improperly. Let's say your V1280 (which frankly, is ancient and slower than molasses in January) is sitting at 5% idle when you're lucky. Your load averages resemble speeding tickets. Virtualization will not help you. Get a bigger system or move databases. People assume that virtualization is a magic wand, and that by virtualizing the Oracle environment, they magically gain resources. You don't. You actually lose resources - anywhere from 3-30% of CPU and memory - to virtualization itself.

Stay tuned, as I go over the various implementations (and non-implementations!) of virtualization, and what they really offer - and really cost.

Tuesday, May 26, 2009

On Unemployment, briefly

Things I do like about not having a job:
  • More time for personal projects.
  • Finally time to clean up all my personal files.
  • Free to work on theoretical engineering projects.
    (I can fit 2,500 PeopleSoft Financials users into one rack!)

Things I don't like about not having a job:

  • Not having money to fund personal projects. (The car languishes again. Sigh.)
  • COBRA. Just, ouch.
  • Not having resources for engineering projects.

Tuesday, May 5, 2009

I'm here.

Woo. I have a new blog. This'll be repointed at the new server when it gets online. In the meantime, I'll still dispense random insights on technology and the like.