Phil on Stuff: December 2009

Hey! Hey you! There's an update at the bottom!

Let's start this one off with a disclaimer:
If you aren't comfortable doing medium to major maintenance on your IBM SVC, this post is so not for you. Seriously. I'm skipping a lot of steps that fall under "things you just do as part of repairs." Instead, you should start over here:

IBM SVC Reference: Upgrading the SVC software

Then consult here:
IBM SVC Reference: Replacing nodes nondisruptively

Okay. Disclaimer done. Let's get down to business. You FINALLY got approval to upgrade your SVC hardware, and it's six kinds of awesome. I cannot stop raving about the improvements in the 2145-8A4 / 2145-CF8 hardware. Seriously. If you can upgrade only one part of your environment in a quarter, this is it. And because it's SVC, you can spread the upgrade out if you have to.

First, let's talk caveats. As with every SVC generation, intermixing hardware models in one cluster is supported within limits. Original hardware (2145-4F2) intermix is not supported with the 2145-8A4. 2145-4F2 clusters can be upgraded to 2145-CF8/8A4 hardware non-disruptively, but prolonged intermix is not supported. All other models can mix with the CF8/8A4 hardware in the same cluster on a short-term basis.
If you have 2145-4F2 nodes, you need to contact IBM support before upgrading. The documentation for non-disruptive 2145-4F2 to 2145-8A4 upgrades is (again) impossible to locate, and the procedure is different!
Customers with x305m and x306m Master Consoles (lift the SVC label to check your MT-M) need to go to the SVC support website and read the advisory regarding boot problems with these systems. This affects only the Master Console, not the SVC itself. Frankly, I recommend buying a pair of small xSeries servers and making the Master Console an MS Cluster. The Master Console that ships by default sucks.
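If you look after several clusters, you can write the caveats above down as a tiny rule table so nobody forgets the 4F2 exception. This is a minimal sketch that transcribes only what this post says. It is not IBM's support matrix, and the function name `intermix_guidance` and its return strings are made up for illustration.

```python
# Hypothetical helper: it encodes the intermix caveats exactly as stated in
# this post. It is NOT IBM's official interoperability matrix.
NEW_MODELS = {"2145-8A4", "2145-CF8"}


def intermix_guidance(existing_models: set[str], new_model: str) -> str:
    """Return this post's guidance for adding `new_model` to a cluster whose
    current nodes are `existing_models`."""
    if new_model not in NEW_MODELS:
        raise ValueError(f"{new_model} is not a CF8/8A4 model")
    if "2145-4F2" in existing_models:
        # 4F2 clusters can be upgraded non-disruptively, but prolonged
        # intermix is not supported and the procedure differs.
        return "4F2 present: contact IBM support first; no prolonged intermix"
    # Every other model may share a cluster with CF8/8A4 nodes short-term.
    return "short-term intermix supported"


print(intermix_guidance({"2145-8F4", "2145-8G4"}, "2145-CF8"))
print(intermix_guidance({"2145-4F2"}, "2145-8A4"))
```

Keep the rules as data in one place like this. When IBM updates the real support matrix, there is only one spot to change.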

Next, prerequisites. You should already be on 4.3.1.10 or later, with the latest PTF applied. All I/O groups have two member nodes. Normal cluster maintenance has been done and all errors have been marked "FIXED." No vdisks or mdisks are offline or degraded. You have the latest software upgrade test utility, and you have run it with no errors returned.
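The prerequisite list is really a go/no-go checklist, and here's a minimal sketch of it in Python. The `ClusterState` fields are hypothetical placeholders for values you would gather from the cluster yourself; this is not an SVC API. The one real subtlety it handles is comparing code levels numerically. A plain string comparison thinks 4.3.1.9 is newer than 4.3.1.10.

```python
from dataclasses import dataclass, field

MIN_LEVEL = (4, 3, 1, 10)  # minimum code level named in this post


def parse_level(level: str) -> tuple[int, ...]:
    """Turn a dotted code level like '4.3.1.10' into a comparable tuple."""
    return tuple(int(part) for part in level.split("."))


@dataclass
class ClusterState:
    # Hypothetical fields: fill them in from your own checks of the cluster.
    code_level: str
    nodes_per_iogrp: dict[str, int] = field(default_factory=dict)
    latest_ptf_applied: bool = True
    unfixed_errors: int = 0
    offline_or_degraded_disks: int = 0
    upgrade_test_errors: int = 0


def failed_prerequisites(state: ClusterState) -> list[str]:
    """Return every prerequisite from the checklist that is not met."""
    problems = []
    # Tuple comparison is numeric per field, so (4, 3, 1, 9) < (4, 3, 1, 10).
    if parse_level(state.code_level) < MIN_LEVEL:
        problems.append(f"code level {state.code_level} is below 4.3.1.10")
    if not state.latest_ptf_applied:
        problems.append("latest PTF not applied")
    for iogrp, count in sorted(state.nodes_per_iogrp.items()):
        if count != 2:
            problems.append(f"{iogrp} has {count} node(s), expected 2")
    if state.unfixed_errors:
        problems.append("errors not yet marked FIXED")
    if state.offline_or_degraded_disks:
        problems.append("vdisks or mdisks offline or degraded")
    if state.upgrade_test_errors:
        problems.append("upgrade test utility reported errors")
    return problems


state = ClusterState("4.3.1.9", {"io_grp0": 2, "io_grp1": 1})
for problem in failed_prerequisites(state):
    print("NOT READY:", problem)
```

An empty list means go. Anything else means you fix it before you touch hardware.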

Now let's talk about the 8A4 versus the CF8. I'll make it super, super simple for everyone to save you a ton of headaches.
The 8A4 is virtually identical to the 8F4, except that it uses SATA simple-swap disks. It has 8GB of memory, a single 3.0GHz dual-core CPU with 6MB of cache, a 4-port PCI-Express FC HBA, and that's about it.
The CF8 is the one you want. Seriously. The CF8 steps up the hardware big time by packing in the latest-generation 2.4GHz quad-core Xeon with 12MB of cache. Then it cranks things up with an LSI SAS RAID controller and six SAS bays that take both SAS and solid-state disks for cache. Adding to the awesome are dual redundant power supplies.
Breaking it down: all you care about is SSD or no SSD. Even if you don't use SSD, you want the CF8. Seriously, it's a monster. (I don't recommend blindly using SSDs; I'll get to that later on. Buy them anyway. Just read on further, okay?)

So you've got your new hardware. You've got your existing nodes upgraded to 5.1. Are we ready to rock? You betcha!

1. Install and cable your new CF8 nodes. Don't connect the Ethernet or FC cables yet!
2. Did I say turn your CF8 nodes on? No! Don't!
3. Go to your existing cluster and locate your configuration node.
4. Once you've found your configuration node, here comes the ugly part. You need a list of dependent vdisks for the node you're about to replace via "svcinfo lsnodedependentvdisk", plus any dependent quorum disks.
5. IBM says you should stop I/O to dependent vdisks. If you aren't using SDD, you probably should. If you are using SDD at the hosts, just be extremely careful. Test with non-production first, obviously. ESX users, sorry, you're probably boned here if you aren't on ESX 4.
6. Quorum disks! This is beyond important. Relocate quorum disks BEFORE shutting down a node. I mean it.
7. Actually write down the WWPN and WWNN (and iSCSI name) of the node you're about to replace.
8. Ready? NOW you can stop the node!
9. Node's stopped? Remove it from the cluster!
10. Power on the removed node and change its WWPN and WWNN to all F's. How? While the panel is displaying "Node WWNN:", press and hold down, press select, then release down. You should see "Edit WWNN" on line 1 and the WWNN on line 2. Use the up, down, left, and right buttons to change it to F's. Press select to save your changes.
11. IBM says this is when you install and cable. NO. You should have already done that! It makes things go quicker, trust me.
12. Power on your glorious shiny new node! (From the UPS, dangit! Not the front panel of the node!)
13. Hey, did you actually connect those FC and Ethernet cables? No? Good.
14. Write down the WWNN and/or iSCSI name of your shiny new CF8. You won't need it unless you're reusing nodes, but write it down anyway.
15. Remember how we set the old node to FFFFF in step 10? We're going to do those same steps, except this time we give the replacement node the WWNN of the node it replaces.
16. Wait about a minute. The new node's panel should display "Cluster:". If it doesn't, call IBM support. If it does, you're ready to add it to your cluster.
17. Very carefully take your lovingly labeled (you DO label, right?) cables from your old node and move them to your new node. Every port must match exactly by the Q names.
18. Use "svcinfo lsnodecandidate" to verify that you applied the WWNN correctly. If the node doesn't show up in lsnodecandidate, fix the WWNN.
19. Use "svctask addnode -wwnodename WWNN -iogrp IOGroupName" to add your new node to the cluster as a replacement. If the new node is running an older or different software version, this may take up to 20 minutes. Relax. Grab a coffee. You're just about done anyway.
20. Verify the new node is online in the cluster. Verify your hosts see the node as restored. If they see a new path rather than an existing path coming back online, something went wrong and you should probably call IBM.
21. Lather, rinse, repeat for all other nodes. Remember to do the configuration node last.
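Two bits of bookkeeping in that procedure are easy to fumble at 2 a.m. One is doing the configuration node last. The other is handing the new node the old node's WWNN in the addnode command. Here's a minimal sketch of both:

- The `Node` record and the sample WWNNs are hypothetical.
- The format check assumes WWNNs are written as 16 hex digits, the usual form of a 64-bit Fibre Channel name.
- The function only builds the `svctask addnode` string quoted in the steps. It doesn't run anything against a cluster.

```python
import re
from dataclasses import dataclass

WWNN_PATTERN = re.compile(r"[0-9A-Fa-f]{16}")  # assumption: 16 hex digits


@dataclass
class Node:
    # Hypothetical record of what you wrote down before removing the node.
    name: str
    wwnn: str
    iogrp: str
    is_config: bool = False


def replacement_order(nodes: list[Node]) -> list[Node]:
    """Keep the existing order but move the configuration node to the end."""
    # sorted() is stable and False sorts before True, so only the
    # configuration node moves.
    return sorted(nodes, key=lambda n: n.is_config)


def addnode_command(old: Node) -> str:
    """Build the addnode command that gives the new node the old node's WWNN."""
    if not WWNN_PATTERN.fullmatch(old.wwnn):
        raise ValueError(f"{old.name}: WWNN {old.wwnn!r} is not 16 hex digits")
    return f"svctask addnode -wwnodename {old.wwnn} -iogrp {old.iogrp}"


nodes = [
    Node("node1", "500507680100A001", "io_grp0", is_config=True),
    Node("node2", "500507680100A002", "io_grp0"),
]
for node in replacement_order(nodes):
    print(addnode_command(node))
```

The replacement goes back into the same I/O group as the node it replaces. That's why the command takes both values from the old node's record.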

Congratulations. You have successfully replaced that grody old PCI-X DDR hardware with shiny new PCI-Express DDR3 oh-wow-that's-fast.

~~Now the "aw crap" part of our program: things you aren't going to like about 5.1 and the CF8. The CF8 has a VERY nasty caveat if you use SSD and internal RAID functions. In any CF8 node failure, you must~~ move the RAID controller, cables, and disks to the replacement node. Failure to do so will result in data loss on the SSD array. You must do this for any and every CF8 using SSD and RAID. I don't recommend using SSD in RAID, but I do recommend putting a pair of 146GB SSDs in any CF8 you order. You'll find some way to make use of them sooner rather than later.

UPDATE: The wonderful Barry Whyte of IBM posted his own blog entry about the CF8 with some information on the node upgrades. This resulted in a discussion on Twitter (big surprise there) wherein he corrected me on SSD behaviors and the actual RAID card. (I had to go off photos, so cut me a bit of slack, please.) ;)

One, the SSDs can mirror between nodes, so your worst case is data loss isolated to a single node, requiring a recopy of SSD data between nodes. I should have known IBM wouldn't miss that little point! Minus one point for me, definitely. So yes, you also need to match your SSDs between nodes within each I/O group. That said, you still need to take your standard data protection precautions: RAID is not a backup, tapes are your friend, and so on.
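Matching SSDs within each I/O group is another thing worth checking mechanically before you order or install. Here's a minimal sketch. It assumes "matching" means both nodes carry the same number of SSDs with the same capacities. The input format, a node name mapped to its I/O group and a list of SSD sizes in GB, is made up for illustration.

```python
from collections import defaultdict


def mismatched_iogrps(nodes: dict[str, tuple[str, list[int]]]) -> list[str]:
    """`nodes` maps a node name to (I/O group, SSD capacities in GB).

    Returns the I/O groups whose member nodes don't carry identical SSDs.
    """
    configs_by_iogrp: dict[str, list[list[int]]] = defaultdict(list)
    for iogrp, capacities in nodes.values():
        # Sort so bay order doesn't matter, only count and capacity.
        configs_by_iogrp[iogrp].append(sorted(capacities))
    return sorted(
        iogrp
        for iogrp, configs in configs_by_iogrp.items()
        if any(config != configs[0] for config in configs[1:])
    )


inventory = {
    "node1": ("io_grp0", [146, 146]),
    "node2": ("io_grp0", [146, 146]),
    "node3": ("io_grp1", [146, 146]),
    "node4": ("io_grp1", [146]),
}
print(mismatched_iogrps(inventory))
```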

Two, there are two SAS controllers installed in the new CF8 nodes. One is an LSI, as I mentioned, but the LSI is actually the boot disk controller, not the SSD controller. The SSD controller is a custom-designed solution from the fine folks at IBM Hursley and is just an HBA. The actual SSD configuration is stored within the cluster, meaning SSD configuration won't be lost in a node failure.
There are still some technical caveats in node repair situations, of course, but they'll depend on the recommended course of action from support and the CE. (For example, a system planar replacement will require moving the LSI over as normal, but there may be some trouble when the LSI itself needs replacing.) As ever, failures and repairs depend primarily on field experience, so time will give us all better knowledge there.

Three, he pointed out a use case that I like enough to share here. You can use VDisk mirroring (RAID1 at the VDisk level) with one copy on the SSDs and the other on RAID5 MDisks to get some screaming read performance. I can definitely get behind this little trick, especially since it's being done at the VDisk level. That means you don't have to play match-the-array, and you can do it for just the individual VDisks that need more read performance. Write performance is mostly unaffected, because writes still go to both copies and you're still dependent on the slowest disk in the pairing.
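A deliberately crude latency model shows why reads get faster while writes don't. This is an assumption-laden sketch, not how the SVC actually schedules I/O:

- It assumes every read is served from the SSD copy.
- It assumes a write only counts as done once both copies have it.
- It ignores the SVC's write cache entirely.

The latency figures in the example are made up.

```python
def mirrored_latency_ms(ssd_ms: float, raid5_ms: float, read_fraction: float) -> dict[str, float]:
    """Toy model of a VDisk mirrored across an SSD copy and a RAID5 copy."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    read = ssd_ms                   # assumption: reads go to the SSD copy
    write = max(ssd_ms, raid5_ms)   # a write waits for the slower copy
    average = read_fraction * read + (1.0 - read_fraction) * write
    return {"read": read, "write": write, "average": average}


# Illustrative numbers only: compare against the RAID5 copy on its own.
mirror = mirrored_latency_ms(ssd_ms=0.2, raid5_ms=8.0, read_fraction=0.7)
print(f"mirror: read {mirror['read']} ms, write {mirror['write']} ms, "
      f"average {mirror['average']:.2f} ms vs 8.00 ms for RAID5 alone")
```

Even this toy version makes the post's point. Read-heavy VDisks gain the most, while the write side stays pinned to the RAID5 copy.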

Ultimately, there's no reason not to put SSDs in the new SVC nodes. Like I said, even if you don't use them immediately, you will find a use case for them that works for you. Don't be afraid to experiment with them to find what works for your situation. Just be aware of the restrictions and limitations, as ever. SSD is not a cure-all for performance problems; it's another tool in your arsenal.


Sunday, December 13, 2009

IBM SVC - upgrading the hardware to 2145-CF8 (for people who know SVC.)
