Phil on Stuff: March 2010

Saturday, March 27, 2010

The nVidia GTX480 - and why it's a 400W+ piece of junk

Okay, I did a really bad job on Twitter yesterday of explaining why the GTX480 is a 480W+ part and why its 250W TDP isn’t what it actually puts out in terms of heat. So, let’s try it again, with the math backing it up.

First of all: standalone GPUs are worthless. Quoting a GPU’s wattage by itself is like telling me how many calories are in the pepperoni, but not the rest of the pizza. The same goes for putting the numbers together – I could just have the pepperoni, but then it’s not a pepperoni pizza, now is it? So how are you going to play games on just a video card? Well, you aren’t. So rating by the GPU alone is only useful when sizing a system, and even then the numbers usually end up heavily fudged.

So let’s take nVidia’s latest paper-launched part (it won’t be available to buy until April), the GTX480. Its TDP is 250W – for the GF100, not the card. Or maybe it’s the other way around? Your guess is as good as mine, but based on data I’ve seen, a 250W TDP for the card probably understates its real draw by somewhere around 25-50W once you account for heat losses. With a Tjmax of 105C and a typical operating temperature of over 80C at the die, you’re talking about a massive efficiency loss from temperature. Heat reduces efficiency, especially in electronics, so when you’re running voltage regulators managing 250W anywhere close to the top of their rated operating temperature? You start incurring some pretty nasty losses.

HardOCP was too kind in their power tests, because they used a benchmark called FurMark. FurMark runs almost exclusively on the GPU, with very low CPU loading. This keeps the test from becoming CPU-bound, but it also allows modern Intel CPUs to drop into lower power states. This is NOT representative of what you’ll see while gaming at all. Games stress both the GPU and the CPU, pushing both towards maximum power draw and TDP. In fact, most modern games will push a Core i7 920 pretty near 130W of draw.

So, let’s call it 125W for the CPU, 35W for the motherboard, 50W for a pair of SATA disks, and 250W for the GTX480. 125+35+50+250 = 460W. This number is particularly amusing to me, as some years ago, the specially built WTX power supply for Dual Athlon MP boards with AGPPro150 slots produced exactly that number. It also makes it a 400W+ part, because even if you switch to a Socket 1156 part at 95W, you’re still over 400W (430W). AMD? Still over 400W. There is no way to build a usable system around a GTX480 with 90% load at 400W or less. That means 80 Plus certified power supplies most likely won’t help you until 600W to 800W, absolute best case (50% and 80% load).

But wait, doesn’t that mean you only need a 500W power supply? NOPE! Not even remotely close. That’s the “running estimate,” but for startup we absolutely have to rate by maximum draw plus 5W per component (the plus 5 is a rule of thumb). So that’s 135+55+55+255, or 500W ignoring fans. Add another 15W for fans, and that’s 515W. Oh, and that’s only what the individual devices are drawing, which isn’t the whole story. It’s real in the sense that it’s the minimum startup wattage, but it doesn’t account for various losses. Allowing roughly 1% for those losses pushes it to 521W of DC supply required to start up.
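Both budgets above are simple sums of per-component estimates. Here is a minimal Python sketch that reproduces them using the post’s own ballpark figures (estimates, not measurements); the dictionary keys and helper names are illustrative, and rounding up with `math.ceil` is an assumption that happens to match the 521W figure.

```python
import math

# Ballpark per-component figures in watts, taken from the paragraphs above.
RUNNING_W = {"cpu": 125, "motherboard": 35, "two_sata_disks": 50, "gtx480": 250}
STARTUP_W = {"cpu": 135, "motherboard": 55, "two_sata_disks": 55, "gtx480": 255}
FANS_W = 15
LOSS = 0.01  # rough allowance for distribution losses (an estimate)


def total_watts(parts: dict[str, int]) -> int:
    """Sum the per-component draw in watts."""
    return sum(parts.values())


def startup_dc_watts(parts: dict[str, int], fans: int = FANS_W, loss: float = LOSS) -> int:
    """Startup DC requirement: components plus fans, plus losses, rounded up."""
    return math.ceil((total_watts(parts) + fans) * (1 + loss))


running = total_watts(RUNNING_W)                      # 460
running_1156 = total_watts({**RUNNING_W, "cpu": 95})  # 430, still over 400W
startup = startup_dc_watts(STARTUP_W)                 # 521
print(running, running_1156, startup)
```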

We have to adjust for our actual power supply efficiency at this wattage to compensate for typical AC-DC losses and startup draw. That gives us an actual need of somewhere north of 600W at the wall. Otherwise, we’re going to pop the power supply any time we fire up Modern Warfare 2 or Battlefield. That presents… a bit of a problem. Add to that the demonstration that the GTX480 goes from an idle draw of “only” 47W (total system draw of 260W at idle for SLI configurations!) and jumps over 50W just to open a webpage, and we’ve got a REAL winner here, folks. Yep, and by the way, those numbers are extremely conservative and don’t leave any overhead at all. That’s what you’re going to see from the wall while gaming: 600W and higher. Oh, and don’t install any additional hard drives, attach USB devices, et cetera. In fact, ignore those numbers and go with HardOCP’s recommendation of a minimum 700W supply for a single card.
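Converting a DC requirement into wall draw is just a division by the supply’s efficiency. A small Python sketch, assuming a handful of illustrative efficiency values (the post doesn’t name one here, and real units vary with load and temperature):

```python
def wall_watts(dc_watts: float, efficiency: float) -> float:
    """AC draw at the wall needed to deliver dc_watts at the given PSU efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be between 0 and 1")
    return dc_watts / efficiency


# Illustrative efficiencies only; they are assumptions, not measured values.
for eff in (0.80, 0.85, 0.87):
    print(f"{eff:.0%} efficient: {wall_watts(521, eff):.0f}W at the wall")
```

With the 521W startup figure this gives 651W, 613W and 599W respectively, so “north of 600W” holds for any supply running below roughly 87% efficiency at that load.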

Now, to be entirely fair, we need to establish a comparison. We’ll use the card that the GTX480 is supposed to “kill,” the AMD/ATI Radeon HD5870. The HD5870 has a maximum board draw of 188W. HardOCP found that the HD5870 system drew 367W at the wall, which gives us an actual DC load of about 320W. 320-188 leaves only 132W for the remainder of the components. So let’s just call the HD5870 200W of draw after losses and everything, giving us a whopping 120W for a mid-range desktop board, a heavily overclocked i7 920, and a SATA hard drive. So take your pick here, folks. Either these numbers at 200W are right, or the HD5870 is actually maxing out its DC draw at somewhere around 130W. Personally, I don’t have a hard time believing everything else at a combined 167W.

To be fairer still, we have to do the same power math we did for the GTX480 to establish our startup power: 135+55+55+193 = 438W DC for startup with a Socket 1366 Core i7. But wait! What happens if we switch to a Socket 1156 Core i5 or i7, with a TDP of 95W? That gives us 100+55+55+193 = 403W, and both the ATI and nVidia configurations already carry a 20W margin of error. Add 15W for fans, and the ATI configuration still ends up below 430W. In other words, if you didn’t mind having little headroom and running the PSU pretty hard, an ATI HD5870 can easily make do with a good quality 500W unit, which will see a maximum draw from the wall of somewhere closer to 495W with everything at its absolute limit.
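The HD5870 startup budgets can be checked the same way. A short Python sketch using the paragraph’s figures (maximum draw plus about 5W each); the configuration labels and helper name are illustrative:

```python
FANS_W = 15

# Startup figures (maximum draw plus ~5W each) from the paragraph above.
HD5870_CONFIGS = {
    "Socket 1366 Core i7": {"cpu": 135, "board": 55, "disks": 55, "hd5870": 193},
    "Socket 1156 Core i5/i7": {"cpu": 100, "board": 55, "disks": 55, "hd5870": 193},
}


def startup_with_fans(parts: dict[str, int], fans: int = FANS_W) -> int:
    """Startup DC watts for the listed components plus case fans."""
    return sum(parts.values()) + fans


for name, parts in HD5870_CONFIGS.items():
    print(f"{name}: {sum(parts.values())}W DC, {startup_with_fans(parts)}W with fans")
```

The Socket 1156 build comes out at 418W with fans, under the 430W mark. At an assumed 85% efficiency that works out to about 492W at the wall.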

So! If we go with everything else at a combined 167W, let’s run the GTX480’s 480W number. Real DC draw is around 418W. Subtract the “everything else” category of 167W and we get 251W for the card, in free air, at a temperature of 93C. The free air part is very important, and we’ll get to that in just a little bit.
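Both the HD5870 and GTX480 estimates follow one pattern: convert the wall reading to DC, then subtract everything that isn’t the card. A minimal Python sketch, assuming 87% efficiency (the figure for the supply in HardOCP’s SLI test; the single-card rig’s efficiency isn’t stated):

```python
EFFICIENCY = 0.87  # assumed PSU efficiency, not a measured value for these rigs


def dc_watts(wall_w: float, efficiency: float = EFFICIENCY) -> float:
    """Convert a wall (AC) reading into the DC load the PSU actually delivers."""
    return wall_w * efficiency


# HD5870 system: 367W at the wall, card rated at 188W.
hd5870_rest = dc_watts(367) - 188  # about 131W left for everything else

# GTX480 system: 480W at the wall, assuming everything else draws 167W.
gtx480_card = dc_watts(480) - 167  # about 251W left for the card

print(round(hd5870_rest), round(gtx480_card))
```

The post rounds the HD5870 system’s DC load up to 320W, which is why its remainder figure (132W) is about 1W higher than this sketch’s.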

Now here’s an exceptionally important point: HardOCP saw GTX480s in 2-card SLI exceed 900W at the wall with the CPU barely contributing, on an 87% efficient power supply. If we give nVidia the full benefit of the doubt and say 250W for each GTX480, that leaves over 400W for the rest of the system. Let’s make our correction: 87% of the wall reading is actual DC, so that’s 783W on the DC side at 900W. We’ve already established that every other component combined is roughly 167W of draw. We’ll be exceedingly generous and jack that up to 200W. Notice how the numbers still don’t add up there, at all? Remember, it was OVER 900W at the wall, and on an 87% efficient power supply at that! Seriously. Let me spell it out for you.

783 – 200 = 583+W, and 583+ / 2 = 291+W per card in SLI.
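The same split as a tiny Python function; the name is illustrative, and because the wall reading was “over 900W,” the result is a lower bound:

```python
def per_card_dc_watts(wall_w: float, efficiency: float, rest_w: float, cards: int) -> float:
    """Split the DC load left after the rest of the system evenly across the cards."""
    return (wall_w * efficiency - rest_w) / cards


# 900W at the wall, 87% efficient PSU, a generous 200W for everything else, 2 cards.
print(per_card_dc_watts(900, 0.87, 200, 2))
```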

That means in SLI at 92C with fans screaming, those cards are each actually drawing nearly 300W of DC or more, which translates to somewhere north of 650W at the wall for the pair. There are some HUGE power losses going on there from heat, no doubt, and we’re talking about cherry-picked cards from nVidia with non-release BIOS. These are, in other words, not actually representative of what AIB partners will be putting out. AIB partners will likely use lower cost voltage regulation and support components to try to handle costs that are already non-competitive. If we presume that the card is a 250W combined part but gets 90% efficiency from supplied power, we get right around 278W (250 / 0.9). And as we’ve already covered, just the card is useless. Oh, and three-way SLI? Dream on. At 2-way SLI, you’re pushing 1000W at the wall. There’s one 1500W PSU available on the market, it requires your outlet be wired for a 20A breaker, and it’s going to set you back $400+. Sorry folks, 1200W won’t cut it: 900+250+ = 1150+W. Oh, and then there’s that little problem where your noise level is actually 64dBA for a single card and over 70dBA for two. Remember, dBA is logarithmic, so going from 64.1 to 70.2 is more than double the sound intensity (roughly four times). These things are dangerously loud and can make you deaf.
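Decibels are ten times the base-10 log of an intensity ratio, so a difference in dB converts back to a multiplier. A quick Python sketch (function name illustrative):

```python
def intensity_ratio(db_from: float, db_to: float) -> float:
    """How many times more sound intensity db_to represents compared to db_from."""
    return 10 ** ((db_to - db_from) / 10)


print(f"{intensity_ratio(64.1, 70.2):.2f}x")
```

That prints about 4.07x in sound intensity. Perceived loudness grows more slowly; a common rule of thumb is that +10 dB sounds roughly twice as loud.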

Now let’s complicate matters properly: HardOCP did all their tests on a bench in free air. This is a huge deal, because free air means the card is not in an enclosed chassis. It has a continuous supply of cold air feeding it and completely unrestricted airflow from five directions. PCB and ambient heat is also indirectly radiated to open air independent of the fan movement. All this combined lowers the operating temperature substantially compared to a card installed in a chassis. In other words, the 93C operating temperature is very much on the low side. This is why nVidia was requiring manufacturing partners to certify their chassis beforehand. When you put these cards into a chassis, they’re suddenly faced with restricted airflow, the loss of ambient cooling, and the addition of over 100W of ambient heat from the CPU, motherboard, hard drives, et cetera. Very, very few desktop chassis are 100% thermally efficient, meaning they reject their entire thermal load and hold the interior temperature at intake air temperature. I have built and worked on some of the most efficient there are, and typical users will have chassis that, with a 200W TDP video card installed, run no less than 15C above exterior ambient (or are deafeningly loud).

Now we have a real problem, because that means we’re running at the ragged edge as is. If we call exterior ambient 74F, that gives us an ambient of about 23C. Add the 15C chassis rise and we get an interior ambient of about 38C, or roughly 101F. For HardOCP’s free air testing, 74F ambient isn’t an unreasonable estimate and is probably actually high. So end users will be subjecting a GTX480 to an ambient temperature 15C higher than the one that let it run at “only” 93C. With the loss of ambient thermal radiation, airflow restriction from components and the chassis, and an additional 150W+ of thermal load applied unevenly to all fans… well, you can bet that a GTX480 will never be quiet, and it will be screaming as it tries to keep the die at 100C or below. This is why I do NOT like free air noise testing. Yes, it shows you just how loud the fan is, but only in free air. Typical users will have these parts in a chassis, which can and will have significant effects on temperature and cause the fans to spend more time at higher speeds. In fact, it will affect all the fans in a modern chassis.
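The temperature arithmetic is a Fahrenheit/Celsius round trip plus a fixed chassis rise. A small Python sketch, assuming 74F outside and a 15C rise inside the case, as in the paragraph above:

```python
def f_to_c(f: float) -> float:
    """Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9


def c_to_f(c: float) -> float:
    """Celsius to Fahrenheit."""
    return c * 9 / 5 + 32


exterior_c = f_to_c(74)       # about 23.3C
interior_c = exterior_c + 15  # chassis running 15C over exterior ambient
print(f"{exterior_c:.1f}C outside, {interior_c:.1f}C inside ({c_to_f(interior_c):.0f}F)")
```

This prints 23.3C outside and 38.3C inside, which is about 101F.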

I don’t particularly have a horse in this fight other than my standard policy of “if it doesn’t work, if it’s not the better part, then I don’t want it.” The GF100 fails both of those, miserably and with great gusto. The performance numbers aren’t compelling at the price point, even if ATI doesn’t cut prices on the 5800 family parts. The power draw, heat, and noise generated add up to something I couldn’t even consider putting in a desktop system. Nothing short of watercooling is going to get that noise and temperature under control. Even the Arctic Cooling HD5870 cooler that they rate to 250W dissipation can’t do it (in part because it doesn’t exhaust outside the chassis, but onto the card instead).

Not to mention the fact that they’re putting a 250W part that depends totally on game developers playing ball for its performance up against a 188W part that in most situations offers equal or better performance. To justify a 250W, $499 part over a 188W, $410 part, you’re talking about needing around a 30% performance jump. But nVidia delivers maybe 5% except in tests written specifically for the card, which would only justify roughly a 197W part. It’s only worse when you stack up the HD5850 at 151W against the GTX470 at 215W, never mind that the GTX470 is a $350 part versus the HD5850’s $280. Again, same thing: a 30% jump is needed to justify the price and power, and it’s just not there.
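The “how much better does it need to be” question comes down to ratios. A short Python sketch using the watts and prices quoted above (the labels and helper name are illustrative):

```python
def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a fraction (0.25 means 25% more)."""
    return a / b - 1


PAIRS = {
    "GTX480 vs HD5870 power": (250, 188),
    "GTX480 vs HD5870 price": (499, 410),
    "GTX470 vs HD5850 power": (215, 151),
    "GTX470 vs HD5850 price": (350, 280),
}

for name, (a, b) in PAIRS.items():
    print(f"{name}: +{pct_more(a, b):.0%}")

# A 5% performance edge over a 188W card would justify about this much power:
print(f"{188 * 1.05:.1f}W")
```

For the top-end pair, the power gap works out to about 33% and the price gap to about 22%. For the GTX470 and HD5850, the gaps are about 42% and 25%.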

So with all these numbers and all this math right there, why don’t the review sites point this out? Simple: because they don’t want to piss off the people who feed them hardware. They have to leave this math as an exercise for the reader, because pointing out design failures like this in detail will lose them access to the hardware. Especially with nVidia – they’ve deliberately cut off and retaliated against sites that refused to lie for nVidia’s benefit.

So, there you have it. The GTX480 is a 400W+ card and the 250W draw is debunked. Where’s all the power going? Ask nVidia – they’re the ones who’ve delayed GF100 multiple times and been having issues with leaky transistors and having to jack up the voltages. I’m not an electrical engineer, but I can do basic math, and that’s all you need to see that the GTX480 definitely goes in the design failure column along with the NV30 series (AKA GeFarce FX AKA DustBuster.) This isn’t a card I could recommend, much less sell. And hopefully you’ve learned a lot more about desktop system design while I ripped it apart.

Wednesday, March 3, 2010

Why I Hate "Good Enough"

I really truly hate the “Good Enough” mentality that’s become so pervasive in IT these days. It’s not because I think everything should be five-nines – that’s a common misconception of my attitudes and thoughts. Far from it – five-nines is prohibitively expensive and downright absurd for almost everyone. (Which is also why I dislike anything claiming five-nines based on not going down in 12 months. Seriously, that’s not five-nines.) More simply, if “Good Enough” were really the ultimate level in reliability, then why would any business bother with Disaster Recovery?
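Five-nines is a long-run downtime budget, not a 12-month streak. A quick Python sketch of what each availability level allows per year (it ignores leap years):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)


for label, availability in (("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)):
    print(f"{label}: {downtime_budget_minutes(availability):.1f} minutes/year")
```

Five-nines allows about 5.3 minutes a year, sustained over many years. A single year without an outage is one sample, and it is just as consistent with a system that goes down for hours every other year.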

Here’s how Good Enough is implemented most commonly these days:

The Chances Of This Going Down Are Too Small To Bother With Planning For
We Don’t Think This Will Go Down So We Won’t Plan For If It Does
We’re PRETTY SURE This Won’t Go Down But We Have Support on Speed Dial
We Clustered This So We Totally Know It Won't Go Down
If Something Goes Wrong, Call Support And Hope They Know Why

Here’s how Good Enough is implemented by yours truly:

Chances of Failure are Very Very Low, BUT If It Dies, We Do This
We Don’t Believe This Can Fail, BUT If It Does, We Do This
Confidence Is Moderate, BUT We Have A Plan For Failures
It’s Clustered, BUT If The Cluster Has Problems, We Do This
If We Have A Problem, Involve Everyone And Find The Root Cause By Any Means Necessary

Notice the difference? I do something very different: I make the presumption of failure. That doesn’t mean everything’s crap, even though much of it these days is. It presumes that at some point in time, for some reason, failure will occur. I don’t know when, I don’t know why, and I may not even know how. But I absolutely require that there be a plan in place for dealing with that failure. Things like maintenance are planned, but you would probably be shocked at how many organizations plan their maintenance poorly by my standards. And my standards aren’t that unreasonably high, either.

I require a plan for the maintenance and a plan to back out if things should go wrong; those you’ll find everywhere. But I also require a plan for restoring function if a back out should fail, and a plan for forcing ahead if a back out is impossible. Why do I require these things? Because what happens if the back out fails? I’ve had it happen, and it’s not pretty. And what happens when you can’t back out changes? I’ve seen that plenty of times. Most organizations actually take the stance that if it can’t be backed out, don’t bother with a back out plan, just say it has to go forward. Okay, so what happens when the upgrade fails? There’s no plan in place, no way to go back, and you’re left in the lurch.
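One way to picture this: every failure branch leads to another planned step instead of a dead end. Here is a hypothetical Python sketch. The function and step names are made up for illustration and don’t come from any real tool. Each step is a callable that reports success, and `back_out=None` models a change that can’t be backed out.

```python
from typing import Callable, Optional

Step = Callable[[], bool]  # hypothetical: returns True if the step succeeded


def run_maintenance(
    apply: Step,
    back_out: Optional[Step],
    restore: Step,
    force_ahead: Step,
) -> str:
    """Hypothetical runner: every failure leads to another planned step."""
    if apply():
        return "complete"
    if back_out is None:
        # The change cannot be backed out, so the plan is to force ahead.
        return "forced ahead" if force_ahead() else "escalate"
    if back_out():
        return "backed out"
    # The back out itself failed, so fall back to restoring function.
    return "restored" if restore() else "escalate"


# Patches fail and can't be backed out, but the force-ahead plan works.
print(run_maintenance(lambda: False, None, lambda: True, lambda: True))
```

“Escalate” stands for the last-resort step of involving everyone and finding the root cause. The point is that it only comes after every planned path has been tried.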

I’ve been blessed, or cursed depending on who you ask, to see many kinds of failures in many situations. Everything from a single byte of corruption resulting in a failed firmware update to yours truly accidentally deleting the wrong multi-terabyte database. (Hey, think of how many coworkers and employees you know who would actually admit to it, as opposed to just restoring from backups and pretending it never happened.) As I’ve progressed in my career, I’ve learned a lot about failures, and a lot about how to manage and mitigate them. Yet somehow this knowledge seems to be missing at a variety of levels.

I wish I had some good answers as to how we can inject this back into the IT operations and business operations processes. Unfortunately, I don’t, other than pointing it out here. Seriously folks, think about this. What’s your procedure when a round of maintenance goes awry? Chances are your first and only answer is “call support.” Calling support is all well and good, and an important step, but it shouldn’t be your only step. It’s also not a step you should be injecting between “perform upgrade” and “maintenance complete.” In other words, your process flow chart shouldn’t be a series of straight lines, and they shouldn’t all be pointing down or right.

Let’s talk example. This is a real situation I’ve been through, with details changed. I’m not going to name names, because absolutely nobody in this situation looks good by any measure.

Maintenance was scheduled on a development system for Friday afternoon. This maintenance was operating system patches and a scheduled reboot as part of the patching process. The process had been done many times before with no problems, so there was no established plan for backing out patches. Install the bundle, reboot, done.

After installing the bundle, the system was rebooted and refused to go to multiuser, complaining of problems with system libraries. Upon examination of the logs, it was decided that it would be too much hassle, and rather than attempt to repair, a quick script would be written to back out the patches. The script failed to back out several individual patches, because they could not be backed out. This was accepted as “just how it is” and the system was rebooted again.

Now the system refused to go past single user, and critical services could not start. Files were determined to be missing, and an attempt was made to install them from the OS media. This failed because there were incompatible patches on the system that could not be backed out. A SEV1 call was placed to the operating system vendor’s support.

Now, let’s start with our first failure: the presumption that just because it worked before, it would work again. Then it’s compounded by not having any real plan – install, reboot, done is not a plan. Further complicating it, a back out attempt was ad-libbed, without understanding that some patches couldn’t be backed out. It only gets worse when this is accepted as “just normal” without any explanation or understanding of why or what. It’s likely that at this point, dependent patches were removed because they could be backed out, even though the patches that COULDN’T be backed out depended on them. This is a fatal presumption of “the vendor would never do something that stupid.” Sorry; every single vendor is that stupid at one point or another, and they make mistakes just like everyone else.

So at this point, the entire process has become ad-libbed. Do we restore from tape? Back out more? Reattempt patching? Who knows! There’s no plan; we’re shooting from the hip. So now we’re on hold for support with a system that’s been down for hours, it’s 9PM on a Friday, and it has to be back online by 7AM Monday or it’ll throw off a multi-million dollar project. This came about primarily for the worst reason of all: “development” was treated as a sandbox where it was okay to do just about anything, despite it being very actively used for development work.

The vendor’s response made the problem even worse: “oh, yeah, we know about this problem. You have to restore from tape, and if that doesn’t work, you have to reinstall.” So the system was restored from tape, with limited success. Reinstalling the system wasn’t an option, because of the way things had been configured and had to be built. But leaving restoring from tape and reinstalling as the only repair methods is what the operating system vendor considered to be a Good Enough answer for their Enterprise product.

Ultimately, the system continued to have problems and turned into a very expensive three month project performing a total rebuild of the system and all its environments, because everyone involved from management to the system administrators to the operating system vendor all said “that’s Good Enough.” It cost management a lot of respect, the system administrators a lot of time, the business a great deal of money, and the vendor lost the customer – probably forever.

So the next time somebody tells you something is Good Enough, don’t buy it. A Good Enough plan isn’t – and never will be – Good Enough when it’s your business at stake. Good Enough doesn’t mean building the most reliable infrastructure you can, then throwing up your hands and saying “that’s as good as we can get, oh well!” It means accepting that things will fail, things can fail, and that nothing will ever be perfect, then taking that knowledge and acceptance and building plans around it.

If you’ve planned for and built for the fact that failures are a when and never an if, and defined a process to work around and repair those failures, then hey, that’s Good Enough for me.

Tuesday, March 2, 2010

IBM Storage UK Has Codified Stupidity

cod·i·fy

tr.v. cod·i·fied, cod·i·fy·ing, cod·i·fies

1. To reduce to a code: codify laws.

2. To arrange or systematize.

Pay attention to number 2 there. Chris Mellor of The Register got some words from Steve Legg, IBM UK’s Chief Technology Officer for Storage.

These words made it quite clear that there’s an intent to codify stupidity within IBM Storage UK. He said simplify, but this is me, and I don’t like lies and obfuscation. What he actually meant is “collapse the offerings, and then make some patently ridiculous and arguably false statements to the press.” The word choices he made were exceptionally poor, but the choices made in "collapsing" are far worse.

And here comes the hatemail, because I, Mister I-Love-SVC and I-Love-DS8K, am calling IBM Storage “stupid” and “ridiculous,” and thus I must now be a shill for $MostHatedVendor or whatever. Except I’m STILL not employed by or representing anybody but myself. Seriously, if I were shilling, I would have built myself a Dragon 20w with dual 5970s. Or I would have at least put 16GB in my ESXi box instead of 8GB.

Anyway, let’s be honest and start with the good. I like honest, and I like good. Who doesn’t? SONAS – forget IBM’s acronym of Scale-Out NAS. I demand they change the acronym to Seriously Ossum NAS. It’s a brilliant design in its overall simplicity, combined with absolutely ridiculous density. If anyone’s going to get this right, it’s not Sun – I mean Oracle – it’s going to be IBM. They have the budget and resources. And SONAS delivers, if the order is NAS. I am a little dubious of some aspects of SONAS, but these are software issues and not hardware issues. Software issues should be fixable without needing to forklift the hardware.

What software issues am I concerned about? SONAS will inevitably be going up against not just Oracle, but NetApp, EMC, HP, Dell and so on. In that regard, it lacks the application-integrated snapshots that NetApp and others have. At the price points IBM’s talking on SONAS? Application integration for snapshots is pretty much expected. There are a lot of other software integration and capability questions that IBM has so far left unanswered (without NDA), so it’s very much a wait and see. The hardware has the potential; it’s up to the software to execute. But at least they’ve already solved the back end portion with GPFS.

The good, while being less than brilliant: “VDS.” This ‘offering’ is almost insulting to the capabilities of the IBM SVC. The VDS product cripples the SVC by chaining it to IBM’s low and midrange storage, the DS3k and DS5k. Look, you’re not likely to sell another DS5k to any business that’s had a DS5k. The architecture is positively ancient, and it is still incapable of having anything beyond the most basic maintenance performed online. Any firmware maintenance absolutely requires hours of downtime. The DS3k doesn’t even attempt to fake online maintenance capabilities – it just can’t, and it’s not meant to.

But this is a channel play. Why? Beats me – IBM could certainly use more solutions as opposed to just products. My opinion is that it would be a lot smarter to keep VDS close to the chest, and offer it with the DS3k, DS5k and DS8k. Seriously folks, the DS3k and DS5k can produce great performance numbers, but they have not been and will not be true enterprise arrays. You have a minimum of 2 hours of downtime per year – that’s minimum, not typical – for mandatory firmware upgrades. Why? The DS3k and DS5k require stopping all IO to do controller, ESM and disk firmware. So the SVC’s high availability ends up somewhat wasted here. Only the DS8k is on par with the SVC for high availability while servicing.
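Mandatory downtime puts a hard ceiling on availability before a single unplanned outage is counted. A small Python sketch, assuming exactly 2 hours a year and ignoring leap years:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def availability_ceiling(planned_downtime_hours: float) -> float:
    """Best possible availability once mandatory downtime is counted."""
    return 1 - planned_downtime_hours / HOURS_PER_YEAR


print(f"{availability_ceiling(2):.3%}")
```

This prints 99.977%, which is already short of four nines (about 53 minutes a year), no matter how available the SVC in front of it is.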

And the patently ridiculous and arguably false, otherwise known as codifying stupidity. I’m going to give you a quote, and you’re not going to believe it, but it’s a very real quote.

"XIV can reach up quite a long way and run parallel to the DS8000.” –Steve Legg, IBM UK Storage CTO

Yes, that’s Steve Legg of IBM UK saying that the XIV is the equal of the DS8000. Now Steve, the horse is out of the barn, and you can damn well believe I’m going to call IBM out on this load of manure. That statement has absolutely no basis in fact according to IBM's own published case studies and reference sites, and even a cursory comparison of the two arrays’ specifications reveals it to be obviously disingenuous at best.

But let’s have a refresher of those spec sheet contents, shall we?

XIV is composed of 15 modules totaling 180 1TB 7200RPM SATA disks, with 120GB of cache and over 7kVA of power draw at idle and a peak of 8.5kVA at 29,000BTU/hr. The only RAID type is mirroring, reducing actual capacity to 79TB before snapshots. This is also the maximum capacity of the XIV, 79TB; it is not possible to span frames except to mirror them. You cannot grow past 79TB, and there is no intent to move to 2TB disks in the next generation XIV hardware. Disk interface is 12xSATA over Gigabit Ethernet, changing to SATA over InfiniBand in the next hardware release (forklift upgrade required). Protocols spoken are Fibre Channel 1/2/4Gbit and iSCSI over Gigabit Ethernet, with a maximum of 24 FC ports and 6 iSCSI ports, and host ports are given up for mirroring HA (the only HA method available). Major component maintenance is limited, and customers may perform absolutely no service on XIV whatsoever. And I do mean NONE; even a simple disk replacement must be performed by a specially trained CE. IBM shipped the 1000th XIV in November of 2009.

DS8000 is now four generations old, comprising the DS8100, DS8300, DS8300 Turbo and the recently introduced DS8700. Based on the IBM POWER architecture for its controllers and using custom ASICs, the DS8000 family doesn’t just hold but absolutely owns the SPC-1 and SPC-2 benchmarks. Two processor complexes provide from 32GB to 384GB of combined cache and NVS. The DS8700 ranges from 16 to 1024 disks using any combination of 73/146GB SSD, 146/300/450GB 15K RPM, and 1TB 7200RPM disks in packs of four or sixteen, with a maximum capacity of 1024TB. RAID levels supported are 5, 6 and 10. Disk interface is FC-AL via multiple GX2-connected IO complexes. The configuration ranges from a single-wide cabinet to 5 frames (base plus four expansions), with a minimum power draw of 3.9kVA for the base and 2.2kVA per expansion, and maximums of 7.8kVA and 6.5kVA respectively. The thermal min/max is 13,400/26,500BTU/hr and 7,540/22,200BTU/hr respectively. Protocols spoken are Fibre Channel 1/2/4Gbit and FICON 4Gbit, with a maximum host port count of 128 in any combination of FC and FICON. Almost all major component maintenance can be performed without needing to shut down the DS8000, and all prior models can be field upgraded to the current DS8700 941/94E. Customers may opt to perform most DS8000 maintenance tasks themselves, and some hardware repair, including disk replacement.

As you can see, these two systems are not even remotely similar or comparable. The absolute maximum disk IOPS an XIV is capable of, being as generous as we can be at 180 IOPS per disk, is 32,400 IOPS. The DS8700 using FC disks and the same 180 IOPS per disk as a conservative number, is capable of 184,320 IOPS. This is ignoring all buffering, caching and advanced queuing. The DS8700 is proven to be capable of well over 200,000 IOPS with a high number of hosts. IBM refuses to submit XIV to an audited benchmark and their most detailed case study with Gerber Scientific shows XIV only handling a total of 6 systems (claiming 26 LPARs, that's still ridiculously tiny) and using less than 50% of its available capacity.
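The raw-spindle ceiling is just disk count times per-disk IOPS. A tiny Python sketch, using the same flat 180 IOPS per disk as the paragraph above (generous for 7200RPM SATA, conservative for 15K FC, and ignoring cache entirely):

```python
IOPS_PER_DISK = 180  # flat per-spindle figure from the text; ignores caching


def raw_disk_iops(disks: int, per_disk: int = IOPS_PER_DISK) -> int:
    """Aggregate spindle IOPS before any buffering, caching or queuing."""
    return disks * per_disk


print(raw_disk_iops(180), raw_disk_iops(1024))  # XIV, fully populated DS8700
```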

For IBM to even insinuate that the XIV is “parallel” to even the DS8100 first generation hardware is to basically call their customers idiots; it is the same as telling MotorTrend that your 1985 Yugo 45 can keep pace with a 2004 Ferrari Enzo. It’s only true as long as they’re both doing 25MPH and you’re willfully ignoring everything other than the fact that they both can do 25MPH. Anybody who spends more than 10 seconds reviewing the specification sheets for these two systems or cars will immediately be able to tell that they are not in the same class. Yet IBM would very much like you to believe that their Yugo 45 is just as fast as their Ferrari Enzo. Perhaps a more apt comparison would be that Steve is currently telling you that IBM's Renault Twingo can totally hold at least as many people as their London Double Decker Bus.

Am I calling Steve Legg an idiot? Absolutely not. Steve just made an amazingly bad word choice. Steve Legg is a well-respected guy, and not someone who's going to call you daft, especially not customers. But he’s basically said that IBM’s organizational stance is that customers aren't smart enough to spend a few moments reviewing a spec sheet and seeing the obvious disparity between the two arrays. He’s saying that IBM believes customers are too stupid to see the inefficiency of the XIV compared to its “green” claims, too stupid to see the raw horsepower of the DS8700, too stupid to tell the difference between 7200RPM and 15000RPM, and too stupid to understand that a DS8700 base frame plus one expansion, at 3.9+2.2 kVA minimum and 7.8+6.5 kVA maximum, is more efficient than two XIV frames at 7+7 kVA minimum and 8.5+8.5 kVA maximum. The problem with this is that the special XIV people will latch onto these words, yet again, and keep using them while they do treat customers like idiots. (To those who claim they don't: I've had them tell me to my face that the numbers they were putting up on the screen as gospel didn't mean anything. Among other things.)
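Summing the spec-sheet power figures makes the comparison concrete. A short Python sketch, assuming the pairing above (DS8700 base plus one expansion frame against two XIV frames) and treating the XIV’s “over 7kVA” idle figure as exactly 7kVA:

```python
# (minimum, maximum) kVA per frame, from the spec-sheet summaries above.
DS8700 = [(3.9, 7.8), (2.2, 6.5)]    # base frame + one expansion frame
XIV_PAIR = [(7.0, 8.5), (7.0, 8.5)]  # two XIV frames; 7.0 is a floor ("over 7kVA")


def totals(frames: list[tuple[float, float]]) -> tuple[float, float]:
    """Total (minimum, maximum) kVA across all frames, rounded to 0.1."""
    return (round(sum(lo for lo, _ in frames), 1), round(sum(hi for _, hi in frames), 1))


print(totals(DS8700), totals(XIV_PAIR))
```

That gives 6.1/14.3 kVA for the DS8700 configuration against at least 14.0/17.0 kVA for the XIV pair.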

Yet again, this does not mean XIV does not meet some needs. What it does mean is that XIV is still not equal to nor does it offer performance comparable to the DS8000.

His statements show that IBM’s offerings have codified stupidity: “we now sell on the basis that customers are too stupid to read or question us.” When customers push back on the high cost of DS8000, just whip out the significantly cheaper and far less capable XIV without mentioning anything other than "it can run parallel to the DS8000!" Which only goes to further support my arguments that you should be questioning your vendor at length, demanding hands-on testing, and refusing to take their word for it on any statements of suitability or performance. The choice is yours: you can challenge your vendor, or you can enjoy the challenge of finding new employment. And you should be really extra careful about what exactly you say to the press, especially when a fiefdom that doesn't answer to you is itching to abuse it.

Update:
I'm sorry about the VERY poor wording on my own part, and I want to extend my sincerest apologies to Steve Legg if I caused any offense. (I should not be writing so late, obviously.) Steve is by all accounts a great guy, and I'm sure that it wasn't his intent to imply that customers are idiots. The problem is that he made a bad choice of words and phrasing, and that's how it came out. I'm quite positive he knows better, especially since IBM UK is the home of the SVC.
The problem is that's how the words went, how the offerings are now aligned, and what it says to me as a customer. But these aren't decisions made by just one person at IBM, and Steve is just the messenger in this case. He certainly doesn't deserve my wrath, and I would not rain it down upon Steve specifically. If you ever get a chance to meet Steve Legg, be sure to shake his hand and thank him for SVC. ;)
