Phil on Stuff: 2010

Wednesday, June 9, 2010

You're looking for my latest posts...

They've been moved over to my private host at:
http://rootwyrm.us.to

Monday, April 26, 2010

Wherein I review my new Synology DS410

Important Disclaimer! I actually didn't pay for this NAS. Yes, that's right, free stuff! But not how you think - I won it in a random drawing run by Synology on Twitter. Those of you who think this might color things; sorry, nope, I was going to be buying a DS410 anyways. So there were no strings attached with winning it, no promised reviews, I just happened to have a lucky day when they drew the winner. I was planning to tear into it and blog as soon as I bought it anyway.

So with that out of the way, let's dive in, shall we? Starting with the basics, here's a link to the product page for the Synology DS410 NAS. It's got 4 SATA hot swap bays that accept 2.5" or 3.5" disks, a single GigE connection, two USB ports, and an eSATA port for expansion. Protocols include CIFS, AFP, NFS, FTP, and more. It fits into a very small space - just over 7" high, about 9" deep, and around 6.25" wide. If you said "dang, that's tiny," then you're absolutely spot on. Finding space for the DS410 will not be a problem. What makes this little box all the more impressive is the eSATA port. Run out of disks on your DS410? Just attach an RX4 or DX510 expansion unit. (Or potentially any expander compatible with a Silicon Image 3531 PCIe1x SATA controller.) If you're looking for a totally thorough review of every single featurette on it, sorry, this is a practical review. Which means basically I put it through its paces doing what I need and what most folks will demand of it.

My DS410 came in a retail box, which some folks would call boring, and which I call "very tasteful." A Synology branded cardboard box with a bright green sticker on the side declaring the NAS model as a DS410, with a brief but excellent list of the contents, supported applications, and the hardware inside. Of course, the DS410 by default comes with no disks, and mine was default. So I made a quick trip to Amazon.com for disks from the Synology compatibility list. (Linked because their compatibility list is the best out there.)
Everything you need to get rolling is in the box; the Quick Start CD, a quick start guide in a bunch of languages I don't speak (and English, which was a bit hard to find on it,) a standard power cable, the large power brick, all the screws you'll ever need, and a high quality Ethernet cable. When I say high quality, I do mean high quality Cat5e - one of the better made cables I've seen in a while. I'm not entirely happy about the large external brick, but it's needed to keep the chassis size so tiny. It's not an unattractive brick, and it doesn't get overly hot, so I have to admit I'd rather have it than a bigger chassis.
This part is super important. I opted for 3 x Samsung HD154UI 1.5 Terabyte 5400RPM SATA disks. Seriously folks; I bought only three slow 5400RPM disks. When I get to the performance part, you'll understand why this part is so important.

Diving into the gory technical details, since installation lets you do that, it's pretty obvious what makes the DS410 one of my favorites and what gives it its phenomenal performance. We start with a Freescale MPC8533 processor, which isn't too unusual but a touch surprising. (Marvell's 88F5281 SOCs tend to be more popular for some reason.) But that's where the average ends. Making the network a strong performer is the Intel i82574L Gigabit Ethernet controller, one of my favorite parts in the whole world - corners weren't cut here. Part of the reason I love the i82574L is because it does internal TCP Checksum Offload and any MTU from 1000 through Jumbo with VLAN. (For my purposes right now, it's best to keep the MTU at 1500/1536.) Internal SATA is provided by a Marvell 88SX7042 PCIe4x controller, giving us a 1:1 PCIe lane to disk ratio excluding expansion. Expansion disks via eSATA get their own lane worth 250MB/s to themselves. The overall fit and finish of the motherboard and interior is well above average, which may be why you're all but encouraged to take off the main case when you go to install disks. Every component shows very deliberate care in selection and placement.
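As a rough sanity check on that lane math, here is a minimal sketch comparing the internal bus budget against the single GigE link. The 250MB/s per-lane figure is the one quoted above; the 125MB/s GigE ceiling is simply 1000Mbit/s divided by 8 and ignores all protocol overhead. The variable names are made up for illustration.

```python
# Rough bandwidth budget: internal SATA controller bus versus the network link.
# Assumptions: 250 MB/s per PCIe lane (the figure quoted in the review) and a
# raw GigE ceiling of 1000 Mbit/s / 8, ignoring protocol overhead entirely.
PCIE_LANE_MB_S = 250
SATA_LANES = 4           # the Marvell 88SX7042 is described as a x4 part
DISK_BAYS = 4
GIGE_MB_S = 1000 / 8     # 125 MB/s theoretical

per_disk_mb_s = PCIE_LANE_MB_S * SATA_LANES / DISK_BAYS
internal_total_mb_s = PCIE_LANE_MB_S * SATA_LANES
bottleneck = "network" if GIGE_MB_S < internal_total_mb_s else "bus"

print(f"bus bandwidth per disk: {per_disk_mb_s:.0f} MB/s")
print(f"internal bus total:     {internal_total_mb_s} MB/s")
print(f"GigE ceiling:           {GIGE_MB_S:.0f} MB/s (bottleneck: {bottleneck})")
```

Under those assumptions the single Gigabit port, not the disk controller, is what caps throughput to clients.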

So how about installing the DS410 after taking it apart? Well, we've covered my 3x 1.5TB 5400RPM disks. Installing them was so easy, anyone could do it. Remove the slip-locked trays (just pull), place the drive on it, line up four screw holes, install the included screws, and slide the drive in. Locking is accomplished by raised areas on the bridges which fit perfectly into the metal cage, making locking surprisingly solid. You can't remove disks by accident. Getting access to the disks is ridiculously easy - four thumb screws on the back panel. Smart design decision that needs pointed out - the security lock slot goes through the interior metal framing, so it also locks access to the disks. The power connector from the brick is a tab-locking 4 pin affair, with a power cable lock included in the screws to ensure you don't accidentally yank it out. The fit of the power connector was so nice that I'd be more worried about breaking the motherboard before pulling out the power. Speaking of power, consumption is impressively low as well, along with fan noise. Temperatures were well within acceptable ranges, and the two 60mm Sunon fans (Sunon being one of my favorite manufacturers, I'll add) produce only the slightest fan whine. It's quiet enough that most folks will probably never hear them.

Initial setup of the DS410 is very straightforward and easy. It can be done on Windows, OS X and Linux or FreeBSD. I only tested Windows. Lesson one; your DS410 does not self load firmware onto disks! So the first step of initial setup is to use the Synology DS Assistant to locate your DS410. Network defaults for first time use are DHCP. Then you load the firmware - either from CD or the latest DSM 2.3 firmware from the web - onto the disks. Steps are pretty much initialize the disks (different from creating a volume,) configure the network, set the administrator password, set the initial time, aaaaand you're off and running. For my DS410, the whole process took about 15 minutes including the reboots. I elected to load the firmware from CD (currently shipping 2.2) so I could test out upgrading the firmware to current (2.3.) Important note here, DS Assistant doesn't need to be installed to your system - it runs off the CD. Brilliant move on Synology's part there, definitely.

Once you've finished with DS Assistant's initial setup, you actually don't need DS Assistant any more. You can elect to keep it around if you want, but I didn't. It's a bit limited in what it can and can't do - you can't really manage the DS from it, and the monitoring functions are duplicated in the web interface.
I'll spare everyone screenshot spam, the excellent Synology Live Demo will give you a better feel for the UI than I can. My take on the web interface? Excellent. Why? Because it just worked in every browser I tested it with (Firefox, IE and Chrome.) It was self explanatory. It was short and to the point. Organization is excellent; I didn't spend any time hunting for functions or settings at any point. I was able to configure it to enable NFS, sync time to my NTP server, disable unused services, and turn on DLNA functions without once needing to refer to the manual. And everything just worked. Which brings us to upgrading the firmware - it's remarkably painless, but I didn't have a volume configured prior to doing so, which probably contributed. I did it through the web interface, and it took about 5 minutes - most of that time spent rebooting. Gripe number one; the Synology DS410 is very slow to boot. It can take as long as 5 minutes to come online with no volumes or shares and a minimum of services enabled. But this is pretty minor - once it's on, you leave it running. Well, I do - you might want to use scheduled power on and off. Yep, the DS410 can do that. I just don't use it; it's connected to 24x7 operating servers.
The only real extra feature I've used is the Synology Download Station software, which is a BitTorrent and eMule client on the NAS itself. I performed testing using FreeBSD 7.3-RELEASE and 8.0-RELEASE torrents which are well seeded. I was not displeased; usage of the DownloadStation is simple and straightforward, download speeds had no trouble reaching the limits of my Internet connection, and with 6 torrents downloading the overall performance of the DS410 was virtually unaffected. That's downright impressive - most others I've seen tend to slow down significantly under that sort of load.

Volume creation forces you to use the wizard, which surprisingly, I don't mind. The wizard gives you sufficient control - select your disks, select "Standard" which is Synology Hybrid Raid (compare to BeyondRAID, RAID-X2, etc.) or "Custom" which offers 0, 1, 5, 6, 10 depending on your disk count. I did my performance testing using Synology Hybrid Raid and RAID5, since I have three slow disks installed. This is where I ran into my first real gripe about the DS410 - it only does iSCSI block mode with dedicated disks. That means if you want to use Windows 7's iSCSI to connect, you have to dedicate specific disks to it. Since I can use NFS instead, I didn't test iSCSI or performance with iSCSI. I didn't even test if there was a way to do iSCSI for a RAID volume.
Volume creation is surprisingly fast if you don't select scanning for bad blocks. The Synology Hybrid RAID volume took 15 minutes to create, and the RAID5 volume took 25 minutes to create. Since I'd already had to do surface tests on the disks (thanks a lot, Amazon.com shipping department,) a surface scan was redundant. I recommend you do it if you aren't going to be deleting and recreating the volume six plus times. It will add not minutes but hours to your volume builds, though exact times will always vary with disk size.

Shared folder creation is straightforward, simple, and easy. There are some annoying limitations to share management that I'd like to go over. First and foremost, it's an all or nothing thing. If you create a share, it's accessible via CIFS, AppleTalk, Bonjour, and NFS. You can control NFS permissions individually on a per-share basis, but all shares export as /volume1/ShareName rather than /ShareName. Which is another minor gripe - I'd like to turn off the volume prefix when there's only one volume. However, NFS permissions are great - you can create individual users on the DS410, and configure root quashing on a per-share basis. So I can map client mounting as root to be user prj on the NAS. I suspect the Windows permissions are better if you use ADS integration (yep, it's got that too!) but I don't have an ADS to test that function with yet.
You can also enforce quotas on the DS410, but it's on a per-user basis and applies to the volume and not the individual shares. So I can't set users up with a quota of 10GB for their documents but no quota for every other share unless I create two different users. Again, ADS integration might cause different behavior, but I can't test. However, I can set application permissions on a per user basis - that applies to FTP, FileStation, Audio Station, Download Station and Surveillance Station. So I can control who has access to various applications without them being all or nothing or even requiring ADS integration or similar. That's a definite plus from a security standpoint. In fact, overall the security features of the DS410 are pretty thorough and well thought out. There's even an SSH interface accessible to users in DSM 2.3. By the way, if you have an Amazon S3 account, the DS410 can back up to that. It can also connect to certain IP enabled UPSes, but I don't have a non-serial UPS currently, so I wasn't able to test that. I do recommend it, because on power failure, the DS410 will shut down cleanly then. All in all, I've got to admit I'm very impressed with the management despite its limitations.

I'll only cover the included Synology Data Replicator 3 software very briefly, because I don't really use it. I did load it up to test it. If you're expecting high performance backups, this is probably not the software for you - backups were very slow, because of the high file count. However, they worked. Restores? Also worked without fault. Default settings were very reasonable and effective for almost any system. Configuration is vast, and some users might get lost in advanced settings, but file selection is easy to use and very effective. In short, Synology Data Replicator 3 just works. No complaints, no problems - it's probably the best bundled free backup software I've ever tested.

So you're probably looking for detailed performance graphs and picking apart every detail here, knowing me. Sorry folks - this is a practical review. I'm going to talk to you about what a typical user is going to see and like or dislike in this product. Most users are not going to care about the minutiae of block sizes and whatnot. This is about how it behaves in typical usage conditions in a typical environment when pushed near its limits.
My testbed was my workstation (75MB/s write 90MB/s read), an old laptop, a single SATA disk desktop and an ancient piece of junk used to generally just piss things off on the network. The same share was used, and the volume was deleted and recreated between tests.
So without further ado, let's have some real world, real situation performance numbers:

Test 1, copying around 1GB-4GB CD and DVD images less than 5 files at a time.
Synology Hybrid RAID gave me a peak of 75MB/s with an average of 65MB/s read and 40MB/s write to the NAS; I was limited to doing this with the single workstation for the large reads. (It's complicated.) These numbers held fast when I added the laptop to the mix at the same time for writes. RAID5 did a little better, peaking at 105MB/s before evening off at 67MB/s and holding it. Writes were completely unchanged at 40MB/s.
Test 2, one freaking huge DVD image!
For test 2, I created a 14.2GB single file by imaging a Mass Effect 2 DVD to my workstation's RAID1 disks. Then I copied it to and back from the DS410. Yes, 14.2GB in one file. Synology Hybrid RAID wrote the file to NAS at 40MB/s stable, but reads were less impressive - the first 7GB went by at 67MB/s, but then it dropped to 60MB/s and stayed there. RAID5 did a little better at the same 40MB/s write, but exhibited the same issue - the first 7GB went by at 70MB/s, but then it dropped to the same 60MB/s. So there seems to be a problem reading extremely large files from the DS410. This might affect backups, presuming your target disk is that fast. Tests were the same when repeated using NFS instead.
Test 3, copying a photo collection to a share!
This collection isn't too impressive. About 1000 files at 2-4MB per file, totaling around 6GB, copying to and from my workstation. This is where the DS410 showed its first weakness. Synology Hybrid RAID managed a respectable 20-25MB/s of write, but backed it up with a solid 30MB/s read despite the large file count in the share. RAID5 didn't do much better, write performance increased to 24-30MB/s but read performance remained the same at around 30MB/s. So it's likely that the issue was count rather than size. So if you're working with huge numbers of files, the DS410 may not be the best option for you. From my testing, I think it's probably a bad idea if you're routinely copying large file counts around with it.
Test 4, watching videos and playing music. Lots of both all at the same time.
I'm going to be blunt; you're not going to do tests 1 through 3 on a daily basis, more than likely. You're going to copy DVDs to your DS410 and watch them. That's a very important function! So to test this, I created two network shares - one for music and one for video. I copied a bunch of MP3s and several different videos to the DS410, and fired up playback. I had 4 MPEG4 streams and 6 MP3 streams running in both Synology Hybrid RAID and RAID5 without so much as a hiccup. Systems were able to fast forward, rewind and jump with only slightly longer than expected stalls. Oh, and yes, this included NFS mounting which had no effect on performance at all.
Test 5, I put a Windows XP VM on the DS410 via CIFS!
I created a share, loaded a bunch of stuff on other shares, then copied a VMware Player Windows XP virtual machine image to the DS410. I mapped the share containing it as a network drive and fired it up. Well, okay, I defragmented the disk first. Ignoring the stern warnings about NAS offering reduced disk performance, I proceeded to power on the machine and compared overall behavior to local RAID1. In a nutshell? What reduced performance? Running my VM from the DS410 via CIFS didn't offer any significant performance differences; it wasn't noticeably faster, but much more importantly it wasn't any slower than local disk. Which I suppose is a statement that my local disk performance sucks, but remember, this is a RAID5 with 3 5400RPM disks! That it was equal to a local 7200RPM RAID1 is stunning.
Test 6, does it work with my Xbox 360?
A simple yes/no question. The answer is obviously yes - connected with no problems, played music, played video, no problems at all.
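To put the throughput figures from tests 1 through 3 into wall-clock terms, here is a small sketch that turns them into rough copy times. It assumes 1GB = 1000MB and that the quoted rates hold steady for the whole transfer; the helper name copy_seconds is made up for this example.

```python
# Rough copy-time estimates from the throughput figures quoted in the tests.
# Assumptions: 1 GB = 1000 MB, and each quoted rate holds for its whole segment.

def copy_seconds(segments):
    """segments: list of (size_mb, rate_mb_s) pieces copied back to back."""
    return sum(size / rate for size, rate in segments)

FILE_MB = 14.2 * 1000           # the 14.2GB image from test 2
REST_MB = FILE_MB - 7000        # what is left after the first 7GB

write_s = copy_seconds([(FILE_MB, 40)])                    # both RAID modes
shr_read_s = copy_seconds([(7000, 67), (REST_MB, 60)])     # Synology Hybrid RAID
raid5_read_s = copy_seconds([(7000, 70), (REST_MB, 60)])   # RAID5

# Test 3: roughly 6GB of photos written at 20-25MB/s on Synology Hybrid RAID
photos_fast_s = copy_seconds([(6000, 25)])
photos_slow_s = copy_seconds([(6000, 20)])

for label, secs in [
    ("14.2GB write", write_s),
    ("14.2GB read, SHR", shr_read_s),
    ("14.2GB read, RAID5", raid5_read_s),
    ("6GB photos, best case", photos_fast_s),
    ("6GB photos, worst case", photos_slow_s),
]:
    print(f"{label:24s} {secs / 60:5.1f} min")
```

Seen this way, the drop from 67-70MB/s to 60MB/s on the big file costs only a few seconds, while the small-file penalty in test 3 is the more noticeable one relative to the amount of data moved.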

Now the numbers that REALLY matter: what it costs. If you're expecting high end performance out of this NAS after seeing my numbers, you're probably expecting the DS410 to sit in the higher price range, around the Drobo FS or ReadyNAS NVX Pioneer Edition, or from $600 on up.
You'd be dead wrong. The Synology DS410 retails for about $500 without disks. That puts it near the bottom of the price range, but packing performance and features at the top of the range.

So let's talk summary here. What do I think of my Synology DS410? I love it. I was going to buy one anyway, and I got lucky winning a random drawing instead! I'm not going to object to free stuff, but it certainly didn't color this review at all. The DS410 definitely has some drawbacks, but the positives decidedly outweigh them. Looking at it from a cost-benefit standpoint, as I do practically everything I own, I can honestly say the DS410 gave me the best bang for my buck at retail price.

Synology DS410 4 disk NAS - about $500 retail
Pros:

Well thought out design and component selection with great attention to details.
Easy to work on, and simple enough that even novice users should be able to do it.
Excellent overall performance in RAID5 and Synology Hybrid RAID.
Great disk performance, even with 5400RPM SATA.
The only eSATA expandable 4-bay NAS there is. No, seriously.
Well thought out UI, easy for new users to manage and detailed enough for pros like me.
Will happily and easily blow the doors off any USB attached disk you own.
Software and features usually only found on much higher priced NASes included and supported.
NFS configuration and security is well thought out and well implemented.
Works with Windows 7, works with my Xbox 360, and likely will work with any DLNA device you own.
Incredibly and impossibly large library of supported PHP based applications - WordPress! phpBB! Drupal! Shame I maintain my own web server. I may yet abuse this!
Lots of backup software compatibility. It even works with Apple Time Machine, EMC Retrospect and BackupExec.
Included software is not awful junk. In fact, it's downright good software!
Huge community of users, many power users, including folks developing more applications for it and modifying Synology NASes in some very interesting ways.
Everything I did, I did without asking Support or @Synology any questions other than verifying disk compatibility. And when I did, they got back to me in less than 30 minutes.

Cons:

Seems to have problems if you throw a thousand files at it at a time.
Quotas are only per-user, not per-share, so watch out!
Windows permissions could be better, but this might be there with ADS integration.
NFS users must be managed at the DS410; no NIS/YP integration or ADS integration I could find.
Compatibility list isn't always up to date for devices other than drives.
Single Gigabit Ethernet; I think with an expander, it could easily push two.
LEDs are a touch bright; you wouldn't want this in your bedroom while you're trying to sleep.
Synology isn't found on the VMware Hardware Compatibility List.

As you can see, there are far more pros than cons overall, and even the cons are pretty minor. For most home users, they'll only need to add one or two users, put them in the default group, and they're ready to roll. For small businesses, well, I'm afraid you'll have to check the Synology Forums for folks with ADS integration experience. I'm still waiting on a fix from VMware for my ESXi box, so I wasn't able to test compatibility there. However, there's an entire forum category for HyperVisors and there's a wealth of good information there. Every indication is that the DS410 will work just fine with almost anything.

So, if you've been sitting on the fence about a NAS for home or your home lab, you can come down now and get yourself a DS410. This little box took everything I threw at it and then some, and kept asking for more. Not only that, but it hits well above its weight class in terms of performance, features, and quality. In other words, the Synology DS410 not only gets my coveted Stamp of Approval(TM) but earns my recommendation for almost every user out there. So go on - now you know you want one!

Saturday, March 27, 2010

The nVidia GTX480 - and why it's a 400W+ piece of junk

Okay, I did a really bad job of explaining why the GTX480 is a 480W+ part and why a 250W TDP isn’t what it puts out in terms of heat on Twitter yesterday. So, let’s try it again, with the math backing it up.

First of all; standalone GPUs are worthless. Claiming a GPU’s wattage by itself is like telling me how many calories are in the pepperoni – but not the rest of the pizza. The same with putting the numbers together – I could just have the pepperoni, but then it’s not a pepperoni pizza now is it? So how are you going to play games on just a video card? Well, you aren’t. So rating just by the GPU is only useful when sizing a system, and even then the numbers usually end up heavily fudged.

So let’s take the latest part paper-launched (won’t be available to buy until April) by nVidia, the GTX480. TDP of 250W – for the GF100, not the card. Or maybe it’s the other way around? Your guess is as good as mine, but based on data I’ve seen, 250W TDP for the card is probably somewhere around 25-50W on the conservative side, estimating for heat losses. With a TJmax of 105C and typical operating temperature of over 80C at the die, you’re talking about massive efficiency loss from temperature. Heat reduces efficiency, especially in electronics, so when you’re running voltage regulators managing 250W anywhere close to the top of their rated operating temperature? You start incurring some pretty nasty losses.

HardOCP was too kind in their power tests, because they utilized a benchmark called FurMark. FurMark runs almost exclusively on GPU with very low CPU loading. This is to prevent CPU binding; but it also allows modern Intel CPUs to go into lower power states. This is NOT representative of what you’ll see while gaming at all. Games stress GPU and CPU, pushing both towards max power draw and TDP. In fact, most modern games will get a Core i7 920 pretty near 130W draw.

So, let’s call it 125W for the CPU, 35W for the motherboard, 50W for a pair of SATA disks, and 250W for the GTX480. 125+35+50+250 = 460W. This number is particularly amusing to me, as some years ago, the specially built WTX power supply for Dual Athlon MP boards with AGPPro150 slots produced exactly that number. It also makes it a 400W+ part, because even if you switch to a Socket 1156 part at 95W, you’re still over 400W. AMD? Still over 400W. There is no way to build a usable system around a GTX480 with 90% load at 400W or less. That means 80 Plus certified power supplies most likely won’t help you till 600W to 800W absolute best case (50% and 80% load.)
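The running estimate is just a sum, so here is a minimal sketch of it using the per-component figures above. The Socket 1156 variant swaps in the 95W CPU figure from the text; the dictionary keys and function name are only illustrative.

```python
# Running-load estimate from the per-component wattages quoted in the post.
BASE = {"cpu": 125, "motherboard": 35, "disks": 50, "gtx480": 250}

def total_watts(parts):
    """Sum the estimated draw of every component, in watts."""
    return sum(parts.values())

lga1366_total = total_watts(BASE)
# Same build with a 95W Socket 1156 CPU swapped in
lga1156_total = total_watts({**BASE, "cpu": 95})

print(f"Core i7 920 build:     {lga1366_total}W")
print(f"Socket 1156 95W build: {lga1156_total}W")
print(f"Both over 400W: {min(lga1366_total, lga1156_total) > 400}")
```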

But wait, doesn’t that mean you only need a 500W power supply? NOPE! Not even remotely close. That’s the “running estimate” – but for startup we absolutely have to rate by maximum draw plus 5 (the plus 5 is rule of thumb.) So that’s 135+55+55+255 or 500W ignoring fans. We have to add another 15W for fans, that’s 515W. Oh, and that’s what the individual devices are drawing – that’s not a real number. It’s real in the sense that it’s the minimum startup wattage, but it doesn’t account for various losses. Allowing roughly 1% for those losses pushes it to 521W of DC supply required to start up.
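Here is a hedged sketch of that startup sizing. The per-device startup figures and the 15W for fans are the ones given above; applying the 1% loss as a simple multiplier is an assumption about how it was meant, and it lands at roughly 520W, in line with the figure in the text.

```python
# Startup sizing sketch using the startup figures quoted in the post.
STARTUP_W = {"cpu": 135, "motherboard": 55, "disks": 55, "gtx480": 255}
FANS_W = 15
LOSS = 0.01   # assumption: the 1% loss is applied as a flat multiplier

devices_w = sum(STARTUP_W.values())
with_fans_w = devices_w + FANS_W
dc_required_w = with_fans_w * (1 + LOSS)

print(f"devices only:  {devices_w}W")
print(f"with fans:     {with_fans_w}W")
print(f"with 1% loss:  {dc_required_w:.1f}W DC")
```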

We have to adjust and base off our actual efficiency versus wattage to compensate for typical AC-DC losses and startup draw. That gives us an actual need of somewhere north of 600W. Otherwise, we’re going to just pop the power supply any time we fire up Modern Warfare 2 or Battlefield. That presents… a bit of a problem. Not to mention the demonstration that the GTX480 basically goes from its idle wattage of “only” 47W (total system draw of 260W idle for SLI configurations!) and jumps over 50W just to open a webpage, and we’ve got a REAL winner here folks. Yep, and by the way, those numbers are extremely conservative and don’t leave any overhead at all. That’s what you’re going to see from the wall while gaming – 600W and higher. Oh, and don’t install any additional hard drives, attach USB devices, etcetera. In fact, ignore those numbers and go with HardOCP’s recommendation of minimum 700W for a single card.

Now, to be entirely fair, we need to establish a comparison. We’ll use the card that the GTX480 is supposed to “kill,” the AMD/ATI Radeon HD5870. The HD5870 has a maximum board draw of 188W. HardOCP found that the HD5870 system drew 367W at the wall which gives us an actual DC load of 320W. 320-188 gives us only 132W for the remainder of the components. So we’ll just call the HD5870 at 200W of draw after losses and everything, giving us a whopping 120W for a mid-range desktop board, heavily overclocked i7 920, and a SATA hard drive. So take your pick here, folks. Either these numbers at 200W are right, or the HD5870 is actually maxing out its DC draw at somewhere around 130W. Personally, I don’t have a hard time believing everything else at a combined 167W.

To be fairer, we have to do the same power math we did for the GTX480 to establish our power for startup – 135+55+55+193 = 438W DC for startup with a Core i7 1366. But wait! What happens if we switch it to a Core i5 or i7 Socket 1156, which is a TDP of 95W? That gives us 100+55+55+193 = 403W, and we’re running a 20W margin of error on both ATI and nVidia configurations. With that 20W margin of error plus 15W for fans, the ATI still ends up below 430W. In other words, if you didn’t mind having little headroom and running the PSU pretty hard, an ATI HD5870 can easily make do with a good quality 500W unit which will see a maximum draw from the wall of somewhere closer to 495W with everything at its absolute limit.

So! If we go with everything else at a combined 167W, let’s run the GTX480’s 480W number. Real DC draw is around 418W. We subtract the “everything else” category of 167W and get 251W in free air at a temperature of 93C. The free air part is very important, and we’ll get to that in just a little bit here.

Now here’s an exceptionally important point – HardOCP witnessed GTX480’s exceeding 900W at the wall in 2-card SLI when CPUs barely added into the mix, with an 87% efficient power supply. If we give them the full benefit of the doubt and say 250W for each GTX480, that leaves over 400W for the rest of the system. Let’s make our correction; 87% of wall is actual DC – that’s 783W DC side at 900W. We’ve already established that every other component combined is roughly 167W of draw. We’ll be exceedingly generous and jack those up to 200W. Notice how the numbers still don’t add up there, at all? Remember, it was OVER 900W at the wall and on an 87% efficient power supply at that! Seriously. Let me spell it out for you.

783 – 200 = 583+, and 583+ / 2 = 291+ per card in SLI.
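The wall-to-DC conversions used throughout this post can be sketched in a few lines. This assumes the same 87% efficiency applies to all three wall readings (the post only states it explicitly for the SLI rig), and the function and variable names are made up for illustration.

```python
# Wall-to-DC conversion sketch for the wall readings quoted in the post.
EFFICIENCY = 0.87   # assumption: the same PSU efficiency for every reading

def dc_watts(wall_watts, efficiency=EFFICIENCY):
    """Estimate DC-side load from a measured wall draw."""
    return wall_watts * efficiency

hd5870_dc = dc_watts(367)   # HD5870 system
gtx480_dc = dc_watts(480)   # single GTX480 system
sli_dc = dc_watts(900)      # two GTX480s in SLI, "over 900W" at the wall

REST_OF_SYSTEM_W = 200      # the generous everything-else figure
per_card_w = (sli_dc - REST_OF_SYSTEM_W) / 2

print(f"HD5870 system DC: {hd5870_dc:.0f}W")
print(f"GTX480 system DC: {gtx480_dc:.0f}W")
print(f"SLI system DC:    {sli_dc:.0f}W")
print(f"per card in SLI:  {per_card_w:.1f}W or more")
```

Because the 900W wall reading is a floor rather than an exact number, the per-card result is a lower bound too.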

That means in SLI at 92C with fans screaming, those cards are actually drawing nearly or over 300W of DC, which translates to somewhere north of 650W at the wall. There’s some HUGE power losses going on there from heat, no doubt, since we’re talking about cherry picked cards from nVidia with non-release BIOS. These are, in other words, not actually representative of what AIB partners will be putting out. AIB partners will likely use lower cost voltage regulation and support components to try and handle the costs that are already non-competitive. If we presume that the card is a 250W combined part but gets 90% efficiency from supplied power, we get right around 270W. And as we’ve already covered, just the card is useless. Oh, and three way SLI? Dream on. At 2-way SLI, you’re pushing 1000W at the wall. There’s one 1500W PSU available on the market, it requires your outlet be wired for a 20A breaker, and it’s going to set you back $400+. Sorry folks, 1200W won't cut it - 900+250+ = 1150+. Oh, and then there's that little problem where your noise level is actually 64dBA for a single card and over 70dBA for two. Remember, dBA is logarithmic, so 64.1 to 70.2 is more than double. These things are dangerously loud and can make you deaf.
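For anyone who wants to check the logarithmic claim, here is a minimal sketch that converts the 64.1 to 70.2 dBA difference into ratios using the standard decibel definitions (10·log10 for power, 20·log10 for pressure). How that maps to perceived loudness is a separate question this does not try to answer.

```python
# Convert a decibel difference into sound power and sound pressure ratios.
single_card_dba = 64.1
sli_dba = 70.2

delta_db = sli_dba - single_card_dba
power_ratio = 10 ** (delta_db / 10)      # decibels are 10*log10 of a power ratio
pressure_ratio = 10 ** (delta_db / 20)   # and 20*log10 of a pressure ratio

print(f"difference:     {delta_db:.1f} dB")
print(f"power ratio:    {power_ratio:.2f}x")
print(f"pressure ratio: {pressure_ratio:.2f}x")
```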

Now let’s complicate matters properly; HardOCP did all their tests on a bench in free air. This is a huge deal, because free air means that it’s not in an enclosed chassis. It has a continuous supply of cold air feeding it and completely unrestricted airflow from five directions. PCB and ambient heat is also indirectly radiated to open air independent of the fan movement. All this combined lowers the operating temperature substantially when compared to a card installed in a chassis. In other words, the 93C operating temperature is very much on the low side. This is why nVidia was requiring manufacturing partners to certify their chassis beforehand. When you put these cards into a chassis, they’re suddenly faced with restricted air flow, the loss of ambient cooling, and the addition of over 100W of ambient heat from CPU, motherboard, hard drives, etcetera. Very very few desktop chassis are 100% thermally efficient – that being, it rejects its entire thermal load and maintains the interior temperature at intake air temperature. I have built and worked on some of the most efficient there is, and typical users are going to have a chassis that, with a 200W TDP video card, is going to be no less than 15C above exterior ambient (or deafeningly loud.)

Now we have a real problem, because that means we’re running at the ragged edge as is. If we call exterior ambient at 74F that gives us an ambient of 23C. If we call the chassis rise 15C, that gives us an interior ambient of 38C or about 100F. In free air testing at HardOCP, 74F ambient isn’t an unreasonable estimate and is actually probably high. So end users will be applying an ambient temperature 15C higher than the temperatures that let a GTX480 run at “only” 93C. With the loss of ambient thermal radiation, and airflow restriction from components and the chassis, plus an additional 150W+ of added thermal load applied unevenly to all fans… well, you can bet that a GTX480 will never be quiet, and it will be screaming as it tries to maintain the die at 100C or below. This is why I do NOT like free air noise testing. Yes, it tells and shows you just how loud the fan is, but only in free air. Typical users will have these parts in a chassis, which can and will have significant effects on the temperature and cause the fans to spend more time at higher speeds. In fact, it will affect all fans in a modern chassis.
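The ambient math is a quick unit conversion, sketched below with the 74F exterior and 15C chassis rise from the text. Without intermediate rounding it lands within a few degrees of the rounded figures quoted above; the function names are illustrative.

```python
# Interior chassis ambient from an exterior ambient plus a fixed rise.

def f_to_c(deg_f):
    return (deg_f - 32) * 5 / 9

def c_to_f(deg_c):
    return deg_c * 9 / 5 + 32

EXTERIOR_F = 74
CHASSIS_RISE_C = 15   # the minimum rise over exterior ambient claimed in the post

exterior_c = f_to_c(EXTERIOR_F)
interior_c = exterior_c + CHASSIS_RISE_C
interior_f = c_to_f(interior_c)

print(f"exterior: {exterior_c:.1f}C")
print(f"interior: {interior_c:.1f}C / {interior_f:.1f}F")
```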

I don’t particularly have a horse in this fight other than my standard policy of “if it doesn’t work, if it’s not the better part, then I don’t want it.” The GF100 fails both of those, miserably and with great gusto. The performance numbers aren’t compelling at the price point, even if ATI doesn’t cut prices on the 5800 family parts. The power draw, heat, and noise generated add up to something I couldn’t even consider putting in a desktop system. Nothing short of watercooling is going to get that noise and temperature under control. Even the Arctic Cooling HD5870 part that they rate to 250W dissipation can’t do it (in part because it doesn’t exhaust outside the chassis, but onto the card instead.)

Not to mention the fact that they’re putting a 250W part totally dependent on game developers playing ball for its performance, up against a 188W part that in most situations offers equal or better performance. To justify a 250W, $499 part over a 188W, $410 part you’re talking about around a 30% performance jump needed. But nVidia delivers somewhere around maybe 5% except in tests written specifically for the card, or a 197W part. It’s only worse when you stack up the HD5850 at 151W versus the GTX470 at 215W – never mind the fact that it’s a $350 part versus a $280 part. Again, same thing, 30% jump needed to justify the price and power, and it’s just not there.
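Here's a rough sketch of the premiums involved, in Python. The wattages and prices are the ones quoted above; reading the "around 30%" figure as coming out of these two premiums is my assumption for the sketch, not a formal rule, and the helper name is made up for illustration.

```python
def premium_pct(higher: float, lower: float) -> float:
    """How much larger `higher` is than `lower`, as a percentage."""
    return (higher / lower - 1.0) * 100.0


# (watts, dollars) as quoted in the text.
gtx480, hd5870 = (250, 499), (188, 410)
gtx470, hd5850 = (215, 350), (151, 280)

for name, nv, ati in (("GTX480 vs HD5870", gtx480, hd5870),
                      ("GTX470 vs HD5850", gtx470, hd5850)):
    power = premium_pct(nv[0], ati[0])
    price = premium_pct(nv[1], ati[1])
    print(f"{name}: {power:.0f}% more power, {price:.0f}% more money")
```

Whichever premium you weigh more heavily, a performance gain in the single digits doesn't come close to covering either one.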

So with all these numbers and all this math right there, why don’t the review sites point this out? Simple; because they don’t want to piss off the people who feed them hardware. They have to leave doing this math as an exercise to the reader, because pointing out design failures like this in detail will lose them access to the hardware. Especially with nVidia – they’ve deliberately cut off and retaliated against sites that refused to lie for nVidia’s benefit.

So, there you have it. The GTX480 is a 400W+ card and the 250W draw is debunked. Where’s all the power going? Ask nVidia – they’re the ones who’ve delayed GF100 multiple times and been having issues with leaky transistors and having to jack up the voltages. I’m not an electrical engineer, but I can do basic math, and that’s all you need to see that the GTX480 definitely goes in the design failure column along with the NV30 series (AKA GeFarce FX AKA DustBuster.) This isn’t a card I could recommend, much less sell. And hopefully you’ve learned a lot more about desktop system design while I ripped it apart.

Wednesday, March 3, 2010

Why I Hate "Good Enough"

I really truly hate the “Good Enough” mentality that’s become so pervasive in IT these days. It’s not because I think everything should be five-nines – that’s a common misconception of my attitudes and thoughts. Far from it – five-nines is prohibitively expensive and downright absurd for almost everyone. (Which is also why I dislike anything claiming five-nines based on not going down in 12 months. Seriously, that’s not five-nines.) More simply, if “Good Enough” was really the ultimate level in reliability, then why does any business bother with Disaster Recovery?
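For a sense of scale, here's a small Python sketch of how much downtime each "nines" level actually permits. It assumes a 365-day year and counts all downtime, planned or not; the function name is mine.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability fraction."""
    return MINUTES_PER_YEAR * (1.0 - availability)


for label, availability in (("two nines", 0.99),
                            ("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)):
    minutes = allowed_downtime_minutes(availability)
    print(f"{label:>11}: {minutes:9.2f} minutes per year")
```

Five-nines works out to a bit over five minutes a year. One quiet year doesn't prove a system can hold that budget year after year, which is exactly why "it didn't go down in 12 months" isn't the same claim.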

Here’s how Good Enough is implemented most commonly these days:

The Chances Of This Going Down Are Too Small To Bother With Planning For
We Don’t Think This Will Go Down So We Won’t Plan For If It Does
We’re PRETTY SURE This Won’t Go Down But We Have Support on Speed Dial
We Clustered This So We Totally Know It Won't Go Down
If Something Goes Wrong, Call Support And Hope They Know Why

Here’s how Good Enough is implemented by yours truly:

Chances of Failure are Very Very Low, BUT If It Dies, We Do This
We Don’t Believe This Can Fail, BUT If It Does, We Do This
Confidence Is Moderate, BUT We Have A Plan For Failures
It’s Clustered, BUT If The Cluster Has Problems, We Do This
If We Have A Problem, Involve Everyone And Find The Root Cause By Any Means Necessary

Notice the difference? I do something very different – I make the presumption of failure. That doesn’t mean everything’s crap, even though much of it these days is. It presumes that at some point in time, for some reason, failure will occur. I don’t know when, I don’t know why, and I may not even know how. But I intend to and absolutely require that there be a plan in place for dealing with that failure. Things like maintenance are planned, but you would probably be shocked at how many organizations plan their maintenance poorly by my standards. And my standards aren’t that unreasonably high, either.

I require a plan for the maintenance and a plan to back out if things should go wrong; those you’ll find everywhere. But I also require a plan for restoring function if a back out should fail, and a plan for forcing ahead if a back out is impossible. Why do I require these things? Because what happens if the back out fails? I’ve had it happen, and it’s not pretty. And what happens when you can’t back out changes? I’ve seen that plenty of times – most organizations actually take the stance that if it can’t be backed out, don’t bother with a back out plan, just say it has to go forward. Okay, so what happens when the upgrade fails? There’s no plan in place, no way to go back, and you’re left in the lurch.

I’ve been blessed, or cursed depending who you ask, to see many kinds of failures in many situations. Everything from a single byte of corruption resulting in a failed firmware update to yours truly accidentally deleting the wrong multi-terabyte database. (Hey, think of how many coworkers and employees you know that would actually admit to it, as opposed to just restore from backups and pretend it never happened.) As I’ve progressed in my career, I’ve learned a lot about failures, and a lot about how to manage them and mitigate them. Yet somehow this knowledge seems to just be absent or downright missing at a variety of levels.

I wish I had some good answers as to how we can inject this back into the IT operations and business operations processes. Unfortunately, I don’t, other than pointing it out here. Seriously folks, think about this. What’s your procedure when a round of maintenance goes awry? Chances are your first and only answer is “call support.” Calling support is all well and good, and an important step, but it shouldn’t be your only step. It’s also not a step you should be injecting between “perform upgrade” and “maintenance complete.” In other words, your process flow chart shouldn’t be a series of straight lines, and they shouldn’t all be pointing down or right.
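To make the "not a straight line" point concrete, here's a minimal Python sketch of a maintenance flow with the branches I require: back out, restore if the back out fails, force ahead if a back out is impossible. Everything here is hypothetical; the step functions are stand-ins you'd replace with your own procedures, and each one just returns True on success.

```python
from enum import Enum
from typing import Callable

Step = Callable[[], bool]  # a procedure that reports success or failure


class Outcome(Enum):
    COMPLETE = "maintenance complete"
    BACKED_OUT = "changes backed out"
    RESTORED = "function restored after failed back out"
    FORCED_AHEAD = "forced ahead, back out impossible"
    ESCALATE = "every plan failed, involve everyone"


def run_maintenance(upgrade: Step, back_out: Step, restore: Step,
                    force_ahead: Step, can_back_out: bool) -> Outcome:
    """Walk the planned branches instead of assuming the upgrade works."""
    if upgrade():
        return Outcome.COMPLETE
    if can_back_out:
        if back_out():
            return Outcome.BACKED_OUT
        # The back out itself failed: fall through to the restore plan.
        if restore():
            return Outcome.RESTORED
    elif force_ahead():
        return Outcome.FORCED_AHEAD
    return Outcome.ESCALATE


# Example: upgrade fails, back out fails, restore succeeds.
result = run_maintenance(lambda: False, lambda: False, lambda: True,
                         lambda: False, can_back_out=True)
print(result.value)
```

Notice that "call support" isn't wedged between the upgrade and the finish line. It's what happens alongside these branches, and the escalation at the bottom is the last resort, not the whole plan.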

Let’s talk example. This is a real situation I’ve been through, with details changed. I’m not going to name names, because absolutely nobody in this situation looks good by any measure.

Maintenance was scheduled on a development system for Friday afternoon. This maintenance was operating system patches and a scheduled reboot as part of the patching process. The process had been done many times before with no problems, so there was no established plan for backing out patches. Install the bundle, reboot, done.

After installing the bundle, the system was rebooted and refused to go to multiuser, complaining of problems with system libraries. Upon examination of the logs, it was decided that it would be too much hassle, and rather than attempt to repair, a quick script would be written to back out the patches. The script failed to back out several individual patches, because they could not be backed out. This was accepted as “just how it is” and the system was rebooted again.

Now the system refused to go past single user, and critical services could not start. Files were determined to be missing, and an attempt was made to install them from the OS media. This failed because there were incompatible patches on the system that could not be backed out. A SEV1 call was placed to the operating system vendor’s support.

Now, let’s start with our first failure – the presumption that just because it worked before, it would work again. Then it’s compounded by not having any real plan – install, reboot, done is not a plan. Further complicating it, a back out attempt was ad-libbed, without understanding that some patches couldn’t be backed out. It only gets worse when this is accepted as “just normal” without any explanation or understanding of why or what. It’s likely at this point, dependent patches were removed because they could be backed out despite the patches that COULDN’T be backed out being dependent on them. This is a fatal presumption of “the vendor would never do something that stupid.” Sorry; every single vendor is that stupid at one point or another, and they make mistakes just like everyone else.

So at this point, the entire process has become ad-libbed. Do we restore from tape? Back out more? Reattempt patching? Who knows! There’s no plan; we’re shooting from the hip. So now we’re on hold for support with a system that’s been down for hours, it’s 9PM on a Friday, and it has to be back online by 7AM Monday or it’ll throw off a multi-million dollar project. This primarily came about for the worst reason of all; “development” was treated as a sandbox where it was okay to do just about anything, despite it being very actively used for development work.
Ultimately, the vendor’s response made the problem even worse still: “oh, yeah, we know about this problem. You have to restore from tape, and if that doesn’t work, you have to reinstall.” So the system was restored from tape, with limited success. Reinstallation of the system wasn’t an option, because of the way things had been configured and had to be built. But leaving restoring from tape and reinstalling the system as the only repair methods is what the operating system vendor considered to be a Good Enough answer for their Enterprise product.

Ultimately, the system continued to have problems and turned into a very expensive three month project performing a total rebuild of the system and all its environments, because everyone involved from management to the system administrators to the operating system vendor all said “that’s Good Enough.” It cost management a lot of respect, the system administrators a lot of time, the business a great deal of money, and the vendor lost the customer – probably forever.

So the next time somebody tells you something is Good Enough, don’t buy it. A Good Enough plan isn’t – and never will be – Good Enough when it’s your business at stake. Good Enough doesn’t mean building the most reliable infrastructure you can, then throwing up your hands and saying “that’s as good as we can get, oh well!” It means accepting that things will fail, things can fail, and that nothing will ever be perfect – then taking that knowledge and acceptance to build plans for that.

If you’ve planned for and built for the fact that failures are a when and never an if, and defined a process to work around and repair those failures, then hey, that’s Good Enough for me.

Tuesday, March 2, 2010

IBM Storage UK Has Codified Stupidity

cod·i·fy

tr.v. cod·i·fied, cod·i·fy·ing, cod·i·fies

1. To reduce to a code: codify laws.

2. To arrange or systematize.

Pay attention to number 2 there. Chris Mellor of The Register got some words from Steve Legg, IBM UK’s Chief Technology Officer for Storage.

These words made it quite clear that there's an intent to codify stupidity within IBM Storage UK. He said simplify, but this is me, and I don’t like lies and obfuscation. What he actually meant is “collapse the offerings, and then make some patently ridiculous and arguably false statements to the press.” The word choices he made were exceptionally poor, but the choices made in "collapsing" are far worse.

And here comes the hatemail because me, Mister I-Love-SVC and I-Love-DS8K is calling IBM Storage “stupid” and “ridiculous” and thus I must now be a shill for $MostHatedVendor or whatever. Except I’m STILL not employed or representing anybody but myself. Seriously, if I was shilling, I would have built myself a Dragon 20w with dual 5970’s. Or I would have at least put 16GB in my ESXi box instead of 8GB.

Anyways, let’s be honest and start with the good. I like honest, and I like good. Who doesn’t? SONAS – forget IBM’s acronym of Scale-Out NAS. I demand they change the acronym to Seriously Ossum NAS. It’s a brilliant design in its overall simplicity, combined with absolutely ridiculous density. If anyone’s going to get this right, it’s not Sun – I mean Oracle, it’s going to be IBM. They have the budget and resources. And SONAS delivers, if the order is NAS. I am a little dubious of some aspects of SONAS, but these are software issues and not hardware issues. Software issues should be able to be fixed without needing to forklift the hardware.

What software issues am I concerned about? SONAS is going up against not just Oracle, but NetApp, EMC, HP, Dell and so on inevitably. In that regard, it’s lacking in the snapshot to application integration NetApp and others have. At the price points IBM’s talking on SONAS? Integrating with applications for snapshots is pretty much expected. There are a lot of other software integration and capability questions that IBM has so far left unanswered (without NDA,) so it’s very much a wait and see. The hardware has the potential, it’s up to the software to execute. But at least they’ve solved the back end portion already with GPFS.

The good while being less than brilliant; “VDS.” This ‘offering’ is almost insulting to the capabilities of the IBM SVC. The VDS product cripples the SVC by chaining it to IBM’s low and midrange storage, the DS3k and DS5k. Look, you’re not likely to sell any business who’s had a DS5k another DS5k. The architecture is positively ancient, and is still incapable of anything beyond the most basic of maintenance being performed online. Any firmware maintenance absolutely requires hours of downtime. The DS3k doesn’t even attempt to fake online maintenance capabilities – it just can’t, and it’s not meant to.

But this is a channel play. Why? Beats me – IBM could certainly use more solutions as opposed to just products. My opinion is that it would be a lot smarter to keep VDS close to the chest, and offer it with DS3k, DS5k and DS8k. Seriously folks, the DS3k and DS5k can produce great performance numbers, but they have not been and will not be true enterprise arrays. You have a minimum 2 hours of downtime per year – that’s minimum, not typical – for mandatory firmware upgrades. Why? DS3k and DS5k require stopping all IO to do controller, ESM and disk firmware. So the SVC’s high availability ends up somewhat wasted here. Only the DS8k is on par with the SVC for high availability while servicing.

And the patently ridiculous and arguably false, otherwise known as codifying stupidity. I’m going to give you a quote, and you’re not going to believe it, but it’s a very real quote.

"XIV can reach up quite a long way and run parallel to the DS8000.” –Steve Legg, IBM UK Storage CTO

Yes, that’s Steve Legg of IBM UK saying that the XIV is the equal to the DS8000. Now Steve, the horse is out of the barn, and you can damn well believe I’m going to call IBM out on this load of manure. That statement has absolutely no basis in fact by IBM's own published case studies and reference sites, and even a cursory review of specifications between the two arrays reveals it to be obviously disingenuous at best.

But let’s have a refresher of those spec sheet contents, shall we?

XIV is comprised of 15 modules totaling 180 1TB 7200RPM SATA disks with 120GB of cache and over 7kAVA of power draw at idle and a peak of 8.5kAVA at 29000BTU/hr. The only RAID type is mirroring, reducing actual capacity to 79TB before snapshot – this is also the maximum capacity of the XIV, 79TB – it is not possible to span frames except to mirror them. You cannot grow past 79TB and there is no intent to move to 2TB disks in the next generation XIV hardware. Disk interface is 12xSATA over Gigabit Ethernet, changing to SATA over InfiniBand in the next hardware release (forklift upgrade required.) Protocols spoken are Fiber Channel 1/2/4Gbit and iSCSI over Gigabit Ethernet with a maximum number of 24 FC ports and 6 iSCSI ports, with host ports removed for Mirroring HA (the only HA method available.) Major component maintenance is limited and customers may perform absolutely no service on XIV whatsoever. And I do mean NONE; even a simple disk replacement must be performed by a specially trained CE. IBM shipped the 1000th XIV in November of 2009.

DS8000 is now four generations old, comprised of the DS8100, DS8300, DS8300 Turbo and recently introduced DS8700. Based on the IBM POWER architecture as a controller and using custom ASICs, the DS8000 family doesn’t just hold but absolutely owns the SPC1 and SPC2 benchmarks. Two processor complexes provide from 32GB to 384GB of combined cache and NVS. The DS8700 ranges from 16 to 1024 disks using any combination of 73/146GB SSD, 146/300/450GB 15K RPM, and 1TB 7200RPM disks in packs of four or sixteen with a maximum capacity of 1024TB. RAID levels supported are 5, 6 and 10. Disk interface is FC-AL via multiple GX2 connected IO Complexes. The frame ranges from a single wide cabinet to 5 frames (base plus four expansions) with minimum power draw of 3.9kAVA base, 2.2kAVA per expansion and maximum of 7.8kAVA and 6.5kAVA respectively. The thermal min/max is 13400/26500BTU/hr and 7540/22200BTU/hr respectively. Protocols spoken are Fiber Channel 1/2/4Gbit and FICON 4Gbit with a maximum host port count of 128 in any combination of FC and FICON. Almost all major component maintenance can be performed without needing to shut down the DS8000, and all prior models can be field upgraded to the current DS8700 941/94E. Customers may opt to perform most DS8000 maintenance tasks themselves and some hardware repair, including disk replacement.

As you can see, these two systems are not even remotely similar or comparable. The absolute maximum disk IOPS an XIV is capable of, being as generous as we can be at 180 IOPS per disk, is 32,400 IOPS. The DS8700 using FC disks and the same 180 IOPS per disk as a conservative number, is capable of 184,320 IOPS. This is ignoring all buffering, caching and advanced queuing. The DS8700 is proven to be capable of well over 200,000 IOPS with a high number of hosts. IBM refuses to submit XIV to an audited benchmark and their most detailed case study with Gerber Scientific shows XIV only handling a total of 6 systems (claiming 26 LPARs, that's still ridiculously tiny) and using less than 50% of its available capacity.
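The back-of-napkin disk math is easy to reproduce. Here's a short Python sketch using the disk counts from the spec rundown and the same flat 180 IOPS per disk for both arrays (generous for 7200RPM SATA, conservative for 15K FC). It ignores cache, buffering and queuing, just like the paragraph above; the names are mine.

```python
IOPS_PER_DISK = 180  # same per-disk figure applied to both arrays


def raw_disk_iops(disk_count: int, iops_per_disk: int = IOPS_PER_DISK) -> int:
    """Aggregate back-end disk IOPS with no cache or queuing effects."""
    return disk_count * iops_per_disk


xiv_iops = raw_disk_iops(180)       # XIV: 180 SATA disks, fixed
ds8700_iops = raw_disk_iops(1024)   # DS8700: up to 1024 disks

print(f"XIV:    {xiv_iops:>7,} IOPS")
print(f"DS8700: {ds8700_iops:>7,} IOPS")
print(f"ratio:  {ds8700_iops / xiv_iops:.1f}x")
```

Even with the per-disk number tilted in the XIV's favor, the spindle count alone puts the DS8700 at well over five times the raw back-end IOPS.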

For IBM to even insinuate that the XIV is “parallel” to even the DS8100 first generation hardware is to basically call their customers idiots; it is the same as telling MotorTrend that your 1985 Yugo 45 can keep pace with a 2004 Ferrari Enzo. It’s only true as long as they’re both doing 25MPH and you’re willfully ignoring everything other than the fact that they both can do 25MPH. Anybody who spends more than 10 seconds reviewing the specification sheets for these two systems or cars will immediately be able to tell that they are not in the same class. Yet IBM would very much like you to believe that their Yugo 45 is just as fast as their Ferrari Enzo. Perhaps a more apt comparison would be that Steve is currently telling you that IBM's Renault Twingo can totally hold at least as many people as their London Double Decker Bus.

Am I calling Steve Legg an idiot? Absolutely not. Steve just made an amazingly bad word choice. Steve Legg is a well respected guy, and not someone who's going to call you daft, especially not customers. But he’s basically said that IBM’s organizational stance is that customers aren't smart enough to spend a few moments reviewing a spec sheet, and seeing the obvious disparity between the two arrays. He’s saying that IBM believes customers are too stupid to see the inefficiency of the XIV as compared to its “green” claims, too stupid to see the raw horsepower of the DS8700, too stupid to tell the difference between 7200RPM and 15000RPM, too stupid to understand that 3.9+2.2/7.8+6.5 kAVA is more efficient than 7+7/8.5+8.5 kAVA. The problem with this is that the special XIV people will latch onto these words, yet again, and continue to use them while they do treat customers like idiots. (Those who claim they don't, I had them telling me to my face that the numbers they were putting up on the screen as gospel, didn't mean anything. Among other things.)
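For the power comparison, here's the arithmetic spelled out as a Python sketch. The figures are the minimum and maximum draws quoted in the spec rundown, in the units quoted there. I'm assuming the comparison is a DS8700 base frame plus one expansion against two XIV frames, which is how the sums above are written; the variable names are mine.

```python
# (minimum, maximum) power draw per frame, as quoted in the spec rundown.
ds8700_base = (3.9, 7.8)
ds8700_expansion = (2.2, 6.5)
xiv_frame = (7.0, 8.5)

# Assumption: two frames on each side of the comparison.
ds8700_min = ds8700_base[0] + ds8700_expansion[0]
ds8700_max = ds8700_base[1] + ds8700_expansion[1]
xiv_min = 2 * xiv_frame[0]
xiv_max = 2 * xiv_frame[1]

print(f"DS8700 base + expansion: {ds8700_min:.1f} min, {ds8700_max:.1f} max")
print(f"Two XIV frames:          {xiv_min:.1f} min, {xiv_max:.1f} max")
```

On those numbers the two-frame DS8700 at full tilt draws about what a pair of XIV frames draws sitting idle.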

Yet again, this does not mean XIV does not meet some needs. What it does mean is that XIV is still not equal to nor does it offer performance comparable to the DS8000.

His statements show that IBM’s offerings have codified stupidity; “we now sell on the basis that customers are too stupid to read or question us.” When customers push back on the high cost of DS8000, just whip out the significantly cheaper and far less capable XIV without mentioning anything other than "it can run parallel to the DS8000!" Which only goes to further support my arguments that you should be questioning your vendor at length, demanding hands on testing, and refusing to take their word for it on any statements of suitability or performance. The choice is yours – you can challenge your vendor, or you can enjoy the challenge of finding new employment. And you should be really extra careful about what exactly you say to the press, especially when you have a fiefdom that doesn't answer to you itching to abuse it.

Update:
I'm sorry about the VERY poor wording on my own part, and I want to extend my sincerest apologies to Steve Legg if I caused any offense. (I should not be writing so late, obviously.) Steve is by all accounts a great guy, and I'm sure that it wasn't his intent to imply that customers are idiots. The problem is that he made a bad choice of words and phrasing, and that's how it came out. I'm quite positive he knows better, especially since IBM UK is the home of the SVC.
The problem is that's how the words went and how the offerings are now aligned, and what it says to me as a customer. But they're also not decisions that are made by just one person at IBM, and Steve is just the messenger in this case. He certainly isn't deserving of my wrath, and I certainly would not rain it down upon Steve specifically. If you ever get a chance to meet Steve Legg, be sure to shake his hand and thank him for SVC. ;)

Tuesday, February 23, 2010

A Unix guy on the Xbox 360

Update: Rob Enderle responds over here! Thanks Rob! :)

Disclaimer; I have no idea if it's preferred as "Xbox360" or "Xbox 360" or whatever. I use a space.

So, some backgrounder. Rob Enderle says that Microsoft shouldn't be in Console Gaming. Greg Knieriemen agrees with him, Microsoft should not be selling hardware. John Obeto says that Microsoft HAD to enter console gaming. Jay Livens shares his own Thoughts on Microsoft and the Xbox as well. So of course, I have to chip in my two cents - especially given that one, I'm an Xbox 360 owner. Two, I'm a Zune owner. Three, I run Windows on the desktop and have for nearly 20 years.

First, let's look at what started this whole discussion: a graph showing where Microsoft gets their profits. You can find it right over here. People drew the conclusion that Microsoft's entertainment business is a drag on their profits, which looking at just that chart, would give that impression. But it's also not true, in my opinion.

Microsoft needed to jump in, because PC gaming isn't going anywhere - by both definitions.
I'm a PC gamer, yes. And let's be blunt, and let's be honest. It's not going anywhere, and it's doing it very quickly. And I very much mean in all senses of it. When was the last time you saw a truly innovative game only available for PC? PC gaming outside of MMOs is a pretty stagnant market for a variety of reasons, a few of which I'll cover in this post. But it's not dying either, contrary to what others might say. But between DRM that is increasingly customer-hostile (hello Ubisoft, looking at you! Hi SecuROM, you too.) and the plethora of problems that come from the fact that PCs are not a stable platform, more users are turning to consoles for non-exclusive titles, in my personal experience.
They want the game, they just want to buy and play, rather than wait for a "beta" patch with no permanent fix ever offered. See most specifically, the ordeal with The Saboteur from EA and developed by Pandemic Studios. PC users have been plagued by severe problems from day one and have yet to receive any support whatsoever. Players who purchased the Xbox 360 version have raved about the game and reported few problems.

The Xbox 360 gives Microsoft their one - and only - stable platform.
Ever looked at an EMC Interoperability chart? People talk a good game, but the fact is that PCs are just as bad as that, if not worse. This sound card may or may not work with this motherboard. These drivers aren't compatible with this game. To say nothing of game and OS interoperability issues, ranging from the "just isn't supported" to "this one impossible to find setting buried deep in the registry causes this game to crash constantly." To say nothing of the myriad ways one can merrily screw up settings on a Windows system - and most gamers usually do. Windows Mobile isn't even close to a stable platform - look at how many Windows Mobile devices there are, on how many completely different processors and architectures.
Xbox 360 has none of these problems. It's a true stable platform. Microsoft controls the OS, drivers, and UI software. User settings are limited. There's a bunch of accessories - all tested and known compatible. By maintaining this tight control and limiting adjustment, Microsoft gives themselves a way to control the overall experience, which brings me to my next point...

The Xbox 360 gives Microsoft a way to control the quality of the experience.
Face facts; Microsoft has no control over the systems people install Windows on. They can control the logo stickers, but that's it. If Joe Blow ships a poorly built system with junk parts running misconfigured Windows 7, users are going to complain about Windows 7 being utter crap - even though it's entirely Joe's fault.
Xbox doesn't have this problem, and probably never will. Microsoft isn't just preventing Joe Blow from making things miserable for end users, they're taking responsibility and ownership of any issues. This extends into the games themselves - Microsoft is the final arbiter of whether or not your game gets released on their system. If they have concerns about the game crashing every two hours, they can prevent release. If they have concerns about the quality of the game, they can prevent the release. And they've done this to games before.
This extends to supplemental products - like Windows Extender and Zune - and brings me to my next point...

The Xbox 360 gives Microsoft a core for their integration strategies.
You can argue Windows is at the core - but it's not. Windows "sort of" is at the core; Windows Extender, Zune, these depend on Windows but get driven to your Xbox 360. But here's the thing - which is more likely to already be in your media center, the Xbox 360 or an HTPC? The Xbox 360 becomes the core because everything is being driven to it, regardless of what it runs on. Rather than fighting the limitations of it, Microsoft has chosen to embrace the limitations and use those limitations to drive other products.
You buy an Xbox 360, you want to watch movies on it - so you now upgrade to Windows 7 so you can use Windows Extender. You like music and have your good stereo downstairs; Zune lets you play your collection on your Xbox 360 that's already there. This is what damages the argument that Xbox 360 is a losing proposition - this integrate and extend mentality allows Microsoft to upsell other products alongside the Xbox 360. Don't forget about the continually evolving Games for Windows LIVE interface as well.
Ultimately, it all ties together around what the Xbox started with the Live Marketplace; everything goes back to Live Marketplace.

You Can't Do That - Yet, But Soon.
Microsoft first released the Xbox in 2001. It was huge. The controllers were too big. Initial sales were disappointing and in some ways, downright depressing. They had to create a smaller controller to replace the almost unusable controllers. It was widely considered a flop and a failure. Microsoft quickly set to work on fixing the various issues, but they had one thing going for them - their online play system, Live. They also had a winning launch title - Halo. Project Gotham Racing also was a well received title at launch, but the system was repeatedly delayed and criticized for difficulties in developing for it.
If there is nothing else Microsoft is good at, they learn from their mistakes.
The Xbox 360 has had its own mistakes. Most widely known is the Red Ring of Death. Microsoft owned up to the problem and handled that very poorly. They resolved it by performing a major revision of the hardware - which virtually ended the RRoD plague in one swoop. The Xbox interface has been marked much like the Xbox itself, by a process of continual improvement. From the original Xbox to the Xbox 360 to the Xbox 360 "NGE" interface, Microsoft has continually added features and fixed complaints. They listen to the users, and they act on it.

The Xbox 360 extends and embraces Microsoft's attempts to be the go-to place for developers, and succeeds.
Bet you didn't think I'd use that line ever in my life. SURPRISE! A good friend of mine is an independent game developer, and has console development knowledge. I've personally done research and some programming for the Cell BE which powers the PS3. Let's be very blunt and to the point: programming for Cell BE is a special level of hell. Seriously. It's abysmal, and pure misery. Developing for the PS3 is incredibly difficult, complicated, and expensive. You need the full console developer kit, which costs thousands of dollars, and if you want to publish? Well, you need to do the same dance the major publishers do with Sony, pretty much. It is an incredibly expensive and difficult proposition to develop for PS3, and almost none of your code will be reusable.
Now let's say you want to develop an Xbox 360 game. What resources do you need to buy and invest in? An Xbox 360 Elite and an XNA Content Creator account, oh, and Visual Studio - but that's actually optional. Congratulations, you are now fully equipped to develop for the Xbox 360. But what about publishing your game? Through Live, Microsoft has a solid, reliable and well marketed distribution channel for independent developers. Independents can put up their latest creation, and start making money in short order. Microsoft is very active about promoting quality titles from XNA.
You want an example? No problem! Check out Xbox Live Indie Games. How easy is it to get your game up and making you money on Live Marketplace? Here's my friend J's Pendoku on Xbox Live - developed over a few weeks - and here's the free Flash version. Putting it on Marketplace was as easy as uploading and going through a few menus to publish, then waiting for approval, which usually takes around two weeks I believe. It's very easy. Which is what independent developers not only want, but need - lower barriers.

The Xbox 360 is not meant to be a profit powerhouse.
This is probably the most important point here. Since when has Microsoft ever thrown good money after bad for so long? The answer is; never. Microsoft has an internal and very secret set of goals for Xbox 360, and externally they seem to be rather pleased with them.
I'm going to wager a guess on what one of them is; be more popular than PS3, or at the minimum be perceived as such. In that respect, they're doing just fine. Take a look at this ArsTechnica Look at 2009 Console Sales - the Xbox 360 runs ahead of the PS3 almost all of 2009. 4.77 million units last year, while Nintendo absolutely dominates the more casual market. But that's not the entire picture, oh no. Remember what I said about it being so easy to publish on Live Marketplace? Here's some sales statistics on Indie Games from GamerBytes. See for yourself; 160K copies sold of "A GAM3 W1TH Z0MB1ES" developed by James Silva. That's huge and more than some major publisher releases! This increases visibility, increases popularity, and encourages more independent developers.
It all translates into one thing; Microsoft has never really planned for Xbox 360 to make them billions of dollars in profit. Game consoles just don't. Instead, they planned for a specific goal and seem to be very happy with their results thus far. They know as well as I do that there is no chance of Game or Entertainment segments becoming replacements for Windows or Office. Think about it rationally - how many offices do you know that put an Xbox 360 on everyone's desk? Now how many put a PC on everyone's desk? That's why Xbox 360 is tiny compared to Windows and Office. But that "tiny" number as perceived by others, appears to be making Microsoft very happy overall. If it wasn't, they'd change their strategy, and there's no sign of that. Instead they're continuing on their combination of extending and improving with things like Project Natal, going for a differentiation from the PS3 and maintaining a strong lead in Online.

So ultimately, I'm entirely behind Microsoft's entry into the gaming console market. Let's be blunt here - if you play any sort of "violent" game, Wii is not for you, period. Nintendo has the strictest content policies out there; so strict that they've blocked some games, and forced Dead Rising to be almost totally rewritten to meet Nintendo's content guidelines. Independents want nothing to do with Nintendo - getting a game approved is a nightmare process, and also requires a huge investment in hardware.
Sony has led in innovation, and could be said to still have a substantial lead there in some areas. The problem remains that the PS3's initial release was a disaster in all ways, and Sony chose not to correct it aggressively. In fact, they sat on it and did nothing to deal with it. Their innovation has cost them titles and the cost of development has cost them exclusivity agreements. It's hard for publishers to make money on PS3 games, and indie developers haven't so much steered clear as been tossed over the high cost fence with a 'kick me' sign on their back.

Microsoft is the only company out there right now that is bringing multiple areas together, and doing a damn fine job of executing on it. Think about it - it has franchises like Halo, a violent FPS that caters to the 12-24 male segment. But they also have games like Lego Rock Band, which is decidedly a family title - especially given that Lego has strict content guidelines like Nintendo. Add to this, they have a whole raft of not just hardcore but casual and family friendly games from independent developers. They're winning the hearts and minds battle, because when those developers can (and do) make money from putting their games on Live Marketplace, they tend to become vocal supporters. The Xbox 360 provides them with a stable platform to ensure a quality experience that users may not be able to get on their PC for any number of reasons.
Microsoft absolutely needed to enter the console market, and now that they've been here a while, it's blatantly obvious that they not only intend to continue but deserve to continue. Say what you will, but the fact is that Microsoft isn't winning the hearts and minds battles in the console space purely on marketing muscle or dollars spent. They're winning it because they came to the game with an attitude that they were going to change the game, and deliver a seamless, solid experience. They've stuck to that aggressively positive attitude, and it's delivered not just for Microsoft, but for users as well.

MANDATED DISCLAIMER: I own a Microsoft Xbox 360, two Microsoft Zunes, a Zune Marketplace subscription, a couple Windows XP and 7 licenses - all at my own expense. I also will be buying a PS3 the day Gran Turismo 5 is released - sorry, Microsoft. Forza doesn't do it for me like Gran Turismo does. Also, a good friend is an XNA CC member, as mentioned above.
That said, these people STILL won't give me free stuff! What gives? ;)

Wednesday, February 17, 2010

Why I Hate Macs

Let the flame wars commence! Okay, first read the post, THEN flame, capisce? That's better. I wrote this up ages ago, and just never bothered to post it, because I didn't feel like having people scream at me for a week straight. Well hey, I'm in a self-abusing mood.

So yeah. I hate Macs. As in absolutely loathe and refuse to own a current Mac, period. Why? I've got a whole list of reasons, but let's step back a moment. Remember when Macs shipped with SCSI disks and 68k processors, then later PowerPC 603s and 604s? I loved Macs back then. Owned two. Because they actually differentiated themselves from Windows PCs, offered value for the dollar, and were superior for the tasks I needed to do on them.
These days? The Mac population consists largely of rabid zealots who think you're the Antichrist if you don't like Mac for any reason. That's a big reason I hate Macs and refuse to own one. The other part of it is that those same zealots refuse to accept or acknowledge that Macs and OS X are not the superior platform for every single task imaginable. Let's put this all into a nice neat list though.

OS X cannot run software I need daily. Do not give me Boot Camp or Parallels excuses. If it can't run it natively, then it can't run the damn software. Here's a list. Software that is totally unavailable or unworkable is in red, with severe limitations in yellow. Severe limitations include lack of plugin availability or cross-platform compatibility issues.
Mozilla Firefox, Microsoft Outlook, Microsoft Word, Microsoft Excel, Microsoft Visio, TweetDeck, vim/Gvim, NFS, CIFS, Pad2Pad, Pidgin, Zune, Steam, VMware Workstation, Ableton Live, Propellerhead Reason 4, City of Heroes
Ableton Live is in orange because it's there. But half my needed VSTs do not have a Mac version available, or do not work at all on Mac, making it nigh unusable. And a complete lack of Visio means it is completely unusable, because I abuse Visio in depth, daily.

"It's the software, stupid" arguments from Mac zealots. Hello, see immediately previous point. The very existence of Fusion, Parallels and Bootcamp all destroy that argument in nothing flat. If it was the magical OS X, then why do you have to run Windows on your Mac with virtualization software so you can access applications that aren't available on it? Yeah. It's totally everyone else's fault entirely and they're idiots for not maintaining multiple code trees and development teams just for Mac. Sure. And you know what? If I'm just going to reboot the damn thing into Windows anyways, then why the hell would I buy OS X at all?

The hardware is out of date and overpriced. No, you are not getting a "superior" product, you are just paying a ridiculous brand premium for a fashionable chassis. Meeting my requirements takes a Mac Pro, which costs well over $7,000.00 USD before the 3 year warranty (required thanks to Apple's continuing string of QC and QA problems). For this $7,000.00 price tag I get a SINGLE last generation Xeon Quad Core 3.33GHz based on Intel's abandoned 5000-series chipset with only 16GB of nigh unavailable Fully Buffered DIMMs, a $700 non-hardware non-BBU RAID card which I can buy elsewhere for around $100, an ATI Radeon HD4870 512MB at $200 which is not only a generation old but available in the proper 1GB configuration at a little over half that price, a pair of middle of the road DVD drives, iWork, about half of Microsoft Office, and a mediocre warranty with support where I still have to buy the OS upgrades out of pocket with no discount. THIS IS NOT A GOOD VALUE BY ANY MEASURE. For under $7,000 I can build a dual socket 12/24 core system which is not only watercooled but packing two Radeon HD5870's in CrossFire, an SSD and two 500GB/32MB drives in RAID0, and 24GB of DDR3. And there's not one bit of difference except that A) I'm using CURRENT generation hardware and B) I don't have the brushed aluminum chassis. Oh wait, I do, except it's black and has hotswap drive bays and is quieter. Hell, I charge under $7K for that built and shipped to your door with half the parts packing 5 year manufacturer warranties!

Apple loves to lock in their vendor lock-out. Nobody gets to just write applications for OS X. You have to pay your special Apple taxes. It's not just an iPhone thing, either. You must pay Apple for documentation, for licenses, and so on. And if they decide they aren't happy with you, they can cut off any developer with no notice, leaving the users in the lurch. Want to use anything non-Apple that isn't an external thumb drive? If they didn't pay their Apple taxes, it's a crapshoot whether it'll work or be supported. When was the last time you plugged in a keyboard or mouse to your PC and found that it was completely and utterly incompatible because of the software?

I play games. Get over it, Mac heads. Apple is not a gaming platform. If it was a viable, popular gaming platform, then popular games would be released for it. Instead, you have a handful of MMOs with Mac clients as an option, and some older games. Mass Effect 2 is nowhere to be found, nor is Steam, much less Modern Warfare 2 or Bioshock 2. ME2, MW2 and Bioshock are three of the absolute most popular, best selling games out there. They have massive budgets. Yet there is no OS X version of any of these games. Take the hint; if you enjoy playing a variety of games to relax, then Apple is definitely not for you. And World of Warcraft doesn't count, when the vast majority of Lua-based addons don't work on Macs.

The attitude of the company and the "community" just SUCKS. "Look at us in our faded jeans and black turtlenecks and OHMYGODNEWIPHONE MUST HAVE NOW!" Yeah, because a guy with 20 years of system design and building experience is the kind of guy who goes out and buys the latest 1.0 because somebody put a shiny Apple logo on it. But gods help you if you should rationally explain why you don't like iTunes or the iPod or the iPad to most Apple users. Seriously, I've had one actually take a swing at me for suggesting that an old G4 Xserve with significant performance problems was not the best answer, because of the performance problems. I have found on dozens of occasions that Apple users have absolutely zero interest in whether or not the job is done right, or even if the job is done - their only concern is rabid support of a shiny white logo. And the only "appropriate" response to any question of the usefulness or functionality of any Apple product is to shout the person down and insult them.
Please note, this isn't a blanket statement about all Apple users. I know several that are perfectly reasonable and rational, and completely understand and support my refusal to use a product that doesn't work for me. ("Why should you, if it doesn't do what you need? That's silly.") However, as I said, I have had Apple fanatics actually swing at me for questioning Apple. And more often than not, that is exactly the reaction I get when I point out that an Apple product doesn't do what I need or want - screaming, insulting, and swinging. I want no part of any organization or community that considers shouting down acceptable behavior.

The presumption that because I'm a Unix guy I must love OS X. Look. It's an operating system. Get over it. AIX is an operating system. Solaris is an operating system. Windows is an operating system. The notion that any one of these does not suck is a complete fallacy. All operating systems suck, just in different ways. It's a fact of computing. It shouldn't be, but it is, and I deal with it every day. Just because I'm a Unix guy does not mean I prefer or even necessarily want Unix on my desktop. My operating system does not give me any sort of self-gratification or pleasure whatsoever, no matter what it is. I don't flipping care if it's Unix-like! It's an operating system, not a damn religion.
If I wanted Unix on my desktop, then I would use FreeBSD or AIX. But see, the thing is that I don't. I want to get things done and I do not want to waste time beating them into working through emulators like WINE or rebooting to a different OS so I can play a game. I want to turn on my computer, get work done, close it out and go play some games, then shut it down at the end of the day. That's it.

So, I figure I've explained sufficiently why I hate Macs personally. Calmly and rationally. This of course means that I can expect irrational screaming and death threats from a select few who frankly have no business giving me advice or making demands about what hardware and software I use. Those of you who want to scream and yell at me, go hang out with the people who agree with me that an iMac completely fails to meet my disk requirements and a Mac Pro is horrifyingly overpriced, and learn from them. Please.

Thursday, February 11, 2010

Big ESX in a Tiny Box - What's up with the delay here?!

Okay, I owe everyone an explanation, so here goes.

I've run into a problem that was not present on the prior build. This is an extremely severe problem that makes the system completely unusable. Understand this is through no fault of design or implementation here, but rather, due to a very severe bug in ESX/ESXi's Intel EtherExpress driver, specifically in the MSIX Vector section of the e1000 driver. Please understand, this bug did not present previously in any situations I tested. Remember, every component in the system is on the HCL. The problem here is with an Intel i82574L ethernet controller; you can find it under IO Devices, Networking, partner name Intel.

At this point, I'm trying to get a 12x5 Basic contract via VMware so that this bug can be escalated properly. The exact issue, going into the technical side of things, has to do with how ESX/ESXi handles the Interrupt Vector Address Routing or IVAR for PCI-Express MSIX. If this part goes way over your head, don't worry; it's supposed to. Understanding it requires prior experience developing drivers and doing kernel programming. It also requires knowledge of the Intel EtherExpress family and PCI-Express bus.

So, the i82574L/LA has a 5-entry IVAR. Typical drivers will use only the first three IVARs and ignore activity on entries 4 and 5. (Technically, 3 and 4, as it starts at 0. So I'm going to be starting from 0 here.) ESX/ESXi uses or touches all of the IVAR table, 0-4. The i82574 can operate in a number of modes, which are identified by the Function registry entry. In normal operation on most systems, Function will be 0, which indicates the following list of items:
- Operating as LAN0
- Operating as LAN1
- Operating as LAN0 shared with IPMI/BMC
- Operating as LAN1 shared with IPMI/BMC
Yes. A single Function mode indicator covers FOUR functions. So how do we control whether we're doing operations strictly for the host, or we're doing operations for the IPMI? By asserting via MSIX and the IVAR table.
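For anyone who wants to picture the 5-entry IVAR table, here is a rough sketch in Python. Big caveat: the field layout (each entry being a 4-bit field holding a 3-bit vector number plus a "valid" bit, packed low-to-high into one 32-bit value) is my assumption for illustration, and the vector numbers are made up. The function names are hypothetical too. It does not model the Function mode or anything ESX/ESXi actually does; it only shows the difference between a table with entries 0-2 populated and one with all of 0-4 populated.

```python
# Hypothetical model of a 5-entry interrupt vector allocation (IVAR) table.
# ASSUMPTION: each entry is a 4-bit field (3-bit vector number + valid bit),
# packed low-to-high into a single 32-bit value. Illustration only.

IVAR_ENTRIES = 5      # entries 0-4
ENTRY_BITS = 4        # width of one entry (assumed)
VECTOR_MASK = 0x7     # low 3 bits: vector number (assumed)
VALID_BIT = 0x8       # high bit: entry is in use (assumed)


def pack_ivar(entries):
    """Pack a list of 5 vector numbers (or None for unused) into one int."""
    if len(entries) != IVAR_ENTRIES:
        raise ValueError("expected exactly %d entries" % IVAR_ENTRIES)
    reg = 0
    for index, vector in enumerate(entries):
        if vector is None:
            continue  # unused entry: valid bit stays clear
        if not 0 <= vector <= VECTOR_MASK:
            raise ValueError("vector %r does not fit in 3 bits" % vector)
        reg |= (VALID_BIT | vector) << (index * ENTRY_BITS)
    return reg


def unpack_ivar(reg):
    """Inverse of pack_ivar: return a list of vector numbers or None."""
    entries = []
    for index in range(IVAR_ENTRIES):
        field = (reg >> (index * ENTRY_BITS)) & 0xF
        entries.append(field & VECTOR_MASK if field & VALID_BIT else None)
    return entries


# A driver that only uses entries 0-2 versus one that touches all of 0-4.
typical = pack_ivar([0, 1, 2, None, None])
touches_all = pack_ivar([0, 1, 2, 3, 4])

print(hex(typical), unpack_ivar(typical))
print(hex(touches_all), unpack_ivar(touches_all))
```

The only point of the model is that entries 3 and 4 are invisible to a driver that never sets them, which is why a problem in that part of the table would not show up on most other operating systems.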
Here's where ESX/ESXi's e1000 driver breaks in a predictable and reproducible fashion. I can't explain why it's breaking, only exactly HOW it's breaking.
When operating normally, the e1000 driver will LOSE the MSIX vector completely. This results in the Interrupt Status Register being lost, causing the driver to lose awareness of the controller state as well as halting all network traffic. This is not enough to crash ESX/ESXi, and the driver continues operation without asserting an error, even though interrupts are doing nothing. This also means that the driver is unaware of link state changes, so any HA/FT features will be rendered useless as far as that host is concerned. (Remote hosts will have to assert failure condition on network going unreachable and yank control.) If you attempt to use ethtool to diagnose the i82574L/LA at any point during operation in either online or offline mode, you will get this failure:

PCPU 0 locked up. Failed to ack TLB invalidate (0 others locked up).
cr2=0x0 cr3=0x400ed000 cr4=0x16c
0:8353/ethtool 1:6231/sfcbd *2:4109/helper1-0 3:5287/sfcbd
4:5101/openwsman 5:5001/hostd 6:5100/openwsman 7:8450/prop_of_i
--
Saved backtrace from: pcpu 0 TLB NMI
Sanitizing and rewriting to make it make sense, here's the actual code path of the failure.
FindIRQInfo+0x69
RemoveIRQInfo+0x41
vmklnx_request_irq+0x32f
e1000_diag_test+0xc5f
ethtool_self_test+0xfc
__ethtool_ioctl+0xe62
vmklnx_ethtool_ioctl+0x7a
netdev_ioctl+0x101
NicCharOpsIoctl+0x65
VmkApiCharDevIoctl+0xe6
DevFSIoctl+0x3e5
FSS_Ioctl+0x17d
UserFile_PassthroughIoctl+0x44
LinuxFileDesc_Ioct+0x7e
User_LinuxSyscallHandler+0xa3
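If you want to work with a backtrace like this yourself, here is a small sketch in Python that splits "symbol+0xoffset" frames into pairs and flips them into call order. It assumes the frames are listed innermost-first, the way they appear above (the syscall handler at the bottom, the failing function at the top). The helper names are mine, not anything from VMware's tooling, and only the first few frames from above are used as sample input.

```python
import re

# One frame per line, in the form "symbol+0xoffset".
FRAME_RE = re.compile(r"^(?P<symbol>[A-Za-z_]\w*)\+0x(?P<offset>[0-9a-fA-F]+)$")


def parse_frames(text):
    """Return a list of (symbol, offset) tuples; skip lines that don't match."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), int(match.group("offset"), 16)))
    return frames


def call_chain(frames):
    """Symbols in call order (outermost first), assuming innermost-first input."""
    return [symbol for symbol, _offset in reversed(frames)]


# First few frames of the trace above, copied as-is.
SAMPLE = """\
FindIRQInfo+0x69
RemoveIRQInfo+0x41
vmklnx_request_irq+0x32f
e1000_diag_test+0xc5f
ethtool_self_test+0xfc
"""

frames = parse_frames(SAMPLE)
print(" -> ".join(call_chain(frames)))
```

Read in call order, the sample shows the ethtool self test calling into the e1000 diagnostic, which requests an IRQ and ends up in the IRQ info lookup where things fall over.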

So what's the exact problem? When it allocates the interrupt vector, it promptly loses it. It appears to be the fifth IVAR entry (entry 4, counting from 0), as ESXi reports looking for 0x5a5a5a5a and instead getting 0xffffffff.

Exposure of the problem appears to be tied to a change in board settings which most users will make, meaning it's very easy to trigger. (The initial build was slightly different.) Until I get this issue escalated within VMware to the point where they're actually guaranteeing further investigation as well as a fix for this problem, I can't tell you what parts to buy because obviously, they no longer work right now, even though they are on the VMware HCL.

Sorry folks. I'm working my tail off here to get VMware to look at this. I have the crash dumps available on request. I just don't have a support contract.
