Wednesday, January 27, 2010

Big ESX in a Tiny Box, Part 1 – My Home Environment Of Doom.

I have a pretty big home environment. Of course I do – I’m a Unix guy. It’s what we do. What people don’t understand is just how big, ugly, and loud my home environment can be. Let’s start with the “smaller” boxes. I sit at my primary workstation, an aging Intel E6750 with 4GB and a Radeon HD4870, since I play a lot of high-requirement games. It’s still a fairly respectable box, running Windows 7 Ultimate for its NFS client. I have VMware Workstation on it as well, which I use for testing some things.

Then comes the actual Unix environment. This is where things turn from “mild” to downright horrifying. Starting with the oldest systems first: I have a code-control system used by multiple people – an IBM Netfinity 5500 with dual Pentium III 733s and 768MB, forcibly crammed into the desk. I do my damnedest to never turn it on, because it’s loud; I’ve even moved most of its workload elsewhere. It’s big, it’s power-hungry, and it’s loud – everything I don’t like in a home server.
Adding to this is my IBM Netfinity 4500R, sitting atop my desk. This system is extremely, extremely loud at times. Packing dual Pentium III 733s and a whopping 2GB of PC133 ECC Registered memory, this is one of my test machines, originally purchased for PCI-HP development testing and Ethernet driver testing. It’s not especially power-hungry – dual 280W power supplies – but it’s long past its usable lifespan.
The current “core infrastructure” server is an embarrassment, to be honest. Handling routing, NAT, mail serving, database, and webserving duties is a 1GHz AMD Athlon XP. That is not a typo, folks. It has a whopping 256MB of DDR – not DDR2, and not even PC2100, but PC1600 original DDR. It no longer has working video output or keyboard input, the serial console stopped working last month, and the four 20GB IDE disks are probably at double their rated MTBF. One of them has begun having intermittent spin-down issues.
To replace the “core” server, I slapped together a machine from spares and began the slow process of a total rebuild and migration. About midway through, Abit stopped making motherboards and ran out of warranty replacements. So a low-hour E6550 with 4GB and a pair of new 250GB SATA disks became the “limp along” server instead, currently handling database duties, internal web serving, some proxying load, and a bit of NFS serving.
Utterly and absolutely dwarfing all of these boxes is “Big Bertha.” I never turn Big Bertha on except when absolutely necessary, and that doesn’t happen often. That’s because it’s a special box: an Iwill H8502 with eight 2.0GHz Opterons, 64GB of memory, dual LSI MegaRAID 320-2X controllers, and more. Oh, and it has four 1650W power supplies which require 2x20A circuits to start up in sequence. Don’t even ask what it takes to start up in parallel; I’ve never even attempted it, and I don’t think it’s possible.

I also have other machines scattered about on top of this. Another low-end workstation, three laptops, an IBM POWER4, a Sun Ultra 2, you pretty much get the idea. Not to mention my Xbox360, my PS2, a Wii and yeah... the list goes on. Repeat after me, everyone: THIS DOES NOT WORK. AT ALL.
I am beyond out of space, my power bills are through the roof, and most of the equipment is well past its usable life. Let’s do some math here. These numbers are based on measured or calculated power draw: 745W + 690W + 420W + 322W + 470W + 505W + … you know what? Let’s just stop. You’ve already figured out that it’s over 2kW if I turn everything on, before Big Bertha. And Big Bertha draws heavily – 4 x 8.5A @ 110V = 3.7kW while running. (“Do NOT turn on the hair dry-DAMNIT!”) Seriously, if your home network has ceased to be measured in watts and moved on to “number of breakers required,” you have got serious, serious problems.
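For the curious, the back-of-the-envelope math above looks something like this. The wattages are the measured figures quoted in this post (the trailing “…” means there are more boxes I’m not bothering to list, so the real total is even worse), and Big Bertha is computed straight from her four 8.5A supplies at 110V:

```python
# Back-of-the-envelope power math for the lab described above.
# Only the explicitly listed wattages are included here.
base_watts = [745, 690, 420, 322, 470, 505]  # the "smaller" boxes

# Big Bertha: four power supplies, 8.5A each, at 110V (P = I * V)
bertha_watts = 4 * 8.5 * 110

total_kw = (sum(base_watts) + bertha_watts) / 1000
print(f"Everything else: {sum(base_watts)} W")     # well over 2kW
print(f"Big Bertha alone: {bertha_watts:.0f} W")   # ~3.7kW
print(f"Grand total: {total_kw:.1f} kW")
```

Nearly 7kW with everything running – and that’s before the boxes hidden behind the “…”.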

So, how do I fix this monstrosity? Again, I’m a Unix guy, and this is my production environment! I receive 90% of my email on that ancient server, and it provides services to more than just me. Tossing everything and just getting some crappy Linksys router isn’t even remotely an option. Enter VMware ESXi 4.0 Update 1. FreeBSD runs beautifully on it, and the overhead is pretty damn low, so performance is pretty good. I don’t hammer on the boxes obsessively, and given the existing hardware, I think newer coffee makers have more processing power than the individual systems. Sure, I could do it all with FreeBSD jails on a big box, but that box would still end up having to be pretty damn big. And honestly, jails are a management nightmare, and they don’t solve the firewall problem – I’d still need a separate physical firewall. Not so with VMware: I can put everything (except the Sun and the IBM) onto a single physical system, provided I have enough NICs, and get appropriate memory isolation.

So, now that you’ve got a grasp on the environment, the fix starts to make sense. I’m consolidating a bunch of complex systems on ancient hardware into a single ESX server. But I really don’t want a big box. So how do I solve that problem? How do I really cut my power bills? And how do I make this something more than just a fix for me – an actual setup that can be expanded into a true Enterprise-grade ESX deployment? (Look, when I say Enterprise grade, trust me, I actually do mean Enterprise grade.) So I put my 20 years of x86 system building and engineering to work, along with my web browser, and found an answer. This answer was briefly discussed in VMTN Community Podcast #79 – I’m @RootWyrm for those of you on Twitter.

The final product has been ordered, and parts will be arriving later this week. Along with parts will come more on this topic, including the hardware details and the ESXi build. Stay tuned, it’s pretty darn awesome.