Saturday, 24 January 2009

Hanging by a Thread - Capacity Planning in a Recession

Well, in Europe, we know we are thoroughly in the mess. With household-name banks announcing a 45bn loss (GBP), it is either tally ho back to the Weimar Republic, starring Gordon Brown as the Reichspräsident, or forwards to the euro. Either way, investment banking is a tarnished spirit.

For capacity planners, that is a disaster: most of our work came from that market. As for me, I am spreading my wings to non-investment banking customers.

So what happens in a recession to types like us? OK, here goes...

1. Jobs are eliminated as they are not seen as mission critical.
2. The short-term savings in staff costs are wiped out when there are no resources to do the work of correctly sizing systems.
3. Error rates rise: errors in sizing, costing, and performance.

Analysis

In this recession there is an extra wrinkle for us: the applications which dragged the banks down into trouble, the Collateralized Debt Obligations (CDOs) and the even worse CDOs of CDOs, were responsible for the explosive growth in calculation farms (and therefore blade servers) since 2002. The idea was that the huge calculations would work out the risk associated with a given basket of deals in terms of market activity.
So, what happened? Was it that the applications failed to consider situations where the market bombed beyond this generation's memory? Or was it that the CDOs were constructed on sand: mortgages which could not be repaid...
If the programs were so good, why did they forecast this as a doomsday scenario?

The answer: read the story of the emperor's new clothes. As long as the money was coming in and the commissions and fees were being paid, no-one wanted to see otherwise. Consider this: the business was telling its developers to base their calculations on more and more exotic products; the developers would order huge amounts of blades or other calculation servers from the hardware vendors (especially HP and IBM) to do these calculations faster than the other banks (don't think this is a new thing: the very first calculation servers I capacity-planned (is that a verb? yuk) were DEC AlphaStation 255s); and as long as profits were good, everyone was happy: banks and hardware vendors.

In fact, things got even better for the hardware vendors when blades went multi-core. Since Intel had to shelve its plans for a 4GHz single-core chip (didn't stop IBM pSeries, but that is another article), the blades' top speeds were quite slow on a single-core basis, and very hot on a data centre basis. For the first time in living memory, application developers for calculation farm apps could not rely on the processor speed to bail them out. I'll dig out some graphs somewhere which prove this. In t'good old days, I could leap from a DL580 G1 PIII Xeon 700MHz to a DL580 G2 P4 Xeon with speeds rising from 2.0 to 3.06GHz single core, and my jolly old application would leap ahead.

But because Intel put the brakes on their plans for a single-core 4GHz chip, applications had to use many more blades than they should have needed. Worse, to get the best out of the multi-core CPUs one had to recompile one's code with chip-specific instructions via the Intel compiler. This recompilation simply could not be done well because of the changes in floating-point calculations which the new compiler would bring, and the phenomenal amount of re-testing by the gods of the Analytics groups this would involve. In the past, developers would hard-wire code to deal with known floating-point issues with the compilers on certain chips. Not possible in this world unless you write a new app!

So what does all this mean for us?

Well, there will be lots of unused blade farms in unused data centres as banks pull out of CDO-type activity (burnt child fears the fire syndrome). Lots means literally hundreds of blades: very high spec, very high cost, very long depreciation. A lot can happen to a bank in three years...

With no customers for these blade servers, this gives data centres a golden chance to migrate the older non-blade x86 servers onto these blades, excising cabinets at a stroke. Whether this will be done is another matter. On the one hand, the business owning and paying infrastructure costs for these servers will want to make sure any new customers get charged accordingly. On the other hand, any new owners of these servers will say to the previous owners 'you lot got the bank into all this trouble anyway, we're taking them, and you lot should all be sacked anyway for reducing my bonus to 0...'

So capacity planning will consist of modelling the movement of workloads between servers: V to P, P to V, non-blade to blade, and any which way. There are few people and even fewer software products which can help you do that in a client-server environment. In fact, I will put it bluntly: I only know of 5 people in Europe who have the **** to do this, and only one software product that ever could: PAWZ Planner from PerfCap Corporation.
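
To give a flavour of what "moving workloads around servers" means in practice, here is a toy sketch in Python. It is emphatically not how PAWZ Planner or any real product works: it is a plain first-fit-decreasing packing, and every name and number in it (the Workload and Blade classes, the 70% headroom figure, the server sizes) is made up purely for illustration. A real model would work from measured utilisation data and would have to account for I/O, network and peak coincidence as well.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    """One old non-blade server's demand (all figures hypothetical)."""
    name: str
    cpu: float      # peak CPU demand, expressed in blade-core equivalents
    mem_gb: float   # peak memory demand in GB


@dataclass
class Blade:
    """One idle blade that can take over workloads (figures hypothetical)."""
    name: str
    cores: float
    mem_gb: float
    placed: list = field(default_factory=list)

    def fits(self, w, headroom):
        # Only fill a blade up to `headroom` of its capacity, so there is
        # room left for growth and for peaks we did not measure.
        used_cpu = sum(p.cpu for p in self.placed)
        used_mem = sum(p.mem_gb for p in self.placed)
        return (used_cpu + w.cpu <= self.cores * headroom
                and used_mem + w.mem_gb <= self.mem_gb * headroom)


def consolidate(workloads, blades, headroom=0.7):
    """First-fit decreasing: biggest CPU consumers are placed first,
    each on the first blade that still has room. Returns what did not fit."""
    unplaced = []
    for w in sorted(workloads, key=lambda w: w.cpu, reverse=True):
        target = next((b for b in blades if b.fits(w, headroom)), None)
        if target is None:
            unplaced.append(w)
        else:
            target.placed.append(w)
    return unplaced


# Six old servers and three spare blades, all invented for the example.
workloads = [
    Workload("old-app-1", 3.0, 8), Workload("old-app-2", 2.5, 8),
    Workload("old-db-1", 2.0, 4), Workload("old-web-1", 1.5, 4),
    Workload("old-web-2", 1.0, 4), Workload("old-batch-1", 0.5, 2),
]
blades = [Blade(f"blade-{i}", cores=8, mem_gb=32) for i in range(1, 4)]

unplaced = consolidate(workloads, blades)
for b in blades:
    print(b.name, [w.name for w in b.placed])
print("unplaced:", [w.name for w in unplaced])
```

The sketch sorts by CPU only and treats memory as a secondary constraint; which resource you pack on first is itself a modelling decision, and getting it wrong is exactly the kind of sizing error that creeps in when the capacity planners have been let go.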

Think about it: the world is evolving, and capacity planners must evolve with it. Otherwise one becomes as extinct as the dodo, or, in computer terms, 'Tru64 UNIX'...

g'night,

pos