Capacity Management the hard way: January 2008

Monday, 28 January 2008

Commentary on recent SocGen announcements

Hmm. Time and time again, when investment banks have this kind of personnel issue, to put it mildly, you always get one of two responses:
1. This was an isolated instance and controls have since been put in place to make sure this will not happen again.
2. The individual concerned was a loose cannon/unstable/acting on his own and this cannot happen again.

Commentary: if 1., then why were the controls so rubbish, as an esteemed ex-colleague of mine would say? One answer is, in part, the fact that workstation activity is not generally monitored: i.e. what the traders are doing at an application level. With no such monitoring, baseline trading patterns cannot be established, so deviations cannot be discovered, let alone predicted.

At the application and system level, the lack of effective instrumentation means that it is almost impossible to spot aberrant trading patterns.

At an organisational level, if the supposed fraud was carried out across multiple applications developed by multiple application teams, the independent nature of most investment banking development organizations (even within, for example, a single Fixed Income group) means that there are no common application instrumentation and correlation tools. Given that the customers of these groups (i.e. the traders) will put extended functionality before instrumentation every time, budgets are never created for instrumentation.

Result: even the best compliance teams are going to be handicapped if much of the information indicative of irregular trading simply does not exist....
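
To make the point about baselines concrete, here is a minimal Python sketch of the idea: build a baseline from past activity, then flag anything that strays too far from it. The daily order counts, the function name `flag_deviations` and the three-standard-deviation threshold are all illustrative assumptions of mine, not anything a bank is known to run.

```python
from statistics import mean, stdev


def flag_deviations(baseline, observed, threshold=3.0):
    """Return (index, value, z_score) for observations far from the baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation; cannot compute z-scores")
    flagged = []
    for i, value in enumerate(observed):
        # z-score: how many baseline standard deviations away this value is
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, value, round(z, 2)))
    return flagged


# Hypothetical daily order counts captured from one trader's workstation.
baseline_orders = [102, 98, 110, 95, 105, 99, 101, 97, 104, 100]
todays_orders = [103, 96, 240, 101]

for index, value, z in flag_deviations(baseline_orders, todays_orders):
    print(f"observation {index}: {value} orders (z-score {z})")
```

The catch is the one described above: without instrumentation there is no `baseline_orders` to feed in, so there is nothing to deviate from.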

Commentary: if 2., then this is the typical trick of 'playing the man, not the ball'. A simple riposte to any such comment is 'Well, this person was apparently [insert punishing adjective here], and still managed to take the bank (front/middle/back office, compliance, management team etc) for billions of [insert currency here]. I would hate to find out what a really switched-on person would do'.

What I find really disturbing is that although it is difficult to guard against insider fraud, it should not be thought of as impossible. Cases such as Enron, Worldcom etc have shown that typically [and not saying that this is the case here] any such major jiggery-pokery was sanctioned at a senior level, and the person 'caught' is normally the fall guy.

Of course, the conspiracy theorists would claim that a rival organization planted a sleeper in the bank and then waited for the bomb to go off.

Tuesday, 22 January 2008

...and if you really cannot sleep

Listen to what USCMG members say about Teamquest, and try to stay awake. I cannot imagine a worse advertising campaign than asking customers to expand their egos at our expense: that is why you have blogs.

What I really like about this is that there is a cracking podcast download from

TeamQuest’s IT Resource Concept Makes Complete Business Sense
Scott Adams
10 min.

I just wish it was the chap who draws Dilbert....

Teamquest announce Release 10

March 2007. New versions of Teamquest announced. There is a pattern here: BMC 2006 new version, Teamquest 2007 new version....

BMC announce award, rename of Patrol

2006: late news, but all the same, BMC seem proud of it: BMC has announced that BMC Performance Manager (formerly Patrol, formerly BGS) has won an innovation award from the Application Development Manager publication.

This reminds me of Sellafield... formerly Windscale: not that I am comparing enterprise management software with a leaky nuclear power plant..

Thursday, 10 January 2008

...and another thing

Next week, I will be in the US, delivering some performance and capacity planning training, primarily on UNIX [Linux] servers. As due diligence, I have been trying to find some [any] decent performance analysis and capacity planning books on Linux, so that I could recommend some if anyone asked.

$Result=f$coat(get,now) ! for those who remember VMS

Oh dear. I was not impressed. The question I wanted answered was 'I have 50+ Linux servers, how do I performance analyze all of them easily and quickly?'. The answer wasn't in any of the books I looked at (via O'Reilly's SAFARI on-line book jobby, not that I get anything for the plug). Why was this, I wondered? Well, here are the answers:

1. The authors of the books were all pure tecchies. Therefore, they have limited experience of how real end-users [who work for huge financial sector-type enterprises] implement solutions and manage performance problems. You can spot this easily, when one vendor super-tecchie says 'Oh, that is not how I do things'. Generally, one cannot just log on and slap some freeware performance tools on the server [yes, I know I used to, but I have seen the error of my ways (maxprocesscnt=6)]. Frequently, one is not allowed access.
2. Given that an enterprise may have literally hundreds of Linux servers, you may have only two or three system administrators to look after them. 'Look after' = keep them running, not performance manage, tune or capacity plan every server. Frequently the skills to do this are not there anyway. What does this mean? One cannot generally pore over a server in production and apply kernel tweaks etc, unless the business (customer, whatever) approves. Customers only do that when a) there is no other option or b) there is no budget.
3. If any performance tool starts slaying the CPU (people tend to notice anything over 5%) beyond the norm (1%, but there are few commercial performance tools which can do that), then one gets told to remove it.
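
For what it is worth, checking a monitoring agent against those 5% and 1% figures is not hard. Below is a rough Python sketch, assuming the standard Linux `/proc/<pid>/stat` layout, where fields 14 and 15 are user and system CPU time in clock ticks. The sample lines, the agent name and the default of 100 ticks per second are my own assumptions for illustration; on a real box you would read the file twice and take the tick rate from `os.sysconf("SC_CLK_TCK")`.

```python
def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' rather than on whitespace alone.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, elapsed_seconds, ticks_per_second=100):
    """CPU used by one process, as a percentage of a single CPU."""
    used_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * used_seconds / elapsed_seconds


def within_budget(percent, budget):
    """True if the measured CPU percentage is at or under the agreed budget."""
    return percent <= budget


# Two hypothetical samples of the same agent process, taken 60 seconds apart.
before = "4242 (perf agent) S 1 4242 4242 0 -1 4194304 120 0 0 0 350 150 0 0 20 0 1 0 1000 0 0"
after = "4242 (perf agent) S 1 4242 4242 0 -1 4194304 120 0 0 0 450 200 0 0 20 0 1 0 1000 0 0"

pct = cpu_percent(cpu_ticks(before), cpu_ticks(after), elapsed_seconds=60)
print(f"agent CPU: {pct:.2f}%")
print("under the 5% 'people notice' line:", within_budget(pct, 5.0))
print("under the 1% norm:", within_budget(pct, 1.0))
```

Note that this only reads counters the kernel already exposes, so it needs no kernel changes and costs next to nothing to run.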

The answer is to write my own performance tuning for Linux (not an easy task, given what was left out of the Linux performance counters, compared with proprietary UNIX). There is, for example, a performance tool which gathers decent, low-level stats: per-process IO, per-process disk, response time etc. Sounds great. BUT:

1. It requires a kernel link because of device drivers.
2. See 1.

What does this mean in practice? If you re-build the Linux kernel away from the standard, for example, [entirely hypothetically and not based on any previous experience] this will happen:

The hardware vendor will not support you in the event of a system crash, e.g. [sharp intake of breath] 'Well, guv, we cannot support this system with that kernel driver. Rebuild the kernel without the driver.'
The software vendor ditto.
The storage vendor ditto.
The HBA [especially the HBA] vendor ditto.
The OS vendor ditto, after they have fallen over laughing with four legs in the air.

So, there is no solution: I will have to write (and publish probably) a guide to Enterprise Linux System Management.

Put it on the list....

Wednesday, 9 January 2008

Workstations and Drive Performance Analysis

Well, just wrapped a performance report. Found out that the core reason for poor application performance was that the local workstations were equipped with a single SATA (or 'Slower than ATA' as I call them) slow RPM drive. Um. No dedicated controller + slow drive + lots and lots of small IOs = poor performance. Never mind the CPU and memory performance, this shows it always comes down to the slowest moving part - no, no, not the customer...
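
A back-of-envelope sum shows why lots of small IOs hurt on a slow drive. The Python sketch below uses the usual rule of thumb that one random IO costs roughly an average seek plus half a rotation. The RPM and seek figures are illustrative assumptions, not numbers from the report, and the estimate ignores transfer time, caching and queueing.

```python
def estimated_random_iops(rpm, avg_seek_ms):
    """Rough random IOs per second for a single spinning drive."""
    # On average the platter has to turn half a revolution to reach the data.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


# Hypothetical drives: a slow desktop disk versus a fast server-class disk.
slow_drive = estimated_random_iops(rpm=5400, avg_seek_ms=12.0)
fast_drive = estimated_random_iops(rpm=15000, avg_seek_ms=3.5)

print(f"slow drive: about {slow_drive:.0f} random IOs per second")
print(f"fast drive: about {fast_drive:.0f} random IOs per second")
```

If the application wants a few hundred small IOs per second, a drive that can deliver only a few dozen is the bottleneck no matter how much CPU and memory sit around it.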

Nice aspect of capacity planning consulting is that you can say to the customer: 'fix this!' 'This' can be just a single drive or controller or bus, even on a lowly workstation.

Moral of story: monitor workstations. Can save a lot of unnecessary project spin and a lot of money. Let me see: typical project costs £400 per day? Cost of new SCSI drive for workstation: £160.

Next project: Data Centre design...watch this space.

Friday, 4 January 2008

Gently does it

Friday: Off for a performance analysis presentation. Contains good news and bad news. Good news is that the software is right-sized for the back and mid tier. Bad news: the application, a fat client, clobbers the user's workstation. Very easy to over-size the back and mid tiers and not even look at the workstation. Why? My experience is that no-one capacity plans the user's PC because desktop support doesn't even think about capacity planning. Instead, it's 'reboot, re-build'.

Thursday, 3 January 2008

Why no Teamquest, Metron, PerfCap in the share listings?

Because they are all private companies!

http://www.metron.co.uk/home/about/index.html

http://www.teamquest.com/about/corporate-information/index.htm

http://www.perfcap.com/AboutUs.htm

This is a factor to be considered when negotiating. A public company may pressurize its salespeople to close deals before FY deadlines; a private company less so. Dealing with private companies is generally a trickier proposition for banks, since there are no shareholder pressure points to be used. Not that anyone would, of course.

As a service to my readers, here are the FY end dates of the publicly listed companies listed on this blog. All data comes from the Annual Reports, available from their respective urls.

BMC March 31 Q1
HP October 31st Q3
CA March 31 Q1
IBM December 31 Q4
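
For anyone timing a negotiation, the dates above turn into a simple countdown. This is only a small Python sketch: the month and day pairs come from the list above, while the function name `days_to_fy_end` and the example date are my own inventions.

```python
from datetime import date

# Fiscal year end (month, day) for each listed vendor, as given above.
FY_END = {
    "BMC": (3, 31),
    "HP": (10, 31),
    "CA": (3, 31),
    "IBM": (12, 31),
}


def days_to_fy_end(vendor, today):
    """Days from 'today' until the vendor's next fiscal year end."""
    month, day = FY_END[vendor]
    end = date(today.year, month, day)
    if end < today:
        # This year's deadline has already passed, so use next year's.
        end = date(today.year + 1, month, day)
    return (end - today).days


example_day = date(2008, 1, 3)
for vendor in sorted(FY_END, key=lambda v: days_to_fy_end(v, example_day)):
    print(vendor, days_to_fy_end(vendor, example_day))
```

The shorter the countdown, the more interested the salesperson is likely to be in closing.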

What this blog is about

Probably the best quote I can find is from the 1967 television series "The Prisoner".

This explains the capacity planner's relationship with the customer:

Number 6: Where am I?
Number 2: In the Village.
Number 6: What do you want?
Number 2: Information.
Number 6: Whose side are you on?
Number 2: That would be telling. We want information… information… information.
Number 6: You won't get it.
Number 2: By hook or by crook, we will.

Welcome to Capacity Management Blog

Given the interest in capacity management for servers, networks and infrastructure caused by heat/power, data centre space, and sheer IT complexity, I thought it the right time to write a blog with all the resources, tools and information for Capacity Management in one place: the sort of site which should exist, but never did, especially for Investment Banks.

Blog characteristics:

Which isn't from a vendor.
Which is free
Which is unbiased*

[Why do I concentrate on investment banks? i) They have the budgets. ii) They have the most interesting problems. Of course, if I can track down interesting non-banking tales, I will include those as well.]

Some general stuff which I will always cover:

Capacity Management Industry acquisitions/mergers/sell-offs/de-mergers/Chapter 11 [you know who you are]
Hardware vendor events of interest to Capacity Planners, Performance Analysts
A dictionary of definitions of Capacity Planning and all its related terms
What is Performance Analysis
What is Capacity Planning
What is Workload classification
Computer Measurement Group (interesting non-mainframe articles only please)

Some specialised stuff which I will lob in from time to time

Virtualisation: how to analyze and capacity plan for enterprises wishing to move to a virtualised environment without causing a virtual headache.
Blades: Capacity Planning, Performance Analysis specifically:
Benchmarks: what the heck is going on these days?
Queueing theory: where does capacity planning come from? Do all the mathematical formulae designed for mainframes and VAXen still add up in today's world?
Horror stories: case studies with the names removed and altered to protect the guilty.
Vendor analysis: good bad and indifferent.
ITIL: what is going on?
CPU differentiation between vendors

* unbiased = regarding all the vendors on the same evolutionary scale until proven otherwise

Stop taking remarks out of context....

Try this:

Alpha will be superior to IA64 in high performance technical computing. Memory
bandwidth and the scalability of the system limit the performance of most high
performance technical applications. Future Alpha processors are adding a low-latency,
high-bandwidth memory interface on chip, together with on-chip support for distributed
shared memory. The next generation Alpha processors will have the fastest memory
system in the industry. Alpha will be the leader in high performance technical
computing.

Source: Compaq 1999. Contact me for the original article...