Thursday, 10 January 2008

...and another thing

Next week, I will be in the US, delivering some performance and capacity planning training, primarily on UNIX [Linux] servers. As due diligence, I have been looking for some [any] decent performance analysis and capacity planning books on Linux, so that I could recommend them if anyone asked for decent material.

$Result=f$coat(get,now) ! for those who remember VMS

Oh dear. I was not impressed. The question I wanted answered was 'I have 50+ Linux servers, how do I performance analyze all of them easily and quickly?'. The answer wasn't in any of the books I looked at (via O'Reilly's Safari on-line book jobby, not that I get anything for the plug). Why was this, I wondered? Well, here are the answers:

  • The authors of the books were all pure tecchies. Therefore, they have limited experience of how real end-users [who work for huge financial sector-type enterprises] implement solutions and manage performance problems. You can spot this easily when one vendor super-tecchie says 'Oh, that is not how I do things'. Generally, one cannot just log on and slap some freeware performance tools on the server [yes, I know I used to, but I have seen the error of my ways [maxprocesscnt=6]]. Frequently, one is not allowed access.
  • Given that an enterprise may have literally hundreds of Linux servers, you may have only two or three system administrators to look after them. 'Look after' = keep them running, not performance manage, tune or capacity plan every server. Frequently the skills to do this are not there anyway. What does this mean? One cannot generally pore over a server in production and apply kernel tweaks etc., unless the business (customer, whatever) approves. Customers only do that when a) there is no other option or b) there is no budget.
  • If any performance tool starts slaying the CPU beyond the norm, then one gets told to remove it. People tend to notice anything over 5%; the norm is 1%, but there are few commercial performance tools which can manage that.
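
To put a number on that last point, here is a rough sketch in Python of how one might check a monitoring tool's CPU share using nothing but the standard /proc counters (no kernel module, no extra software). It is an illustration under assumptions, not a finished tool: the function names are made up for this post, and the 1% budget is simply the 'norm' quoted above. It takes two snapshots of /proc/stat and of /proc/<pid>/stat and compares the ticks the tool burned against the ticks the whole machine had available.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def total_ticks(stat_text):
    """Sum user, nice, system, idle, iowait, irq, softirq and steal.

    Guest time is already counted inside user time, so the guest
    columns (if present) are deliberately left out of the sum.
    """
    return sum(parse_cpu_line(stat_text)[:8])


def process_ticks(pid_stat_text):
    """Return utime + stime (fields 14 and 15) from /proc/<pid>/stat text."""
    # The command name (field 2) may contain spaces or brackets,
    # so split on the last ')' before counting fields.
    rest = pid_stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]) + int(rest[12])


def tool_cpu_percent(stat_before, stat_after, pid_before, pid_after):
    """Percentage of total CPU capacity used by one process between samples."""
    elapsed = total_ticks(stat_after) - total_ticks(stat_before)
    if elapsed <= 0:
        return 0.0
    used = process_ticks(pid_after) - process_ticks(pid_before)
    return 100.0 * used / elapsed


def over_budget(percent, budget=1.0):
    """True if the tool exceeds the CPU budget (1% here is an assumption)."""
    return percent > budget


def read_samples(pid):
    """Hypothetical helper: read both /proc files once for a live process."""
    with open("/proc/stat") as f:
        stat_text = f.read()
    with open("/proc/%d/stat" % pid) as f:
        pid_text = f.read()
    return stat_text, pid_text
```

On a live box one would call read_samples() twice, a minute or so apart, feed the four strings to tool_cpu_percent(), and raise the alarm when over_budget() says so. Note the figure is a share of the whole machine, so on a multi-CPU server a tool pinning one CPU shows up as a fraction of that.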

The answer is to write my own performance tuning guide for Linux (not an easy task, given what was left out of the Linux performance counters, compared with proprietary UNIX). There is, for example, a performance tool which gathers decent, low-level stats: per-process I/O, per-process disk, response time etc. Sounds great. BUT:

  1. It requires a kernel link because of device drivers.
  2. See 1.

What does this mean in practice? If you rebuild the Linux kernel away from the standard then, for example, [entirely hypothetically and not based on any previous experience] this will happen:

  • The hardware vendor will not support you in the event of a system crash, e.g. [sharp intake of breath] 'Well, guv, we cannot support this system with that kernel driver. Rebuild the kernel without the driver.'
  • The software vendor ditto
  • The storage vendor ditto
  • The HBA [especially the HBA] vendor ditto.
  • The OS vendor ditto, after they have fallen over laughing with four legs in the air.

So, there is no solution: I will have to write (and publish probably) a guide to Enterprise Linux System Management.

Put it on the list....
