Capacity Management the hard way: 2008

Thursday, 13 November 2008

HP World 2008 Germany

Just been to HP World Germany talking about Capacity Planning and Itania and other servers. Very small number of customers, very large number of HP people. Why? Because it was too expensive to attend.

Shame really: if HP charged, say, 25 quid rather than 1200 quid to attend sessions (let alone the cost of transport), loads of people would come, and HP would get its customers through the door.

Ah well...

Thursday, 3 July 2008

Grid and Application Designs

Traditionally, the way to improve an application's performance was to throw hardware, mainly CPU, at it and hope it worked. In the Grid x86_64 space, this has been the rule for the past 10 years.

But what now? Blade CPU clock speeds top out at around 3.0/3.2GHz, and quad-core clock speeds are likely to be slower, around 1.7/2.0GHz. The traditional CPU route is suddenly running out of road.

Options?

1. Compiler optimization: Use smarter compiler options which take single-threaded applications and 'fake' multi-threading on certain CPUs.
2. Conserve space by replacing all single and dual core blades with quad cores.
3. Look at application re-design. Find threads which consume a whole CPU when called and optimise them.

Consider this: the blades bought 3 or 4 years ago (single or early dual core) waste too much space in the data centre, and need refreshing with newer quad core blades.
The business will have to pay a heavy price for such early investment, and should ask the question: why do we always have to buy more hardware every year in such volume? Why not look at how the application is written, and tune the code to run on the right processor?
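A back-of-envelope way to see why the hardware route is running out of road is Amdahl's law. The sketch below is mine, not a measurement: it assumes throughput scales linearly with clock speed (it doesn't, quite) and uses the clock figures above as illustrative inputs, comparing a quad-core blade at 2.0GHz with a single core at 3.2GHz for an application with a given parallel fraction.

```python
def relative_throughput(clock_ghz: float, cores: int, parallel_fraction: float,
                        baseline_ghz: float = 3.2) -> float:
    """Throughput relative to one core running at baseline_ghz.

    Assumption: performance scales linearly with clock speed, and the
    multi-core gain follows Amdahl's law for the given parallel fraction.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    clock_ratio = clock_ghz / baseline_ghz
    # Amdahl: serial part runs as before, parallel part is shared across cores.
    amdahl_speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return clock_ratio * amdahl_speedup


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9):
        print(f"parallel fraction {p:.0%}: {relative_throughput(2.0, 4, p):.2f}x")
```

Under these assumptions a purely single-threaded job on the quad-core blade runs at 2.0/3.2 = 0.625 of its old speed, and the code has to be about half parallel just to break even. That is the case for option 3, re-design.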

regards

pos

Tuesday, 18 March 2008

Updated website from PerfCap Corporation

After a long wait, it seems that PerfCap Corporation has finally got the web-masters in. New features include fora, feedback, better product descriptions, and generally more information. At least it will be a resource for their many European customers.

Saturday, 8 March 2008

Storage Capacity Planning - SASAN new software

London: 07 March 2008.
Following decades of server capacity planning, the same tools have been modified to provide storage capacity planning - multi-vendor multi-what-if: SASAN (Storage Analysis for Storage Area Networks).

Features:

EMC data analyzer
EVA data analyzer
UNIX and Windows data analyzer
OpenVMS and Tru64 specialist data analyzers
OVPA Support
SAN Mapper and configurator
SRDF and remote storage support
Intelligent IO mapping
Intelligent RAID scaling and host/controller based modelling
Automatic production of reports and what-if scenarios
Email pos@positechconsulting.com for more information, examples, etc.
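For readers wondering what "RAID scaling" involves, here is the common rule-of-thumb arithmetic: each front-end write costs several back-end disk IOs depending on the RAID level. This is a generic sketch, not SASAN's model. The write penalties used (RAID 0: 1, RAID 1/10: 2, RAID 5: 4, RAID 6: 6) are the usual textbook figures and ignore controller cache effects; the per-spindle IOPS figure in the usage note is an assumption.

```python
import math

# Rule-of-thumb back-end IOs generated per front-end write, by RAID level.
# Textbook penalties only: write cache and full-stripe writes can reduce them.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(read_iops: float, write_iops: float, raid_level: str) -> float:
    """Estimate disk-level IOPS for a given front-end read/write mix."""
    try:
        penalty = WRITE_PENALTY[raid_level.lower()]
    except KeyError:
        raise ValueError(f"unknown RAID level: {raid_level}") from None
    # Each read is assumed to cost exactly one back-end IO.
    return read_iops + write_iops * penalty


def spindles_needed(read_iops: float, write_iops: float, raid_level: str,
                    iops_per_spindle: float) -> int:
    """Minimum number of drives needed to absorb the back-end load."""
    load = backend_iops(read_iops, write_iops, raid_level)
    return math.ceil(load / iops_per_spindle)
```

For example, 700 reads/sec plus 300 writes/sec becomes 1900 back-end IOPS on RAID 5 but 1300 on RAID 10; at an assumed 180 IOPS per drive that is 11 spindles versus 8.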

Friday, 7 March 2008

Better Software Required: and supplied

From recent chats with customers on all sides of the IT divide (infrastructure, application developers and business analysts) it seems to me that there is a huge gap in the Capacity Management cycle between business process definition and mapping of transactions to processes or fractions of processes.

So, in the best tradition of chess players who 'announce' checkmate in five moves, followed by a shuffle six moves later, I am going to make a software pre-announcement. For my sins. (Software pre-announcements are generally called 'vapourware', by the way: not in this case.)

Release Date: June 2008.
Company: Positech Consulting
Product Name: not divulged
Internal name: snibbo.
Compatibility: anything
Function:
A dynamic on-line tool for mapping business processes to volumetric analysis software to produce a true end-to-end business function to infrastructure map - to the data centre fundamentals.
Features:
Everything.

Automatic business mapping from templates, or user-defined maps.
Matching of processes to functions
Full-scale performance analysis to a user-defined transaction level
Full-scale capacity planning process giving response time breakdown per workload, and what-if analysis to cover business volumetric and organic growth
Fully Automated
I will repeat that: Fully automated.
Platforms: UNIX, Windows, Linux, OpenVMS
Feeds: any

wish me luck. I had better start coding now.

pip-pip

Friday, 22 February 2008

Itanium Performance Analysis Session with Updates from Intel

Warrington: Tuesday 26th February.
I am giving a performance and capacity planning session with some exclusive update material from Intel. Some places still available...

Thursday, 7 February 2008

New whitepaper comparing drive technologies

A new link provides a handy reference to the speeds and transfer rates of SAS, SCSI, FC, ATA and SATA drives. I compiled it because I was tired of running around looking for accurate data on speeds and transfer sizes.

Watch out for my matching Excel link comparing 100s of drives, old and new.

Wednesday, 6 February 2008

Beware the Solid State Disk: seek and ye shall find

Don't get suckered into solid state disks based on seek time alone. Examining all the solid state disks on the market, there is no doubt that for workstations and servers, SSDs offer a panacea: lower temperatures, less power consumption, fast access times.

BUT:
Be not fooled by cheaper SSDs offering lowly SATA (100MB/sec) transfer rates. Despite the excellent seek time, the relatively poor transfer rate over SATA will hamper IO.

Examples of nifty SCSI-based SSDs:

http://www.storageflex.com/s6.htm

Be not fooled also by FLASH-based SSDs: they are slower for writes than DRAM-based SSDs.

Be careful in your selection: if an SSD seems really cheap, it is for a very good reason.
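The point about transfer rates is easy to check with simple arithmetic: service time per IO is roughly access time plus transfer time, and for anything other than tiny IOs the transfer term soon dominates. A minimal sketch; the two device profiles are hypothetical round numbers with deliberately identical access times, not figures for any real product.

```python
def io_service_time_ms(access_ms: float, transfer_mb_per_s: float,
                       io_size_kb: float) -> float:
    """Approximate service time for one IO: access time plus transfer time."""
    transfer_ms = (io_size_kb / 1024) / transfer_mb_per_s * 1000
    return access_ms + transfer_ms


# Hypothetical profiles for illustration only: same access time, different links.
CHEAP_SATA_SSD = {"access_ms": 0.1, "transfer_mb_per_s": 100}
FASTER_INTERFACE_SSD = {"access_ms": 0.1, "transfer_mb_per_s": 320}

if __name__ == "__main__":
    for size_kb in (4, 64, 1024):
        cheap = io_service_time_ms(io_size_kb=size_kb, **CHEAP_SATA_SSD)
        fast = io_service_time_ms(io_size_kb=size_kb, **FASTER_INTERFACE_SSD)
        print(f"{size_kb:>5} KB IO: {cheap:7.3f} ms vs {fast:7.3f} ms")
```

With the same excellent access time, a 1MB IO takes about 10.1 ms at 100MB/sec and about 3.2 ms at 320MB/sec: the seek time hardly matters once the transfer starts.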

Saturday, 2 February 2008

Cockcroft praises PerfCap PAWZ at USCMG

Adrian Cockcroft, sometime Sun Performance Guru, and author of one of the best performance books on Solaris (ISBN-10: 0131496425) praises PerfCap's PAWZ product at this year's USCMG. I remember emailing him about PAWZ 5 years ago.

Inland Revenue's website crashes...

...due to 'huge demand'. Hmm, well done to whichever outsourcing company looks after Capacity Planning for the Revenue Service. It is not as if they did not know that 31st January was a key date.

Questions to ask, expecting the answer 'No'.

- Were the servers hosting the web layer, application layer and database layers under capacity management?
- Was the infrastructure set up for Capacity Planning? Was a sizing study performed on the 'worst case scenario', namely a workload increase of 'x' on 31/01/08?
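Even the crudest sizing study would have flagged the risk. As an illustration only, treat a tier as a single M/M/1 queue, where response time R = S / (1 - U) for service time S and utilisation U. The service time, baseline utilisation and growth factors below are assumptions, not the Revenue's real numbers.

```python
import math


def mm1_response_time(service_time_s: float, utilisation: float) -> float:
    """M/M/1 response time R = S / (1 - U); infinite once the server saturates."""
    if utilisation >= 1.0:
        return math.inf
    return service_time_s / (1.0 - utilisation)


def response_after_growth(service_time_s: float, base_utilisation: float,
                          growth_factor: float) -> float:
    """Response time when the arrival rate is multiplied by growth_factor.

    Assumes service time is unchanged, so utilisation scales linearly with load.
    """
    return mm1_response_time(service_time_s, base_utilisation * growth_factor)


if __name__ == "__main__":
    # Hypothetical tier: 0.2 s per request, 30% busy on an ordinary day.
    for x in (1, 2, 3, 3.5, 4):
        r = response_after_growth(0.2, 0.30, x)
        print(f"load x{x}: {r:.2f} s" if math.isfinite(r) else f"load x{x}: saturated")
```

Under these assumed figures, tripling the load takes response time from about 0.29 s to about 2 s, and somewhere beyond roughly 3.3x the tier simply falls over.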

I know that this does not happen in the US: H & R Block, who host the on-line service for the IRS actually did capacity planning studies on all levels of their infrastructure to check that the servers could scale with usage.

Obviously not in the UK. This will get worse next year, and will probably take a year to implement.

Well done to all concerned.

Friday, 1 February 2008

BNP interested in SocGen: no surprises there

Following my previous article wondering if a predator would go after a weakened Societe Generale, to the surprise of absolutely no-one we have the predictable situation of BNP Paribas lifting its corporate head over the parapet and expressing some interest in SocGen. Veterans of Banque Paribas will recall a bitter struggle in 2001 between SocGen and BNP for control.

From memory, the then-Paribas board were very much in favour of the offering from SocGen.
Alas, it was not to be: BNP simply showed the shareholders the money.

7 years and a bitter integration programme later (1 huge de-nationalised bureaucracy consuming a lean, mean and strongly-independent albeit smaller bureaucracy), it seems that BNPP may consume SocGen with the same appetite as Paribas. Would the French Government stop this on the grounds of competition? Well, they didn't prevent Credit Agricole and Credit Lyonnais tying the knot. SocGen's retail business must be a tempting morsel for BNPP, with the investment banking arm possibly going to the BFI sector of BNPP. Remember, the French Government would prefer SocGen to stay in French hands at almost any cost to the French taxpayer.

Think about it...

Monday, 28 January 2008

Commentary on recent SocGen announcements

Hmm. Time and time again, when investment banks have these kinds of personnel issues, to put it mildly, you always get one of two responses:
1. This was an isolated instance and controls have since been put in place to make sure this will not happen again.
2. The individual concerned was a loose cannon/unstable/acting on his own and this cannot happen again.

Commentary: if 1., then why were the controls so rubbish, as an esteemed ex-colleague of mine would say? One answer is, in part, that workstation activity is not generally monitored: ie what the traders are doing at an application level. With no such monitoring, baseline trading patterns cannot be established, so deviations cannot be discovered, or even predicted.

From an application and system level, the lack of effective instrumentation means that it is almost impossible to spot aberrant trading patterns.

From an organisational level, if the supposed fraud was performed on multiple applications, developed by multiple application teams, the independent nature of most investment banking development organizations, even within, for example, a Fixed Income group, will mean that there are no common application instrumentation and correlation tools. Given the customers of these groups (ie the traders) will always put extended functionality before instrumentation every time, budgets are never created for such events.

Result: the best compliance teams are going to be handicapped if much of the information indicative of irregular trading simply does not exist....
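To make "baseline" concrete: once per-user activity is actually captured, even a crude statistical baseline will highlight outliers. The sketch below is an assumption of how one might do it (a z-score against each user's own history), not a description of any bank's compliance tooling; the metric (say, orders booked per day) and the threshold of 3 standard deviations are arbitrary choices.

```python
from statistics import mean, stdev


def flag_deviations(history: dict[str, list[float]], today: dict[str, float],
                    threshold: float = 3.0) -> list[str]:
    """Return users whose activity today deviates from their own baseline.

    history maps each user to past daily values of some activity metric.
    Users with fewer than two data points cannot be baselined and are skipped.
    """
    flagged = []
    for user, value in today.items():
        past = history.get(user, [])
        if len(past) < 2:
            continue  # no baseline, so no way to spot a deviation
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            if value != mu:  # perfectly flat history: any change is a deviation
                flagged.append(user)
        elif abs(value - mu) / sigma > threshold:
            flagged.append(user)
    return flagged


if __name__ == "__main__":
    history = {"trader_a": [10, 12, 11, 9, 10], "trader_b": [100, 98, 102, 101, 99]}
    print(flag_deviations(history, {"trader_a": 11, "trader_b": 400}))  # ['trader_b']
```

Note the `continue`: with no instrumentation there is no history, and the check silently finds nothing, which is exactly the problem described above.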

Commentary: if 2., then this is the typical trick of 'playing the man, not the ball'. A simple riposte to any such comment is: 'Well, this person was apparently [insert punishing adjective here], and still managed to take the bank (front/middle/back office, compliance, management team etc) for billions of [insert currency here]; I would hate to find out what a really switched-on person would do'.

What I find really disturbing is that insider fraud, although difficult to guard against, seems to be treated as impossible to prevent. Cases such as Enron, Worldcom etc etc have shown that typically [and not saying that this is the case here] any such major jiggery-pokery was sanctioned at a senior level, and the person 'caught' is normally the fall guy.

Of course, the conspiracy theorists would claim that a rival organization planted a sleeper in the bank and then waited for the bomb to go off.

Tuesday, 22 January 2008

...and if you really cannot sleep

Listen to what USCMG members say about Teamquest - look, try and stay awake. I cannot imagine a worse advertising campaign than asking customers to expand their egos at our expense: that is why you have blogs.

What I really like about this is that there is a cracking podcast download from:

TeamQuest’s IT Resource Concept Makes Complete Business Sense
Scott Adams
10 min.

I just wish it was the chap who draws Dilbert....

Teamquest announce Release 10

March 2007. New versions of Teamquest announced. There is a pattern here: BMC 2006 new version, Teamquest 2007 new version....

BMC announce award, rename of Patrol

2006: late news, but all the same, BMC seem proud of it: BMC has announced that BMC Performance Manager (formerly Patrol, formerly BGS) has won an innovation award from the Application Development Manager publication.

This reminds me of Sellafield... formerly Windscale: not that I am comparing enterprise management software with a leaky nuclear power plant..

Thursday, 10 January 2008

...and another thing

Next week, I will be in the US, delivering some performance and capacity planning training, primarily on UNIX [Linux] servers. As due diligence, I have just been finding some [any] decent performance analysis and capacity planning books on Linux, so I could recommend some, if someone asked for any decent material.

$Result=f$coat(get,now) ! for those who remember VMS

Oh dear. I was not impressed. The question I wanted answered was: 'I have 50+ Linux servers, how do I performance analyze all of them easily and quickly?'. None of the books I looked at (via O'Reilly's SAFARI on-line book jobby, not that I get anything for the plug) answered it. Why was this, I wondered? Well, here are the answers:

1. The authors of the books were all pure tecchies. Therefore, they have limited experience of how real end-users [who work for huge financial sector-type enterprises] implement solutions and manage performance problems. You can spot this easily when a vendor super-tecchie says 'Oh, that is not how I do things'. Generally, one cannot just log on and slap some freeware performance tools on the server [yes, I know I used to, but I have seen the error of my ways (maxprocesscnt=6)]. Frequently, one is not allowed access.
2. Given that an enterprise may have literally 100s of Linux servers, you may have only two or three system administrators to look after them. 'Look after' = keep them running, not performance manage, tune or capacity plan every server. There are frequently not the skills to do this anyway. What does this mean? One cannot generally pore over a server in production and apply kernel tweaks etc, unless the business (customer, whatever) approves. Customers only do that when a) there is no other option or b) there is no budget.
3. If any performance tool starts slaying the CPU beyond the norm (around 1%, though there are few commercial performance tools which can manage that), one gets told to remove it: anything over 5% and people tend to notice.
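Checking a collector against those thresholds is simple arithmetic: CPU seconds consumed divided by the wall-clock seconds available across all CPUs. A minimal sketch; the 1% and 5% bands are just the rules of thumb above (the labels are mine), and the hypothetical `measure_overhead` helper uses Python's `time.process_time`, so it only sees CPU used by the current process, not by any external agent.

```python
import time
from typing import Callable


def overhead_pct(cpu_seconds: float, wall_seconds: float, cpus: int = 1) -> float:
    """CPU consumed as a percentage of total CPU capacity over the interval."""
    return 100.0 * cpu_seconds / (wall_seconds * cpus)


def verdict(pct: float) -> str:
    """Classify overhead against the rough 1% / 5% rules of thumb."""
    if pct <= 1.0:
        return "acceptable"
    if pct <= 5.0:
        return "tolerated"
    return "people will notice: expect to be told to remove it"


def measure_overhead(sample: Callable[[], None], interval_s: float,
                     samples: int, cpus: int = 1) -> float:
    """Call sample() every interval_s seconds; report this process's CPU overhead."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    for _ in range(samples):
        sample()
        time.sleep(interval_s)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return overhead_pct(cpu_used, wall_used, cpus)
```

For example, a collector burning 12 CPU seconds a minute on a 2-CPU box is at 10%, well past the point where someone will ask for it to be removed.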

The answer is to write my own performance tuning for Linux (not an easy task, given what was left out of the Linux performance counters, compared with proprietary UNIX). There is, for example, a performance tool which gathers decent, low-level stats per process IO, per process disk, response time etc. Sounds great. BUT:

It requires a kernel link because of device drivers.
See 1.

What does this mean in practice? If you rebuild the Linux kernel away from the standard, for example [entirely hypothetically and not based on any previous experience], this will happen:

The hardware vendor will not support you in the event of a system crash, eg [sharp intake of breath] 'Well, guv, we cannot support this system with that kernel driver. Rebuild the kernel without the driver.'
The software vendor ditto
The storage vendor ditto
The HBA [especially the HBA] vendor ditto.
The OS vendor ditto, after they have fallen over laughing with four legs in the air.

So, there is no solution: I will have to write (and publish probably) a guide to Enterprise Linux System Management.

Put it on the list....

Wednesday, 9 January 2008

Workstations and Drive Performance Analysis

Well, just wrapped up a performance report. Found out that the core reason for poor application performance was that the local workstations were equipped with a single slow-RPM SATA (or 'Slower than ATA' as I call them) drive. Um. No dedicated controller + slow drive + lots and lots of small IOs = poor performance. Never mind the CPU and memory performance: this shows it always comes down to the slowest moving part - no, no, not the customer...
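The arithmetic behind "slow drive + lots of small IOs" is the classic rule of thumb: a single spindle can service roughly 1000 / (average seek + half a rotation) random IOs per second, since transfer time is negligible for small IOs. A sketch only; the seek times and the 50,000-IO workload are assumptions for the two drive classes, not measurements from this engagement.

```python
def random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Rule-of-thumb random IOPS for one spindle (transfer time ignored)."""
    avg_rotational_latency_ms = 60_000 / rpm / 2  # half a revolution on average
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms)


def seconds_to_service(io_count: int, rpm: int, avg_seek_ms: float) -> float:
    """Time a single drive needs to work through io_count small random IOs."""
    return io_count / random_iops(rpm, avg_seek_ms)


if __name__ == "__main__":
    # Assumed seek times: ~8.5 ms for a 7,200 rpm desktop SATA drive,
    # ~3.5 ms for a 15,000 rpm SCSI drive.
    for label, rpm, seek in (("7.2k SATA", 7200, 8.5), ("15k SCSI", 15000, 3.5)):
        print(f"{label}: ~{random_iops(rpm, seek):.0f} IOPS, "
              f"{seconds_to_service(50_000, rpm, seek):.0f} s for 50,000 IOs")
```

On these assumed figures that is roughly 79 versus 182 IOPS, so the same batch of small IOs takes well over twice as long on the desktop drive, whatever the CPU and memory are doing.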

A nice aspect of capacity planning consulting is that you can say to the customer: fix this! 'This' can be just a single drive or controller or bus, even on a lowly workstation.

Moral of story: monitor workstations. It can save a lot of unnecessary project spin and a lot of money. Let me see: typical project cost £400 per day? Cost of a new SCSI drive for the workstation: £160.

Next project: Data Centre design...watch this space.

Friday, 4 January 2008

Gently does it

Friday: Off for a performance analysis presentation. Contains good news and bad news. Good news is that the software is right-sized for the back and mid tier. Bad news: the application clobbers the user's workstation: fat client. Very easy to over-size the back and mid tiers and not even look at the workstation. Why? My experience is that no-one capacity plans the user's PC because desktop support doesn't even think about capacity planning. Instead, it's 'reboot, re-build'.

Thursday, 3 January 2008

Why no Teamquest, Metron, PerfCap in the share listings?

Because they are all private companies!

http://www.metron.co.uk/home/about/index.html

http://www.teamquest.com/about/corporate-information/index.htm

http://www.perfcap.com/AboutUs.htm

This is a factor to be considered when negotiating. A public company may pressurize its salespeople to close deals before FY deadlines; a private company less so. Dealing with private companies is generally a trickier proposition for banks, since there are no shareholder pressure points to be used. Not that anyone would, of course.

As a service to my readers, here are the FY end dates of the publicly listed companies mentioned on this blog. All data comes from the Annual Reports, available from their respective urls.

BMC March 31st Q1
HP October 31st Q3
CA March 31 Q1
IBM December 31 Q4

What this blog is about

Probably the best quote I can find is from the 1967 television series "The Prisoner".

This explains the capacity planner's relationship with the customer:

Number 6: Where am I?
Number 2: In the Village.
Number 6: What do you want?
Number 2: Information.
Number 6: Whose side are you on?
Number 2: That would be telling. We want information… information… information.
Number 6: You won't get it.
Number 2: By hook or by crook, we will

Welcome to Capacity Management Blog

Given the interest in capacity management for servers, networks and infrastructure caused by heat/power, data centre space and sheer IT complexity, I thought it the right time to write a blog with all the resources, tools and information for Capacity Management in one place: the sort of site which should exist, but never has, especially for Investment Banks.

Blog characteristics:

Not from a vendor
Free
Unbiased*

[Why do I concentrate on investment banks: i) they have the budgets ii) they have the most interesting problems. Of course, if I can track down interesting non-banking tales, I will include those as well.]

Some general stuff which I will always cover:

Capacity Management Industry acquisitions/mergers/sell-offs/de-mergers/Chapter 11 [you know who you are]
Hardware vendor events of interest to Capacity Planners, Performance Analysts
A dictionary of definitions of Capacity Planning and all its related terms
What is Performance Analysis
What is Capacity Planning
What is Workload classification
Computer Measurement Group (interesting non-mainframe articles only please)

Some specialised stuff which I will lob in from time to time

Virtualisation: how to analyze and capacity plan for enterprises wishing to move to a virtualised environment without causing a virtual headache.
Blades: Capacity Planning and Performance Analysis specifically
Benchmarks: what the heck is going on these days?
Queueing theory: where does capacity planning come from? Do all the mathematical formulae designed for mainframes and VAXen still add up in today's world?
Horror stories: case studies with the names removed and altered to protect the guilty.
Vendor analysis: good, bad and indifferent.
ITIL: what is going on?
CPU differentiation between vendors

* unbiased = regarding all the vendors on the same evolutionary scale until proven otherwise

Stop taking remarks out of context....

Try this:

Alpha will be superior to IA64 in high performance technical computing. Memory bandwidth and the scalability of the system limit the performance of most high performance technical applications. Future Alpha processors are adding a low-latency, high-bandwidth memory interface on chip, together with on-chip support for distributed shared memory. The next generation Alpha processors will have the fastest memory system in the industry. Alpha will be the leader in high performance technical computing.

Source: Compaq 1999. Contact me for the original article...