My EuroSTAR 2013 tutorial in Gothenburg was titled “Questioning auditors questioning testing”. Not surprisingly, a recurring theme of the tutorial was risk. Do we really understand risk? How do we deal with it? These are important questions for both testers and auditors.
I argued that both auditors and testers, in their different ways, have struggled to deal with risk. The failure of auditors contributed to the financial crash of 2007/8. The problems within the testing profession may have been less conspicuous, but they have had a significant impact on our ability to do an effective job.
One of the issues I discussed was our tendency to perform naïve and mechanical risk assessments. I’m sure you’ve seen risk matrices like this one from the Health and Safety Executive, the UK Government inspectorate responsible for regulating workplace safety.
There are two fundamental problems with such a matrix that should make testers wary of using it.
Firstly, it implies that cells in the matrix with equal scores reflect equally acceptable positions. Is that really the case? Is a trivial chance of a catastrophe genuinely as acceptable as the near-certain chance of a trivially damaging incident? The HSE deals with the sort of risks that lead national news bulletins when they come to pass; its remit includes chemical plants, North Sea oil rigs and explosives manufacturers. I suspect the HSE takes a rather more nuanced approach to risks than is implied by the scoring in the matrix.
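To make the point concrete, here is a rough sketch of the kind of calculation such a matrix implies. The 1 to 5 scales and the two example risks are entirely made up for illustration; they are not taken from the HSE matrix or any real assessment.

```python
# A naive "probability x impact" score of the kind a simple risk matrix implies.
# The 1-5 scales and the two example risks below are hypothetical.

def naive_score(probability, impact):
    """Score a risk as probability (1-5) multiplied by impact (1-5)."""
    return probability * impact

# A trivial chance of a catastrophe versus a near-certain trivial incident.
catastrophe = naive_score(probability=1, impact=5)  # 1 x 5 = 5
annoyance = naive_score(probability=5, impact=1)    # 5 x 1 = 5

# Both score 5, so the matrix treats them as equally acceptable,
# even though any sensible organisation would respond to them very differently.
print(catastrophe, annoyance)  # 5 5
```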
The second basic problem with these risk matrices is that we often lack the evidence to assign credible estimates of probability and impact to the risks.
This problem applies in particular to probabilities. Is there a reasonable basis for the figures we’ve assigned to the probabilities? Are they guesses? Are we performing precisely engineered and sophisticated calculations that are ultimately based on arbitrary or unsound assumptions? It makes a huge difference to the outcomes, but we can be vague to the point of cluelessness about the basis for these calculations.
Such a matrix may be relevant for risks where both the probability and the likely impact of something going wrong are well understood. That is usually not the case during the early stages of software development, when the testing is being planned.
What’s the point of putting a number on the probability?
Whilst I was preparing my tutorial I came across an interesting case that illustrated the limitations of assigning probabilities when we’ve no experience or historic data.
I was reading about the development of the atomic bomb during the Second World War. Before the first bomb was tested the scientists were concerned about the possibility that a nuclear explosion might set the atmosphere on fire and wipe out life on earth. Enrico Fermi, the brilliant Italian nuclear physicist who worked on the development of the atomic bomb, estimated the probability of such a catastrophe at 10%.
I was astonished. How could anyone have taken the decision to explode an atomic bomb after receiving such scientific advice? My curiosity was aroused and I did some background reading on the episode. I learned that Fermi had also been asked in 1939 for his estimate of the probability that nuclear fission could be controlled for power or weapons. His estimate was 10%.
Then, in a separate article, I discovered that in 1950 he had estimated the probability that humans would have developed the technology to travel faster than light by 1960. You’ve guessed it. The answer was 10%.
Apparently Fermi had the reputation for being a sound estimator, when (and this is a crucial qualification) he had the information to support a reasonable estimate. Without such information he was clearly liable to take a guess. If something might happen, but he thought it unlikely, then he assigned a probability of 10%.
I think most of us do no better than Fermi. Indeed, the great majority are probably far worse. Are we really any more capable than Enrico Fermi of assigning probabilities to a naïve risk matrix that would allow simple, mechanical calculations of relative outcomes?
I strongly suspect that if Enrico Fermi had thought anyone would take his estimates and slot them into a simplistic risk formula to guide decision making then he’d have objected. Yet many of us see nothing wrong with such a simplistic approach to risk. I wonder if that’s simply because our risk assessments are little more than a tickbox exercise, a task that has to be knocked off to show we are following “the process”.
The incertitude matrix – risk, uncertainty, ambiguity and ignorance
The risk matrix clearly assumes greater knowledge of probabilities and outcomes than we usually have. A more useful depiction of the true situation is provided by O’Riordan and Cox’s incertitude matrix. See “Risk and Uncertainty” by Paddy Cox and “The Politics of GM Food – Risk, Science & Public Trust” (PDF) by the Economic & Social Research Council.
In this representation the conventional risk matrix occupies only the top left hand corner. We are in a position to talk about risk only when we have well defined outcomes and a realistic basis for assessing the probabilities.
If you are offered the chance to roll a die and told that you will lose your job if the die shows 1 to 3, but gain a bonus of £100,000 if it shows 4 to 6, then you are dealing with risk. You can then do a sensible calculation of the expected benefit of taking the gamble.
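Here is a rough sketch of that calculation. The £100,000 bonus comes from the example; the cost of losing your job is a figure I have invented purely so the arithmetic can be completed.

```python
# The die gamble is genuine "risk": both outcomes and probabilities are known,
# so an expected value can be calculated.
# The cost of losing your job is NOT given in the example; 50,000 is a
# purely hypothetical figure used to make the arithmetic concrete.

p_lose = 3 / 6             # die shows 1 to 3
p_win = 3 / 6              # die shows 4 to 6
bonus = 100_000            # stated in the example
cost_of_job_loss = 50_000  # hypothetical assumption

expected_value = p_win * bonus - p_lose * cost_of_job_loss
print(expected_value)  # 25000.0 - but only because we could quantify both sides
```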
Now, consider the case of being shown a bag full of identical balls which are either red or black. You can draw a ball at random and if it is red you will lose your job and if it is black you will get the £100,000 bonus. The balls in the bag might be all red, all black, or a mixture. You have no way of knowing what the probability is of drawing a black ball. That is uncertainty.
If we understand the outcomes, but not the probabilities then we are in a state of uncertainty. If we understand the probabilities of events, but not the outcomes then we are dealing with ambiguity.
Ambiguity might apply to a raffle in which there are 1,000 tickets and 100 prizes ranging from cans of juice to family holidays in the sun. You don’t know how many prizes there are of each type. The chances of winning something are 10%, but you’ve no idea what you might win.
Ambiguity is easy to understand in principle, but it’s a subtle and interesting problem in practice. To me it seems more relevant to scientific problems than software development. My wife works in the field of climate change adaptation for a Scottish Government agency. She recognises ambiguity in her line of work, where the probability of initial events might be reasonably well understood, but it isn’t possible to define the outcomes. Feedback mechanisms, or an unknown tipping point, might turn a benign outcome into a catastrophic one in ways we can’t predict with confidence.
One area where ambiguity could exist in software development is in the way that social media can create entirely unpredictable outcomes. An error that might have had little impact 10 years ago could now spiral into something far more serious if it catches people’s attention and goes viral.
Nevertheless, uncertainty, rather than ambiguity, is probably the quadrant where testers and developers are more likely to find themselves. Here, we can identify outcomes with confidence, but not assign meaningful probabilities to them.
However, uncertainty is unlikely to be a starting point. To get there we have to know what part of the product or application could fail, how it might fail and what the effect would be. We might sometimes know that at the start, if this is a variant on a well understood product, but often we have to learn it all.
The usual starting point, our default position, should be one of ignorance. We don’t know what can go wrong and what the impact might be, and we almost certainly don’t know anything with confidence about the probabilities.
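Pulling the four quadrants together, here is a rough sketch of the classification as I read it. The function and its two yes/no inputs are my own shorthand, not part of O’Riordan and Cox’s presentation.

```python
# The incertitude quadrants as described above: which one you are in depends
# on whether you genuinely know the outcomes and the probabilities.

def incertitude(outcomes_known: bool, probabilities_known: bool) -> str:
    if outcomes_known and probabilities_known:
        return "risk"         # conventional risk matrix territory
    if outcomes_known:
        return "uncertainty"  # known outcomes, no basis for probabilities
    if probabilities_known:
        return "ambiguity"    # known probabilities, unknown outcomes
    return "ignorance"        # the usual starting point

# Early in a new development we typically know neither, so:
print(incertitude(outcomes_known=False, probabilities_known=False))  # ignorance
```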
Knightian uncertainty
This takes us on to the closely related concept of Knightian uncertainty, which is familiar to economists (my undergraduate degree was in Economics). The concept, if not the terminology, should also be familiar to testers, and certainly to auditors, though I have my doubts about whether it really is.
The economist Frank Knight argued that we must recognise the distinction between risk and uncertainty. Risk applies to situations where we know the possible outcomes, and can accurately calculate the probability of each outcome. Uncertainty applies when we know the outcomes, but have no information or basis for calculating the probabilities.
You can argue quite reasonably that the distinction is somewhat artificial in the messy and confusing reality of dealing with complex problems. There is never true certainty about probabilities. Everything is uncertain to varying degrees.
However, the distinction is still useful in practice as a warning against overconfidence in assigning precise, but inaccurate, probabilities to outcomes, and as a reminder that we should try to gain greater knowledge about probabilities before we start to perform simplistic calculations of risk.
We have to move away from floundering in uncertainty towards management of risk. Often in reality we are starting from a position of ignorance and have not even moved into uncertainty.
Ignorant and proud!
Sadly, in business as well as software development, an honest admission of ignorance is seen as weakness. The pretence that we know more than we do is welcomed and admired rather than being derided as dangerous bluster. Such misplaced confidence leads to disappointment, stress, frustration, misdirected effort, and actually makes it harder to learn about products and applications. We deceive ourselves that we know things that we don’t, and stop digging to find out the true situation.
Don’t, please don’t, churn out these simplistic risk matrices and try to kid stakeholders that you have a sound understanding of the risks.
Please, speak up for ignorance! Only if we admit what we truly don’t yet know can we hope to learn the lessons that will give our stakeholders the insights that they need. Surely we need to look for and understand the most damaging failures before we start thinking of assigning probabilities that might guide the rest of our testing. Don’t assume that knowledge we’ve not gained can ever be a valid starting point for calculations of risk. Knowledge has to be earned, not assumed.



