Category Archives: Context-driven testing

The wisdom of the crowd has created an awesome resource for context-driven testers

Background

I’ve enjoyed a long and fruitful relationship with the Association for Software Testing (AST), both as a member and through the CAST conferences (where I’ve been an organizer for Australian conferences and a speaker & delegate at US conferences).

When I was approached about the idea of creating an e-book providing responses to some common questions and statements about testing from a context-driven perspective, I was keen to be involved. The plan was to crowdsource the responses, with me then collating them into the e-book. The e-book was designed to act as an FAQ for the day-to-day situations a tester may find themselves in and how to approach them from a context-driven perspective.

The Navigating the World as a Context-Driven Tester e-book project kicked off early in 2021 and the final edit has just been made after some 28 requests for contributions. These requests were made across multiple platforms, including Twitter (X), LinkedIn, Slack (viz. in the AST and Rapid Software Testing channels), Mastodon and the AST mailing list.

My experience of putting the book together

I loved the concept for this e-book and was excited to make the first few requests for contributions to see if there was similar interest and excitement in the project from the broader testing community. I was pleased to see lots of early engagement and my job in collating a response for the book was often a difficult one thanks to the sheer number and diversity of responses received.

It was interesting to see which requests attracted the most engagement and I was often surprised by which ones received many responses. It was great to see that responses continued to come in even as the project entered its third year.

The most effective channels for eliciting contributions to the e-book changed over time.

The AST and RST Slack channels provided by far the largest number of responses across the project, perhaps reflecting the community of more seasoned practitioners active on these platforms. Twitter was a good source at the start of the project, but faded quickly as many testers moved off the X platform. LinkedIn was fairly consistent throughout, but never a huge source of responses. The inclusion of Mastodon for the last year or so of the project resulted in only a very small number of responses and the AST newsletter was similarly ineffective in generating responses.

This e-book has been compiled from the collective wisdom of many excellent testing practitioners and I feel that it provides a lot of value, especially to less experienced testers. It is my hope that the e-book will be a handy reference for testers and I look forward to hearing stories of how it’s proved to be useful.

As a reminder, the e-book is freely available on the AST’s GitHub, Navigating the World as a Context-Driven Tester.

Vital statistics

A few stats about the book and the process of creating it:

  • 72 contributors helped to shape the content of the e-book (all of whom are attributed).
  • 292 responses were received from the 28 requests for contributions.
  • The request that drew the most interest was “Testing is a bottleneck”, with 28 responses.
  • James Thomas provided responses to all 28 requests via a separate blog post for each request. Amit Wertheimer and Frances Turnbull chipped in more than 20 responses each.
  • 7 requests for contributions were made in 2021, 9 in 2022, 7 in 2023 and 5 in 2024.
  • The 28 requests consist of 7 in the “Testing” category, 6 in “Testers”, 4 in “Automation”, 3 in “Context-driven testing”, 3 in “Project Scheduling”, 2 in “Testing Status”, 2 in “Career” and 1 in “Scripts/test cases”.
  • The requests that drew the least interest were “What’s the best format for a test plan?” and “Pair and ensemble testing look like a waste of time and resources to me. What do you think?”, with just 4 responses each.

With thanks

Thanks to the AST for trusting me with the curation of this project and also to the various AST board members who reviewed my collated responses before publication and supported the project in numerous ways.

I’m so grateful to the 72 folks who made the effort to contribute responses – this book wouldn’t exist without you!

The questions/statements

The questions/statements that formed the 28 requests for contributions are listed below:

| Question/statement | Request made | # responses | Response published |
| --- | --- | --- | --- |
| We test to make sure it works | 07/04/21 | 21 | 21/06/21 |
| Let’s just automate the testing | 03/05/21 | 9 | 21/06/21 |
| Isn’t all testing context-driven? | 14/06/21 | 15 | 19/07/21 |
| Do more test cases mean better test coverage? | 27/07/21 | 13 | 29/08/21 |
| What percentage of our test cases are automated? | 08/09/21 | 9 | 03/10/21 |
| Stop saying “it depends” when I ask you a question | 11/10/21 | 13 | 01/11/21 |
| Testing is a bottleneck | 06/12/21 | 28 | 10/01/22 |
| What’s the difference between context-driven testing and exploratory testing? | 16/01/22 | 6 | 12/02/22 |
| Will the testing be done by Friday? | 11/02/22 | 8 | 07/03/22 |
| We need some productivity metrics from testers | 12/03/22 | 9 | 16/04/22 |
| There are no best practices, really? | 01/05/22 | 5 | 26/05/22 |
| Why didn’t you find those issues before we shipped? | 05/06/22 | 5 | 02/07/22 |
| For your annual review, I’ll need to see evidence of what you produced this year | 01/08/22 | 12 | 23/08/22 |
| What’s the right ratio of developers to testers? | 27/08/22 | 10 | 27/09/22 |
| What’s the best testing metric? | 03/10/22 | 8 | 07/10/22 |
| Testing is just to make sure the requirements are met | 03/11/22 | 15 | 04/12/22 |
| Whenever possible, you should hire testers with testing certifications | 11/01/23 | 16 | 06/02/23 |
| Developers can’t find bugs in their own code | 12/02/23 | 10 | 13/03/23 |
| Stop answering my questions with questions | 18/03/23 | 9 | 16/04/23 |
| When is the best time to test? | 29/04/23 | 10 | 22/05/23 |
| Testers are the gatekeepers of quality | 29/05/23 | 15 | 12/07/23 |
| If testers can’t code, they’re of no use to us | 18/07/23 | 9 | 15/08/23 |
| Is observability and monitoring part of testing? | 11/11/23 | 9 | 06/12/23 |
| When the build is green, the product is of sufficient quality to release | 08/01/24 | 7 | 07/02/24 |
| What’s the best format for a test plan? | 22/02/24 | 4 | 21/03/24 |
| Why don’t we replace the testers with AI? | 07/04/24 | 7 | 29/04/24 |
| How can I possibly test “all the stuff” every iteration? | 01/06/24 | 6 | 08/07/24 |
| Pair and ensemble testing look like a waste of time and resources to me. What do you think? | 23/07/24 | 4 | 16/08/24 |

(The featured image for this post was inspired by my recent travels to Lisbon, Portugal, and its famous number 28 tram – thanks to Victoria Emerson on Pexels.com)

ER: “Ask Me Anything” session on Exploratory Testing

I took part in my first “Ask Me Anything” session on 22nd March, answering questions on the topic of “Exploratory Testing” as part of the AMA series organized by The Test Tribe.

Presenting an AMA was a different experience in terms of preparation compared to a more traditional slide-driven talk. I didn’t need to prepare very much, although I made sure to refamiliarize myself with the ET definitions I make use of and some of the most helpful resources so they’d all be front of mind if and when I needed them to answer questions arising during the AMA.

The live event was run using Airmeet.com and I successfully connected about ten minutes before the start of the AMA. The system was easy to use and it was good to spend a few minutes chatting with my host, Sandeep Garg, to go over the nuts and bolts of how the session would be facilitated.

We kicked off a few minutes after the scheduled start time and Sandeep opened with a couple of questions while the attendees started to submit their questions into Airmeet.

The audience provided lots of great questions and we managed to get through them all in just over an hour. I appreciated the wide-ranging questions, which demonstrated a spectrum of existing understanding about exploratory testing. There is so much poor-quality content on this topic that it’s unsurprising many testers are confused. I hope my small contribution via this AMA helped to dispel some myths around exploratory testing and inspired some testers to take it more seriously and start to see the benefits of more exploratory approaches in their day-to-day testing work.

Thanks to The Test Tribe for organizing and promoting this AMA, giving me my first opportunity to present in this format. Thanks also to the participants for their many questions; I hope I provided useful responses based on my experience of adopting an exploratory approach to testing over the last 15 years or so!

The full “Ask Me Anything” session can be viewed on The Test Tribe’s YouTube channel:

ER: presenting at the AST’s “Steel Yourselves” webinar (30th January 2023)

I was delighted to be invited to participate in a webinar by the Association for Software Testing as part of their “Steel Yourselves” series. The idea is based on the Steel Man technique and I was required to make the strongest case I could for a claim that I fundamentally disagree with – I chose to argue for “Shift Nowhere: A Testing Phase FTW”!

I had plenty of time to prepare for the webinar and to do my research on the use and abuse of testing phases. I also looked into the “shift left” and “shift right” movements as counterpoints to the traditional notion of the testing phase. Sorting through the various conflicting and contradictory ideas around testing phases was an interesting process in itself. I built a short PowerPoint deck and rehearsed it a couple of times (so thanks to my wife, Kylie, and good mate, Paul Seaman, for being my audience) to make sure I would comfortably fit my arguments for a testing phase into the ten-minute window I would have during the webinar.

January 30th came around quickly and the webinar was timed well for Europe (morning) and Australia (evening) as well as places in-between, so it was good to see an audience from various parts of the world. The session was ably facilitated by James Thomas for the AST and Anne-Marie Charrett went first, to make her case that “Crosby was Right. Quality is Free”. She did a great job, fielded the questions from the audience really well and made some good observations on the experience – and concluded right on time at 30 minutes into the session.

I felt like I delivered my short presentation in defence of a testing phase pretty well, getting a few smiles and interesting body language from the audience along the way! There were plenty of questions from James and the audience to challenge my claims and I tried hard to stay “in character” when answering them! The final section of the webinar allowed me to remove the mask and speak freely on my real points of view in this area.

Preparing for and presenting this defence of a testing phase was a challenging and interesting task. If we’re willing to look past the dogma, there are usually some useful ideas we can take away from most things. While I disagree that the lengthy, pre-planned, scripted test phases I was often involved in during the early stages of my testing career really offer much value, I think the noise around the “shift left” and “shift right” movements has left a gap in-between where we still need to take pause and allow some humans to interact with the software before unleashing it on customers. (I’ve written about this previously in my blog post, The Power of the Pause.) Thanks to the AST for the opportunity to present at this webinar and give myself a refresher on this particular area of testing.

A recording of this “Steel Yourselves” webinar, along with plenty more awesome content, can be found on the AST YouTube channel.

ER: acting as a Rapid Software Testing Explored “peer advisor” – again! (16-19 January 2023)

It had been almost a year since I first acted as a Peer Advisor for an RST class with Michael Bolton. When Michael reached out to offer the opportunity to participate again, it was an easy decision to join his RST Explored class for the Australia/New Zealand/South East Asia timezones.

The peer advisor role is voluntary and comes with no obligation to attend for any particular duration, so I joined the classes as my schedule allowed. This meant I attended all of the first two days but only dropped in briefly during the final two days due to my commitments at SSW. Each afternoon consisted of three 90-minute sessions with two 30-minute breaks.

The class was attended by over 15 students from across Australia, New Zealand and Malaysia. Zoom was used for all of Michael’s main sessions with breakout rooms being used to split the participants into smaller groups for exercises (with the peer advisors roaming these rooms to assist as needed). Asynchronous collaboration was facilitated via a Mattermost instance (an open source Slack clone), which seemed to work well for posing questions to Michael, documenting references, general chat between participants, etc. It would be remiss of me not to call out the remarkable work of Eugenio Elizondo in his role as PA – he was super quick in providing links to resources, etc. as they were mentioned by Michael and he also kept Michael honest with the various administrivia required to run a smooth virtual class.

While I couldn’t commit as much time to the class this time around, I still enjoyed contributing to the exercises by dropping into the breakout rooms to nudge participants along as needed.

As with any class, the participants make all the difference and there were a bunch of very engaged people in this particular class. It was awesome to witness the growth in many of the more engaged folks in such a short time and I hope that even the less vocal participants gained a lot from their attendance. I enjoyed being on the sidelines to see Michael in action and how the participants engaged with his gifted teaching, and I hope I offered some useful advice here and there along the way.

I first participated in RST in 2007 in a chilly Ottawa and have been a huge advocate for this course ever since. The online version is a different beast to the in-person experience but it’s still incredibly valuable and it’s great to see the class becoming accessible to more people via this format. We continue to live in a world of awful messaging and content around testing, with RST providing a shining light and a source of hope for a better future. Check out upcoming RST courses if you haven’t participated yet, they remain the only testing classes that have the Dr Lee stamp of approval!

“The Great Post Office Scandal” (Nick Wallis)

I’ve been following the story of the UK Post Office and its dubious prosecutions of sub-postmasters based on “evidence” of their wrongdoings from its IT system, Horizon, for some years.

My mother worked in the Post Office all of her working life and I also used to work there part-time during school and university holidays. There were no computer terminals on the counters back then; it was all very much paper trail accounting and I remember working on the big ledger when it came to balancing the weekly account every Wednesday afternoon (a process that often continued well into the evening).

Nick Wallis’s book covers the story in incredible detail, describing how the Post Office’s Horizon system (built by Fujitsu under an outsourcing arrangement) was badly managed by both the Post Office and Fujitsu (along with poor Government oversight) and resulted in thousands of innocent people having their lives turned upside down. It is both a moving account of the personal costs shouldered by so many individuals as well as being a reference piece for all of us in IT when it comes to governance, the importance of taking bugs seriously, and having the courage to speak up even if the implications of doing so might be personally difficult.

It’s amazing to think this story might never have been told – and justice never been served – were it not for a few heroes who stepped up, made their voices heard and fought to have the truth exposed. The author’s dedication to telling this story is commendable and he’s done an incredible job of documenting the many travesties that comprise the full awfulness of this sorry tale. This case is yet another example of the truth of Margaret Mead’s quote:

Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it’s the only thing that ever has.

One of the more surprising aspects of the story for me was the fact that very complex IT systems like Horizon have been considered in UK law (since 1990) to be “mechanical instruments” and they’re assumed to be working correctly unless shown otherwise. This was a key factor in the data shown by Horizon being trusted over the word of sub-postmasters (many of whom had been in the loyal service of the Post Office in small communities for decades).

Jones wanted the Law Commission’s legal presumption (that ‘in the absence of evidence to the contrary, the courts will presume that mechanical instruments were in order at the material time’ [from 1990]) modified to reflect reality. He told the minister, ‘If people found it difficult to prove a computer was operating reliably in the early 1990s, we can only imagine how difficult it might be to do that today, with the likes of machine-learning algorithms coming to conclusions for reasons even the computer programmer doesn’t understand.’

Darren Jones, chair of the BEIS Select Committee, p. 456 of “The Great Post Office Scandal”

It’s now clear that the complex systems we all build and engage with today (and even back when Horizon was first rolled out) have emergent behaviours that we can’t predict. The Post Office’s continued denial that there were any bugs in Horizon (and Fujitsu’s lack of co-operation in providing the evidence to the contrary) seems utterly ridiculous – and it was this denial that allowed so many miscarriages of justice in prosecuting people based on the claimed infallibility of Horizon.

Program testing can be used to show the presence of bugs, but never to show their absence!

Edsger W. Dijkstra

Reading this story really made me think about the onus on testers to reveal important problems and to advocate for them to be addressed. The tragic cases described in the book illustrate how important it is for testing to be focused on finding important problems in the software under test, not just proving that it passes some big suite of algorithmic checks. Fujitsu, under duress, eventually had to disclose sets of bug reports from the Horizon system and acknowledged that there were known bugs that could have caused the balance discrepancies that resulted in so many prosecutions for theft. There are of course much bigger questions to be answered as to why these bugs didn’t get fixed. As a tester raising an issue, there’s only so far you can go in advocating for that issue to be addressed and your ability to do that is highly context-dependent. In this case, even if the testers were doing a great job of finding and raising important problems and advocating for them to be fixed, the toxic swill of Fujitsu, Post Office and government in which everyone was swimming obviously made it very difficult for those problems to get the attention they deserved.

Coming back to my anchors that are the principles of context-driven testing, these seem particularly relevant:

  • People, working together, are the most important part of any project’s context.
  • Projects unfold over time in ways that are often not predictable.
  • Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.

I think part of our job as testers is not only to test the software, but also to test the project and the processes that form the context around our development of the software. Pointing out problems in the project is no easy task, especially in some contexts. But, by bearing in mind cases like the Post Office scandal, maybe we can all find more courage to speak up and share our concerns – doing so could quite literally be the difference between life and death for someone negatively impacted by the system we’re working on.

It would be remiss of me not to mention the amazing work of James Christie in discussing many aspects of the Post Office scandal, bringing his unique experience in both auditing and software testing to dig deep into the issues at hand. I strongly encourage you to read his many blog posts on this story (noting that he has also written an excellent review of the book).

“The Great Post Office Scandal” is available direct from the publisher and the author maintains the Post Office Scandal website to share all the latest news of what is, incredibly, still an ongoing story.

ER: acting as a Rapid Software Testing Explored “peer advisor” (7-10 February 2022)

A relatively rare scheduling of the online version of the Rapid Software Testing Explored course for Australasian timezones led to an invitation from presenter Michael Bolton to act as a “peer advisor” for the course running from 7-10 February.

I had already participated in RST twice before, thanks to in-person classes with Michael in Canada back in 2007 and then again with James Bach in Melbourne in 2011, so the opportunity to experience the class online and in its most current form was very appealing. I was quick to accept Michael’s offer to volunteer for the duration of the course.

While the peer advisor role was voluntary and came with no obligation to attend for any particular duration, I made room in my consulting schedule to attend every session over the four days (with the consistent afternoon scheduling making this a practical option for me). Each afternoon consisted of three 90-minute sessions with two 30-minute breaks, making a total of 18 hours of class time. The class retailed at AU$600 for paying participants, so it offers incredible value in its virtual format, in my opinion.

As a peer advisor, I added commentary here and there during Michael’s sessions but contributed more during exercises in the breakout rooms, nudging the participants as required to help them. I was delighted to be joined by Paul Seaman and Aaron Hodder as peer advisors, both testers I have huge respect for and who have made significant contributions to the context-driven testing community. Eugenio Elizondo did a sterling job as PA, being quick to provide links to resources, etc. as well as keeping on top of the various administrivia required to run a smooth virtual class.

The class was attended by over twenty students from across Australia, New Zealand and Malaysia. Zoom was used for all of Michael’s main sessions with breakout rooms being used to split the participants into smaller groups for exercises (with the peer advisors roaming these rooms to assist as needed). Asynchronous collaboration was facilitated via a Mattermost instance (an open source Slack clone), which seemed to work well for posing questions to Michael, documenting references, general chat between participants, etc.

While no two runs of an RST class are the same, all the “classic” content was covered over the four days, including testing & checking, heuristics & oracles, the heuristic test strategy model & product coverage outlines, shallow & deep testing, session-based test management, and “manual” & “automated” testing. The intent is not to cover a slide deck but rather to follow the energy in the (virtual) room and tailor the content to maximize its value to the particular group of participants. This adaptive nature of the class meant that even during this third pass through it, I still found the content fresh, engaging and valuable – and it really felt like the other participants did too.

The various example applications used throughout the class are generally simple but reveal complexity (and I’d seen all of them before, I think). It was good to see how other participants dealt with the tasks around testing these applications and I enjoyed nudging them along in the breakouts to explore different ways of thinking about the problems at hand.

The experience of RST in an online format was of course quite different to an in-person class. I missed the more direct and instant feedback from the faces and body language of participants (not everyone decided to have their video turned on either) and I imagine this also makes this format challenging for the presenter. I wondered sometimes whether there was confusion or misunderstanding that lay hidden from obvious view, in a way that wouldn’t happen so readily if everyone was physically present in the same room. Michael’s incredibly rich, subtle and nuanced use of language is always a joy for me, but I again wondered if some of this richness and subtlety was lost especially for participants without English as their first language.

The four hefty afternoons of this RST class passed so quickly and I thoroughly enjoyed both the course itself as well as the experience of helping out in a small way as a peer advisor. It was fun to spend some social time with some of the group after the last session in a “virtual pub” where Michael could finally enjoy a hard-earned beer! The incredible pack of resources sent to all participants is also hugely valuable and condenses so much learned experience and practical knowledge into forms well suited to application in the day-to-day life of a tester.

Since I first participated in RST back in 2007, I’ve been a huge advocate for this course and experiencing the online version (and seeing the updates to its content over the last fifteen years) has only made my opinions even stronger about the value and need for this quality of testing education. In a world of such poor messaging and content around testing, RST is a shining light and a source of hope – take this class if you ever have the chance (check out upcoming RST courses)!

(I would like to publicly offer my thanks to Michael for giving me the opportunity to act as a peer advisor during this virtual RST class – as I hope I’ve communicated above, it was an absolute pleasure!)

The power of the pause

While writing my last blog post, a review of Cal Newport’s “Deep Work” book, I reminded myself of a topic I’ve been meaning to blog about for a while, viz. the power of the pause.

Coming at this from a software development perspective, I mentioned in the last blog post that:

“There seems to be a new trend forming around “deployments to production” as being a useful measure of productivity, when really it’s more an indicator of busyness and often comes as a result of a lack of appetite for any type of pause along the pipeline for humans to meaningfully (and deeply!) interact with the software before it’s deployed.”

I often see this goal of deploying every change directly (and automatically) to production adopted without compelling reasons for doing so – apart from maybe “it’s what <insert big name tech company here> does”, even though you’re likely nothing like those companies in most other important ways. What’s the rush? While there are some cases where a very quick deployment to production is of course important, the idea that every change needs to be deployed in the same way is questionable for most organizations I’ve worked with.

Automated deployment pipelines can be great mechanisms for de-risking the process of getting updated software into production, removing opportunities for human error and making such deployments less of a drama when they’re required. But, just because you have this mechanism at your disposal, it doesn’t mean you need to use it for each and every change made to the software.

I’ve seen a lot of power in pausing along the deployment pipeline to give humans the opportunity to interact with the software before customers are exposed to the changes. I don’t believe we can automate our way out of the need for human interaction for software designed for use by humans, but I’m also coming to appreciate that this is increasingly seen as a contrarian position (and one I’m happy to hold). I’d ask you to consider whether there is a genuine need for automated deployment of every change to production in your organization and whether you’re removing the opportunity to find important problems by removing humans from the process.
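To make that concrete, here’s a minimal sketch (in Python, with entirely hypothetical risk signals and thresholds – nothing here describes any particular organization’s pipeline) of what a deliberate pause might look like in practice: instead of auto-promoting every change, a simple gate decides whether a build should wait for humans to explore it before it reaches customers.

```python
# Hypothetical sketch: a deliberate "pause" gate in a deployment pipeline.
# The risk signals and thresholds are illustrative only - real ones would
# come from your own context, not from this post.

from dataclasses import dataclass


@dataclass
class Change:
    files_touched: int
    touches_payment_code: bool
    adds_new_dependency: bool


def needs_human_pause(change: Change) -> bool:
    """Return True if this change should pause for human, exploratory testing
    before being promoted to production; False if it can flow straight
    through the automated pipeline."""
    return (
        change.touches_payment_code
        or change.adds_new_dependency
        or change.files_touched > 20
    )


if __name__ == "__main__":
    small_fix = Change(files_touched=2, touches_payment_code=False, adds_new_dependency=False)
    big_change = Change(files_touched=35, touches_payment_code=True, adds_new_dependency=False)
    print(needs_human_pause(small_fix))   # False - deploy automatically
    print(needs_human_pause(big_change))  # True  - pause for humans first
```

The specific rules don’t matter; the point is that the pause becomes a conscious, context-driven decision rather than something your tooling has quietly removed.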

Taking a completely different perspective, I’ve been practicing mindfulness meditation for a while now and haven’t missed a daily practice since finishing up full-time employment back in August 2020. One of the most valuable things I’ve learned from this practice is the idea of putting space between stimulus and response – being deliberate in taking pause.

Exploring the work of Gerry Hussey has been very helpful in this regard and he says:

The things and situations that we encounter in our outer world are the stimulus, and the way in which we interpret and respond mentally and emotionally to that stimulus is our response.

Consciousness enables us to create a gap between stimulus and response, and when we expand that gap, we are no longer operating as conditioned reflexes. By creating a gap between stimulus and response, we create an opportunity to choose our response. It is in this gap between stimulus and response that our ability to grow and develop exists. The more we expand this gap, the less we are conditioned by reflexes and the more we grow our ability to be defined not by what happens to us but how we choose to respond.

Awaken Your Power Within: Let Go of Fear. Discover Your Infinite Potential. Become Your True Self (Gerry Hussey)

I’ve found this idea really helpful in both my professional and personal lives. It’s helped with listening, to focus on understanding rather than an eagerness to simply respond. The power of the pause in this sense has been especially helpful in my consulting work as it has a great side effect of lowering the chances of jumping into solution mode before fully understanding the problem at hand. Accepting the fact that things will happen outside my control in my day to day life but that I have the choice in how to respond to whatever happens has been transformational.

Inevitably, there are still times where my response to stimuli is quick, conditioned and primitive (with system 1 thinking doing its job) – and sometimes not kind. But I now at least recognize when this has happened and bring myself back to what I’ve learned from regular practice so as to continue improving.

So, whether it’s thinking specifically about software delivery pipelines or my interactions with the world around me, I’m seeing great power in the pause – and maybe you can too.

Is talking about “scaling” human testing missing the point?

I recently came across an article from Adam Piskorek about the way Google tests its software.

While I was already familiar with the book How Google Tests Software (by James Whittaker, Jason Arbon et al, 2012), Adam’s article introduced another newer book about how Google approaches software engineering more generally, Software Engineering at Google: Lessons Learned from Programming Over Time (by Titus Winters, Tom Manshreck & Hyrum Wright, 2020).

The following quote in Adam’s article is lifted from this newer book and made me want to dive deeper into the book’s broader content around testing*:

Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clear answer: automation.

Chapter 11 (Testing Overview), p210 (Adam Bender)

I was stunned by this quote from the book. It felt like they were saying that development simply goes too quickly for adequate testing to be performed and that automation is seen as the silver bullet for moving as fast as they desire while maintaining quality, without those pesky slow humans interacting with the software they’re pushing out.

But, in the interests of fairness, I decided to study the four main chapters of the book devoted to testing to more fully understand how they arrived at the conclusion in this quote – Chapter 11 which offers an overview of the testing approach at Google, chapter 12 devoted to unit testing, chapter 13 on test doubles and chapter 14 on “Larger Testing”. The book is, perhaps unsurprisingly, available to read freely on Google Books.

I didn’t find anything too controversial in chapter 12, rather mostly sensible advice around unit testing. The following quote from this chapter is worth noting, though, as it highlights that “testing” generally means automated checks in their world view:

After preventing bugs, the most important purpose of a test is to improve engineers’ productivity. Compared to broader-scoped tests, unit tests have many properties that make them an excellent way to optimize productivity.

Chapter 13 on test doubles was similarly straightforward, covering the challenges of mocking and giving decent advice around when to opt for faking, stubbing and interaction testing as approaches in this area. Chapter 14 dealt with the challenges of authoring tests of greater scope and I again wasn’t too surprised by what I read there.

It is chapter 11 of this book, Testing Overview (written by Adam Bender), that contains the most interesting content in my opinion and the remainder of this blog post looks in detail at this chapter.

The author says:

since the early 2000s, the software industry’s approach to testing has evolved dramatically to cope with the size and complexity of modern software systems. Central to that evolution has been the practice of developer-driven, automated testing.

I agree that the general industry approach to testing has changed a great deal in the last twenty years. These changes have been driven in part by changes in technology and the ways in which software is delivered to users. They’ve also been driven to some extent by the desire to cut cost and it seems to me that focusing more on automation has been seen (misguidedly) as a way to reduce the overall cost of delivering software solutions. This focus has led to a reduction in the investment in humans to assess what we’re building and I think we all too often experience the results of that reduced level of investment.

Automated testing can prevent bugs from escaping into the wild and affecting your users. The later in the development cycle a bug is caught, the more expensive it is; exponentially so in many cases.

Given the perception of Google as a leader in IT, I was very surprised to see this nonsense about the cost of defects being regurgitated here. This idea is “almost entirely anecdotal” according to Laurent Bossavit in his excellent The Leprechauns of Software Engineering book and he has an entire chapter devoted to this particular mythology. I would imagine that fixing bugs in production for Google is actually inexpensive given the ease with which they can go from code change to delivery into the customer’s hands.

Much ink has been spilled about the subject of testing software, and for good reason: for such an important practice, doing it well still seems to be a mysterious craft to many.

I find the choice of words here particularly interesting, describing testing as “a mysterious craft”. While I think of software testing as a craft, I don’t think it’s mysterious although my experience suggests that it’s very difficult to perform well. I’m not sure whether the wording is a subtle dig at parts of the testing industry in which testing is discussed in terms of it being a craft (e.g. the context-driven testing community) or whether they are genuinely trying to clear up some of the perceived mystery by explaining in some detail how Google approaches testing in this book.

The ability for humans to manually validate every behavior in a system has been unable to keep pace with the explosion of features and platforms in most software. Imagine what it would take to manually test all of the functionality of Google Search, like finding flights, movie times, relevant images, and of course web search results… Even if you can determine how to solve that problem, you then need to multiply that workload by every language, country, and device Google Search must support, and don’t forget to check for things like accessibility and security. Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clear answer: automation

(note: bold emphasis is mine)

We then come to the source of the quote that first piqued my interest. I find it interesting that they seem to be suggesting the need to “test everything” and using that as a justification for saying that using humans to interact with “everything” isn’t scalable. I’d have liked to see some acknowledgement here that the intent is not to attempt to test everything, but rather to make skilled, risk-based judgements about what’s important to test in a particular context for a particular mission (i.e. what are we trying to find out about the system?). The subset of the entire problem space that’s important to us is something we can potentially still ask humans to interact with in valuable ways. The “one clear answer” for testing being “automation” makes little sense to me, given the well-documented shortcomings of automated checks (some of which are acknowledged in this same book) and the different information we should be looking to gather from human interactions with the software compared to that from algorithmic automated checks.

Unlike the QA processes of yore, in which rooms of dedicated software testers pored over new versions of a system, exercising every possible behavior, the engineers who build systems today play an active and integral role in writing and running automated tests for their own code. Even in companies where QA is a prominent organization, developer-written tests are commonplace. At the speed and scale that today’s systems are being developed, the only way to keep up is by sharing the development of tests around the entire engineering staff.

Of course, writing tests is different from writing good tests. It can be quite difficult to train tens of thousands of engineers to write good tests. We will discuss what we have learned about writing good tests in the chapters that follow.

I think it’s great that developers are more involved in testing than they were in the days of yore. Well-written automated checks provide some safety around changing product code and help to prevent a skilled tester from wasting their time on known “broken” builds. But, again, the only discussion that follows in this particular book (as promised in the last sentence above) is about automation and not skilled human testing.

Fast, high-quality releases
With a healthy automated test suite, teams can release new versions of their application with confidence. Many projects at Google release a new version to production every day—even large projects with hundreds of engineers and thousands of code changes submitted every day. This would not be possible without automated testing.

The ability to get code changes to production safely and quickly is appealing and having good automated checks in place can certainly help to increase the safety of doing so. “Confidence” is an interesting choice of word to use around this (and is used frequently in this book), though – the Oxford dictionary definition of “confidence” is “a feeling or belief that one can have faith in or rely on someone or something”, so the “healthy automated test suite” referred to here appears to be one that these engineers feel comfortable to rely on enough to say whether new code should go to production or not.

The other interesting point here is about the need to release new versions so frequently. While it makes sense to have deployment pipelines and systems in place that enable releasing to production to be smooth and uneventful, the desire to push out changes to customers very frequently seems like an end in itself these days. For most testers in most organizations, there is probably no need or desire for such frequent production changes, so basing testing strategy on the perceived need for them could lead to goal displacement – and potentially take an important aspect of assessing those changes (viz. human testers) out of the picture altogether.

If test flakiness continues to grow, you will experience something much worse than lost productivity: a loss of confidence in the tests. It doesn’t take needing to investigate many flakes before a team loses trust in the test suite. After that happens, engineers will stop reacting to test failures, eliminating any value the test suite provided. Our experience suggests that as you approach 1% flakiness, the tests begin to lose value. At Google, our flaky rate hovers around 0.15%, which implies thousands of flakes every day. We fight hard to keep flakes in check, including actively investing engineering hours to fix them.

It’s good to see this acknowledgement of the issues around automated check stability and the propensity for unstable checks to lead to a collapse in trust in the entire suite. I’m interested to know how they go about categorizing failing checks as “flaky” to be included in their overall 0.15% “flaky rate”, no doubt there’s some additional human effort involved there too.
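I can only guess at the mechanics, but a common approach (and the one I’d assume as a starting point) is to re-run a failing check in isolation and classify it as flaky if it passes without any code change. A rough Python sketch of that idea follows – run_check and the retry count are my assumptions, not anything Google has documented.

```python
# Hypothetical sketch of classifying failing checks as "flaky": re-run the
# failing check a few times and treat "fails, then passes with no code
# change" as flaky. The retry count and helper checks are assumptions only.

import random
from typing import Callable


def classify_failure(run_check: Callable[[], bool], retries: int = 3) -> str:
    """Classify a check that has just failed: 'flaky' if any retry passes,
    otherwise 'genuine' and worth a human's investigation."""
    for _ in range(retries):
        if run_check():
            return "flaky"
    return "genuine"


if __name__ == "__main__":
    # Simulate one intermittently-failing check and one genuinely broken one.
    def intermittent_check() -> bool:
        return random.random() < 0.5  # passes about half the time

    def broken_check() -> bool:
        return False  # never passes

    print(classify_failure(intermittent_check))  # usually 'flaky'
    print(classify_failure(broken_check))        # 'genuine'
```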

Just as we encourage tests of smaller size, at Google, we also encourage engineers to write tests of narrower scope. As a very rough guideline, we tend to aim to have a mix of around 80% of our tests being narrow-scoped unit tests that validate the majority of our business logic; 15% medium-scoped integration tests that validate the interactions between two or more components; and 5% end-to-end tests that validate the entire system. Figure 11-3 depicts how we can visualize this as a pyramid.

It was inevitable during coverage of automation that some kind of “test pyramid” would make an appearance! In this case, they use the classic Mike Cohn automated test pyramid but I was shocked to see them labelling the three different layers with percentages based on test case count. By their own reasoning, the tests in the different layers are of different scope (that’s why they’re in different layers, right?!) so counting them against each other really makes no sense at all.
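A rough worked example (with invented numbers) shows why a count-based mix says so little: a suite that is 80% unit tests by count can still spend almost all of its wall-clock time – and arguably carry most of its risk coverage – in the other 20%.

```python
# Illustrative numbers only - the point is that a mix expressed as a
# percentage of test-case count says nothing about runtime, effort or scope.

suite = {
    # layer: (number of checks, average runtime per check in seconds)
    "unit":        (800, 0.05),
    "integration": (150, 2.0),
    "end-to-end":  (50, 60.0),
}

total_count = sum(count for count, _ in suite.values())
total_time = sum(count * secs for count, secs in suite.values())

for layer, (count, secs) in suite.items():
    by_count = 100 * count / total_count
    by_time = 100 * count * secs / total_time
    print(f"{layer}: {by_count:.1f}% of checks, {by_time:.1f}% of runtime")

# unit: 80.0% of checks, 1.2% of runtime
# integration: 15.0% of checks, 9.0% of runtime
# end-to-end: 5.0% of checks, 89.8% of runtime
```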

Our recommended mix of tests is determined by our two primary goals: engineering productivity and product confidence. Favoring unit tests gives us high confidence quickly, and early in the development process. Larger tests act as sanity checks as the product develops; they should not be viewed as a primary method for catching bugs.

The concept of “confidence” being afforded by particular kinds of checks arises again and it’s also clear that automated checks are viewed as enablers of productivity.

Trying to answer the question “do we have enough tests?” with a single number ignores a lot of context and is unlikely to be useful. Code coverage can provide some insight into untested code, but it is not a substitute for thinking critically about how well your system is tested.

It’s good to see context being mentioned and also the shortcomings of focusing on coverage numbers alone. What I didn’t really find anywhere in what I read in this book was the critical thinking that would lead to an understanding that humans interacting with what’s been built is also a necessary part of assessing whether we’ve got what we wanted. The closest they get to talking about humans experiencing the software in earnest comes from their thoughts around “exploratory testing”:

Exploratory Testing is a fundamentally creative endeavor in which someone treats the application under test as a puzzle to be broken, maybe by executing an unexpected set of steps or by inserting unexpected data. When conducting an exploratory test, the specific problems to be found are unknown at the start. They are gradually uncovered by probing commonly overlooked code paths or unusual responses from the application. As with the detection of security vulnerabilities, as soon as an exploratory test discovers an issue, an automated test should be added to prevent future regressions.

Using automated testing to cover well-understood behaviors enables the expensive and qualitative efforts of human testers to focus on the parts of your products for which they can provide the most value – and avoid boring them to tears in the process.

This description of what exploratory testing is and what it’s best suited to is completely unfamiliar to me, as a practitioner of exploratory testing for fifteen years or so. I don’t treat the software “as a puzzle to be broken” and I’m not even sure what it would mean to do so. It also doesn’t make sense to me to say “the specific problems to be found are unknown at the start” – surely this applies to any type of testing? If we already know what the problems are, we wouldn’t need to test to discover them. My exploratory testing efforts are not focused on “commonly overlooked code paths” either; in fact I’m rarely interested in the code but rather in the behaviour of the software experienced by the end user. Given that “exploratory testing” as an approach has been formally defined for such a long time (and refined over that time), it concerns me to see such a different notion being labelled as “exploratory testing” in this book.

TL;DRs
Automated testing is foundational to enabling software to change.
For tests to scale, they must be automated.
A balanced test suite is necessary for maintaining healthy test coverage.
“If you liked it, you should have put a test on it.”
Changing the testing culture in organizations takes time.

In wrapping up chapter 11 of the book, the focus is again on automated checks with essentially no mention of human testing. The scaling issue is highlighted here also, but thinking solely in terms of scale is missing the point, I think.

The chapters of this book devoted to “testing” in some way cover a lot of ground, but the vast majority of that journey is spent on automated checks of various kinds. Given Google’s reputation and perceived leadership status in IT, I was really surprised to see mention of the “cost of change curve” and the test automation pyramid, but not surprised by the lack of focus on human exploratory testing.

Circling back to that triggering quote I saw in Adam’s blog (“Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale”), I didn’t find an explanation of how they do in fact assess product quality – at least in the chapters I read. I was encouraged that they used the term “assess” rather than “measure” when talking about quality (on which James Bach wrote the excellent blog post, Assess Quality, Don’t Measure It), but I only read about their various approaches to using automated checks to build “confidence”, etc. rather than how they actually assess the quality of what they’re building.

I think it’s also important to consider your own context before taking Google’s ideas as a model for your own organization. The vast majority of testers don’t operate in organizations of Google’s scale and so don’t need to copy their solutions to these scaling problems. It seems we’re very fond of taking models, processes, methodologies, etc. from one organization and trying to copy the practices in an entirely different one (the widespread adoption of the so-called “Spotify model” is a perfect example of this problem).

Context is incredibly important and, in this particular case, I’d encourage anyone reading about Google’s approach to testing to be mindful of how different their scale is and not use the argument from the original quote that inspired this post to argue against the need for humans to assess the quality of the software we build.

* It would be remiss of me not to mention a brilliant response to this same quote from Michael Bolton – in the form of his 47-part Twitter thread (yes, 47!).

Common search engine questions about testing #10: “What will software testing look like in 2021?”

This is the final part of a ten-part blog series in which I’ve answered some of the most common questions asked about software testing, according to search engine autocomplete results (thanks to Answer The Public).

In this last post, I ponder the open question of “What will software testing look like in 2021?” (note: updated the year from 2020 in my original dataset from Answer The Public to 2021).

The reality for most people involved in the software testing business is that testing will look pretty much the same in 2021 as it did in 2020 – and probably as it did for many of the years before that too. Incremental improvements take time in organisations and the scope & impact of such changes will vary wildly between different organisations and even within different parts of the same organisation.

I fully expect 2021 to yield a number of reports about trends in software testing and quality, akin to Capgemini’s annual World Quality Report (which I critiqued again last year). There will probably be a lot of noise around the application of AI and machine learning to testing, especially from tool vendors and the big consultancies.

I feel certain that automation (especially of the “codeless” variety) will continue to be one of the main threads around testing with companies continuing to recruit on the basis of “automated testing” prowess over exploratory testing skills.

I think a small but dedicated community of people genuinely interested in advancing the craft of software testing will continue to publish their ideas and look to inject some reality into the various places that testing gets discussed online.

My daily meditation practice has applications here too. In the same way that the practice helps me to recognise when thoughts are happening without getting caught up in their storyline, I think you should make an effort to observe the inevitable commentary on trends in the testing industry through 2021 without going out of your way to follow them. These trends are likely to change again next year and expending effort trying to keep “on trend” is likely effort better spent elsewhere. Instead, I would recommend focusing on the fundamentals of good software testing, while continuing to demonstrate the value of good testing and advancing the practice as best you can in the context of your organisation.

I would also encourage you to make 2021 the year that you tell your testing stories for the benefit of the wider community – your stories are unique, valuable and a great way for others to learn what’s really going on in our industry. There are many avenues to share your first-person experiences – blog about them, share them as LinkedIn articles, talk about them at meetups or present them at a conference (many of which seem destined to remain as virtual events through 2021, which I see as a positive in terms of widening the opportunity for more diverse stories to be heard).

For some alternative opinions on what 2021 might look like, check out the responses to the recent question “What trends do you think will emerge for testing in 2021?” posed by Ministry of Testing on LinkedIn.

You can find the previous nine parts of this blog series at:

I’ve provided the content in this blog series as part of the “not just for profit” approach of my consultancy business, Dr Lee Consulting. If the way I’m writing about testing resonates with you and you’re looking for help with the testing & quality practices in your organisation, please get in touch and we can discuss whether I’m the right fit for you.

I’m grateful to Paul Seaman and Ky who acted as reviewers for every part of this blog series; I couldn’t have completed the series without their help, guidance and encouragement along the way, thank you!

Thanks also to all those who’ve amplified the posts in this series via their blogs, lists and social media posts – it’s been much appreciated. And, last but not least, thanks to Terry Rice for the underlying idea for the content of this series.

Common search engine questions about testing #9: “Which software testing certification is the best?”

This is the penultimate part of a ten-part blog series in which I will answer some of the most common questions asked about software testing, according to search engine autocomplete results (thanks to Answer The Public).

In this post, I answer the question “Which software testing certification is the best?“.

There has been much controversy around certification in our industry for a very long time. The certification market is dominated by the International Software Testing Qualifications Board (ISTQB), which they describe as “the world’s most successful scheme for certifying software testers”. The scheme arose out of the British Computer Society’s ISEB testing certification in the late 1990s and has grown to become the de facto testing certification scheme. With a million-or-so exams administered and 700,000+ certifications issued, the scheme has certainly been successful in dishing out certifications across its ever-increasing range of offerings (broadly grouped into Agile, Core and Specialist areas).

In the interests of disclosure, I am Foundation certified by the ANZTB and I encouraged all of the testers at Quest in the early-mid 2000s to get certified too. At the time, it felt to me like this was the only certification that gave a stamp of professionalism to testers. After I received education from Michael Bolton during Rapid Software Testing in 2007, I soon realised the errors in my thinking – and then put many of the same testers through RST with James Bach a few years later!

Although the ISTQB scheme has issued many certifications, the value of these certifications is less clear. The lower-level certifications, particularly Foundation, are very easy to obtain and require little to no practical knowledge or experience in software testing. It’s been disappointing to witness how this de facto simple certification became a pre-requisite for hiring testers all over the world. The requirement to be ISTQB-certified doesn’t seem to crop up very often on job ads in the Australian market now, though, so maybe its perceived value is falling over time.

If your desire is to become an excellent tester, then I would encourage you to adopt some of the approaches to learning outlined in the previous post in this series. Following a path of serious self-learning about the craft (and maybe challenging yourself with one of the more credible training courses such as BBST or RST) is likely to provide you with much more value in the long-term than ticking the ISTQB certification box. If you’re concerned about your resume “making the cut” when applying for jobs without having ISTQB certification, consider taking Michael Bolton’s advice in No Certification, No Problem!

Coming back to the original question: imagine what the best software testing certification might be if you happen to be a for-profit training provider for ISTQB certifications. Then think about what the best software testing certification might be if you’re a tester with a few years of experience in the industry looking to take your skills to the next level. I don’t think it makes sense to ask which (of anything) is the “best” as there are so many context-specific factors to consider.

The de facto standard for certification in our industry, viz. ISTQB, is not a requirement for you to become an excellent and credible software tester, in my opinion.

If you’re interested in a much fuller treatment of the issues with testing certifications, I think James Bach has covered all the major arguments in his blog post, Against Certification. Ilari Henrik Aegerter’s short Super Single Slide Sessions #6 – On Certifications video is also worth a look and, for some light relief around this controversial topic, see the IQSTD website!

You can find the first eight parts of this blog series at:

I’m providing the content in this blog series as part of the “not just for profit” approach of my consultancy business, Dr Lee Consulting. If the way I’m writing about testing resonates with you and you’re looking for help with the testing & quality practices in your organisation, please get in touch and we can discuss whether I’m the right fit for you.

Thanks again to my review team (Paul Seaman and Ky) for their helpful feedback on this post, their considerable effort and input as this series comes towards an end has been instrumental in producing posts that I’m proud of.