Tuesday, 12 April 2016

Automation Confusion



I always feel like this is a weird thing to admit (it seems a bit morbid), but I love the Canadian documentary series ‘Aircrash Investigation’ (also known as ‘Mayday’). I think I like it because it centres on the process of investigation, depicting crash investigators who leave no stone unturned and make no assumptions about causation. There are a lot of common themes with software testing, not to mention the fascinating psychological analysis of humans under pressure. I recently watched an episode that highlighted exactly such an analysis, particularly with regards to the psychology of humans interacting with machines.

Learning from tragedy

The episode in question focused on the crash of Asiana Airlines Flight 214, which resulted in the loss of three lives and left one hundred and eighty-seven people injured. The USA’s NTSB (National Transportation Safety Board) investigation determined that the crash was caused by the pilots’ poor mental model of the automation that helps them fly the plane. The documentary coined a great term for this that I really liked: ‘automation confusion’. In short, the pilots were trained to rely on the automation so heavily that when they were presented with an unexpected situation, they couldn’t react in time and didn’t notice the signs around them that something was wrong. The automation couldn’t help the pilots because of the unexpected and unique actions they took in this situation. As part of its report, the NTSB raised concerns that the automated systems on planes are becoming so complex that it isn’t possible for pilots to fully understand and predict the behaviour of the aircraft in all situations. The documentary ended on a note about how the aviation industry was discussing ‘going back to basics’ and focusing on training pilots to fly manually, so that over-reliance on automation could be avoided in future.

What does this have to do with testing?

I found this fascinating because I recognise a lot of parallels with current discussions in the testing community about automation and the concerns about over-reliance on automated tests. The concern that humans find it difficult to keep up with and understand complex automated systems also reminds me a lot of the relationship between programmers and their code. Is it possible for any programmer to understand 100% of their code, the libraries they use, the languages they use? Do they understand every possible permutation, no matter the situation or user input? Will the automation always save the human when things go wrong? How does the automation know what ‘wrong’ is without being told?
I think we deal with ‘automation confusion’ all of the time in software development and I think as testers it serves us well to be aware of this problem.

A concern for the future?

As we appear to be moving to greater amounts of automation in software development, I think we should be looking out for this problem. With DevOps and the ideas of continuous delivery and continuous deployment becoming ever more popular, we are building more and more automated systems to help us in our day-to-day work. But for each automated system we build to make things faster and easier, we also hide complexity from ourselves. To automate each system and each process, we potentially base the automation on a simplistic mental model of those systems or processes.
There are also two sides to this:
  • The creators of the automation crafting a potentially simplistic system to replace a more complex manual system.
  • The users of the automation having a simplistic mental model of how the automated system works and how smart it is.

How testing can help

I think my feeling on this is a sense of justification for raising concerns about the eager adoption of automation in all kinds of areas. I think testing can help discover these poor mental models and help improve not only the quality of these automated systems, but also improve the way people use these systems. I think we can do this by:
  • Challenging the decision to automate a system - why are we doing it? Do we understand the effects automating it will have? Are we automating the right things?
  • Testing these automated systems with a focus on usage - could we be creating user stories based on this scenario of over-reliance on the automation to handle situations?
  • We could therefore focus on understanding the psychology of end users and how their behaviour changes when a manual task is replaced with an automated one. Perhaps in the same way that people have an unreasonable belief in the existence of ‘perfect software’, they also consistently believe automated systems are smarter than they really are.

Wednesday, 6 April 2016

NWEWT #1 Regression Testing


Last weekend I attended my first ever peer conference, the North West Exploratory Workshop on Testing (NWEWT). The conference was organised by Duncan Nisbet with help from the Association for Software Testing (AST). I’d met Duncan at the Liverpool meetup of the North West Tester Gathering, after he had presented a great talk called ‘Testers! Be more salmon!’. Shortly after that meetup, he contacted me asking if I was interested in attending this brand new peer conference. After mulling it over, I accepted for two reasons:
  1. Why not? (pretty much the main reason I’ve done a lot of things!)
  2. It sounded like an opportunity to learn by deep, critical thinking - I liked the challenge of presenting my ideas and having people really critically analyse how I think about testing. If nothing else I was going to get a lot out of it by forcing myself to think about a subject deeply in preparation!


The attendees were as follows. The content of this blog post should be attributed to their input as much as mine; the thoughts I have here were brought together through collaboration:
Ash Winter
Callum Hough
Christina Ohanian
Dan Ashby
Duncan Nisbet
Emma Preston
Gwen Diagram
Joep Schuurkes
Jean-Paul Varwijk
Liam Gough
Lim Sim
Richard Bradshaw
Simon Peter Schriver
Toby Sinclair
Tom Heald


The main theme for this conference was ‘Regression Testing’, specifically what we loved or loathed about it, whether we even do it, whether we should automate it and just generally what our experience and thoughts were.

What the hell is a ‘peer conference’?

I had no idea before I went! I did some research and knew it followed a format whereby deep discussion and debate are encouraged between professionals, but as with many things, I didn’t really understand it until I started doing it.
Basically, there were 15 or so of us gathered, each bringing our own ‘experience report’ on the theme or topic for the conference. We each took turns in presenting our experience report through slides, flipchart or just simply talking. We then held a Q&A session where the real action started.
The Q&A session featured green, yellow and red cards. People were only allowed to talk when indicated to do so by the facilitator. If people wished to ask a question or contribute to the discussion, they held up one of the cards which had the following uses:
  • The green card indicated to the facilitator that you would like to ask a new question or start a new thread based on the current discussion. So at the start, everyone would show green cards because there was no thread yet.
  • The yellow card indicated to the facilitator that you would like to ask a further question or talk about the current thread. This is how the discussions got deeper and deeper into particular threads.
  • The red card indicated to the facilitator that you felt the current discussion needed to stop, or that you thought a ‘fact’ stated by another person was wrong. We didn’t see much use of this card - only once or twice, when a particular thread went on too long or something needed clarifying. Red cards can only be used in situations the facilitator feels are genuine, so they can be taken away from people who abuse them.
Typically, the discussions were mainly between the presenter of the experience report and the person asking the question. However, they could shift to a discussion between two other people - when you showed a yellow card you could directly challenge the person who had caused you to raise the card.

On a personal note

I actually think one of my biggest takeaways was cementing the feeling that meetups and conferences are not as scary as they might seem. What I mean by that is that it’s easy to feel like members of the community who are quite outspoken or actively involved are not approachable. I’ve thought about this a lot recently and I think, for me, it comes from the assumption that because people are experienced, they already know everything I know and have come to the same conclusions. At the start of this year, I felt like I didn’t have anything new to add and that more experienced people had taken their thoughts to a more advanced level. I guess I also felt that well-known people must get a lot of questions because of their public position, so I naturally feel like leaving them alone, especially if I think my questions or thoughts are less developed than theirs.
So if you’re reading this and feel the same way about meeting testers and asking people questions, then fight those thoughts! My experience at all of the testing events I’ve attended so far is that everyone is more than happy to take the time to listen to you and help you! The key is to be open to suggestions and ideas from other people - this is the one element of my “oh god I don’t know anything” thought process I’d like to keep, as I’d like to remain humble. But don’t be afraid to ask questions and approach people!


Enough about my personal development, what about the content? I think the biggest takeaway, which everyone seemed to agree upon, was that ‘regression testing’ is a phrase that is poorly defined and definitely not consistently used in our industry. I found myself agreeing a lot with Joep’s idea of not even talking about it. He suggested that instead of using ‘regression testing’ we could just describe whatever we are doing, e.g. “I’m performing these tests to find out if these two systems are functioning as we desire”. That doesn’t mean we should never use the phrase; within a particular circle of people or within a company there may be a very clear shared understanding. The point is more about being aware that phrases such as this may not be as clearly understood as people might think, and can even be used to avoid thinking about what you are doing. Joep gave a funny example of it being a ‘Jedi mind trick’, whereby managers are told “we’re regression testing!”, to which they respond “great” and walk away.

Several people also shared their different approaches to regression testing. Richard shared his F.A.R.T. Model which I had seen before and Ash also shared his own, similar model for exploring the large unknowns of systems. Toby took a different approach and discussed the idea of ‘regression in testing’ - the idea that your skills and knowledge regress and what we might do to try and combat that.  

One of the best parts of attending events like this is learning exactly how people conduct testing within their companies and the different situations and problems they have to deal with. Christina, Simon and Tom all shared situations that generated a lot of useful discussion and debate, and they definitely gave me plenty to think about in terms of how I would approach those situations myself. Richard gave a particularly useful piece of advice that I really love, which I can only paraphrase as ‘don’t focus on the politics, make sure you’re still doing a good job first and foremost at all times’. This really struck a chord with me personally, as I have experienced some very political situations that I haven’t agreed with, but I value my own professionalism enough to still deliver good work despite them.

Another idea that stuck in my head (unfortunately a lot of our notes were binned by the hotel staff on the second day so I’m stuck with just a few notes and my memories!) was Jean-Paul’s idea of using the phrase ‘continuous testing’ to help highlight the need to still perform manual testing throughout ‘continuous delivery’ pipelines - in other words combat the feeling that continuous delivery leads to people forgetting about testing. However, we did also discuss that potentially this could have the opposite effect where people treat testing as a separate concern because we are using a separate phrase for it.

In summary, there was a nice mix of ideas and approaches that I felt I could apply to my work now or in future, as well as a lot of food for thought. There are some threads that left me with even more questions - I’m guessing this is normal for these things! Unfortunately, a lot of the attendees had quite similar points of view, so we ended up agreeing on many topics without much debate. However, I still learnt a great deal and it was useful to find a lot of validation of my current line of thought on this topic.

My experience report

Not everyone got a chance to share their experience report but I was lucky enough to be one of the chosen! I’ve written up my views on regression testing here:


I really enjoyed my first experience of a peer conference a lot and it left me wanting more. I really liked the chance to start digging deep into a topic. It was also nice to find a lot of validation of my own ideas on regression testing and to learn new ideas and approaches from other people.

What do we mean by regression testing?


I recently attended a peer conference where I presented my experience report on regression testing. I’m going to go into the detail of what I talked about in this post - if you’d like to read more about the peer conference and what on earth a ‘peer conference’ even is, read here:
NWEWT #1 Regression Testing


Before I go any further, I’d like to highlight that I have no data to back this up. What I state here is my opinion, based on observations and my interpretation of people’s words and actions. I’ve not really thought about how I might go about measuring and collecting data to back up my words here, but I’m going to bear it in mind for future topics!

Feelings of mis-use

For the last 6 years of my testing career, I’ve never been formally trained in testing. I don’t have an ISTQB qualification, and a lot of what I know about testing has come either from informal training or from observing others. I therefore have a definition of ‘regression testing’ that I have learnt from how others use it.
Before I go into what I personally define this phrase to mean, I’ve noticed ‘regression testing’ referred to as:

“The testing we perform at the end of projects, where we re-run all of our tests and make sure the code hasn’t broken anything.”

This kind of understanding also seemed to crop up and become twisted with statements like:

“I’d like to press a button and run all of the regression tests automatically.”
“Why didn’t your regression tests find this?”

After being invited to the peer conference, I decided to look up the Wikipedia definition of ‘regression testing’. I went with Wikipedia simply because it’s likely to be a commonly used source. I actually found that Wikipedia’s definition fits my own fairly well:

“Regression testing is a type of software testing that verifies that software that was previously developed and tested still performs correctly after it was changed or interfaced with other software.”

Combining this definition with my recent understanding that ‘testing’ is about learning, I think this fits with my own take on ‘regression testing’ which would be:

“Regression testing is considering the risk of change”.

Comparing my own definition with how people seem to use the phrase, I realised people seemed to define regression testing with the attributes of:
  • Repeating tests and therefore being highly automated.
  • Being performed late in a project, at the end of a sprint or at the end of a waterfall project. It usually seemed to be considered an activity to be performed after development activities.
This seemed totally wrong to me, and I started thinking about why I felt that.

Why does this matter?

The first question I felt I needed to answer was: why does it matter that people have these differing definitions? This is what I could think of:
  • If regression testing is only performed at ‘the end’, does that mean we only consider the risk of change at the end?
  • If regression testing is an activity to be carried out after development activities, how invested are non-testers in the results?
  • If we are only performing regression testing at the end, and therefore only considering it later, how do we effectively identify risks?
  • Assumptions are being made about how valuable it is to repeat tests and the cost of executing them.

In Practice

In my opinion, everyone involved with software development is considering the risk of change all of the time, even if only subconsciously. Typically, we’re nearly always changing the software and hence discussing what the desired behaviour is. However, for me the danger of the phrase ‘regression testing’, and how it seems to be commonly used, is that we’re leaving the bulk of this kind of critical thinking to the end of projects, not to mention over-estimating the value of simply repeating tests over and over.
Could we be finding problems with change earlier? When it’s cheaper to both identify and act upon?
What if we are developing a component that is part of a larger system? Are we not considering the risk of change as we develop it in isolation? As we start integrating it? Is it wise to only consider the greater implications of a change later on?
What if we’re developing a change to one component: are we only going to consider its integration with the directly connected components later? What about the effects of this change on components that are not directly connected at all?
With git branching, where do you perform ‘regression testing’? Who performs the testing? How do they identify when and what to test? When is ‘at the end’? What tests would you run ‘at the end’? Are you only repeating tests, or running new tests?
These questions are hard to answer when you don’t have all of the information available to you. With these git branches, you may not know what was changed as part of each branch - you may not even know how many branches there are or when they were merged. The developer writing each change and merging it knows this, but do they identify the risks? Or do they ‘leave it for regression testing’?
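To make that slightly more concrete: git itself can answer part of the ‘what actually changed?’ question, which is the raw input for considering the risk of change. The sketch below is only an illustration (the branch and file names are made up for the example): it creates a throwaway repository, makes a change on a branch, then asks git which commits and which files differ from the base branch.

```shell
#!/bin/sh
# Illustrative sketch: asking git what changed on a branch before
# deciding what the risk of that change is. All names here are made up.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "tester@example.com"
git config user.name "Tester"
base=$(git symbolic-ref --short HEAD)   # whatever the default branch is

echo "core logic" > core.txt
git add core.txt
git commit -qm "initial commit"

# A developer makes a change on a branch.
git checkout -qb feature
echo "a change" >> core.txt
git add core.txt
git commit -qm "tweak core logic"
git checkout -q "$base"

# Which commits exist on 'feature' but not on the base branch?
git log --oneline "$base..feature"

# Which files did those commits touch? (three-dot form: changes on
# 'feature' since it diverged from the base branch)
changed=$(git diff --name-only "$base...feature")
echo "$changed"
```

None of this identifies the risk for you, of course - it only tells you where the change happened, so that a human can start asking what it might affect.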

My conclusions

I’m concerned that ‘regression testing’ is being referred to as a standard testing phase, different to the testing carried out as part of development activities. This seems wrong to me because it encourages the ‘over the fence’ mentality of ‘coders code, testers test’. If we are to effectively identify the risks of change, we need to work together as a team, not as separate roles.
I believe regression testing is a continuous activity that should be performed as soon as possible. The reason it may have to be performed ‘at the end’ is not because it should be, but because that is sometimes the earliest possible point. I currently work in an environment that desires a ‘lean agile’ approach (speed) at the same time as developing a microservices architecture (decoupling). Both of these require us to become smarter with our testing: we don’t have time to run large numbers of repeated tests just before releasing, only to find a flaw in our work that could have been spotted with greater collaboration earlier on.
Finally, regression testing is not an automatable activity - it is the process of consideration and analysis, not repetition. Repeating a test may be useful to help us evaluate risk, but sometimes we may want to try a totally new test. Regression testing isn’t just repeating all of the tests that were run before - as the risk changes, so do your tests. Your testing needs to be continuously adapted to this changing risk.