Sunday, 4 October 2015

Measuring success in Testing

I'm a strong believer in continual improvement of our practices.
Recently we've been focussing on re-invigorating this attitude in our testers.
Most of this has been explaining that...
  • "what we do now" is the result of continual evolution over many years.
  • we don't believe in 'best practice' - as it implies you're not prepared to accept that another practice may be better.
  • we are open to trying new ideas and ways to do things.
When talking to test analysts about evolution and trying new things, I started to think: what are we aiming for? How do we know the evolution is positive? How do we know a new idea made a positive difference?
So, I asked a couple of Test Analysts, Business Analysts, and Product Owners: how do we measure the success of our testing?
Below is a digest of some scribbles I kept in my notebook of their answers, and the issues that they (or I) felt existed in each type of success measure. Then I've included my personal view of how I think the success of our testing should be measured.
I'd be keen to hear people's comments.
Scribbles digested
  1. Defect rate
    • Number of defects found during testing
      • Rationale - if we find lots of defects during the testing activities, we've stopped these making their way into production. Meaning we have a better quality product
    • Number of bugs released to production
      • Rationale - if we don't release any bugs to production, we must be testing all the possible areas
    • No severity 1 defects released
      • Rationale - we make sure the worst bugs don't make it to production
    • Issues
      • How much do those defects you find matter? You can almost always find bugs in any system, but do they matter to the product?
      • "Severity" requires a judgement call by someone. If you release no severity 1 defects, yet the product fails and no one uses it, you probably weren't assessing severity properly. So was your testing successful?
      • Just because we don't see bugs, doesn't mean they're not there.
        Alternatively, the code might not have been particularly buggy when it was written. So was the success that of the testing or the coding?
  2. Time
    • Time for testing to complete
      • Rationale - the faster something is deployed the better. So if we complete testing quickly, that's success
    • Time since last bug found
      • Rationale - if we haven't found any bugs recently, there must be no more to find
    • Issues
      • Fast testing is not always the smartest approach to testing.
      • Defect discovery does not obey a decay curve. Yes, you may have found all the obvious defects, but that doesn't mean you've found all the defects which will affect your product's quality.
  3. Coverage
    • Number of code statements covered by testing activities
      • Rationale - if we execute all the code, we're more likely to find any defects. E.g. Unit tests.
    • Number of acceptance criteria which have been verified
      • Rationale - we know how it should work. So if we test it works, we've built what we wanted.
    • Issues
      • This can lead you to 'pure verification' and not attempting to "push code over" or try unexpected values/scenarios
      • We work on an ever-evolving and intertwined code base; focussing only on the new changes ignores regression testing and the fact that new functionality may break existing features.
  4. Amount of testing
    • Amount of testing done
      • Rationale - we've done lots of testing, the product must be good
    • Amount of testing not done
      • Rationale - we made the call not to test this
    • Issues
      • Doing lots of testing can be unnecessary and a poor use of time. Removing testing requires a judgement call on what should and shouldn't be tested.
        There's always a risk involved when you make those judgements of more or less coverage, but perhaps the bigger 'social' risk is that you can
        introduce a bias or blindness in yourself. If you didn't test it last time, and the product didn't fail - are you going to test it next time? Or it could introduce a business bias - "we did lots of testing last time, and the product failed, so we need to do more testing this time."
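The defect-rate measures above can at least be made concrete. Here's a minimal sketch in Python; the function name and the idea of collapsing the two counts into an 'escape rate' are my own illustration, not anything standard:

```python
# Hypothetical sketch: turning the defect counts discussed above into a
# single "escape rate" figure. The function name and formula are
# illustrative choices, not an industry-standard metric.

def defect_escape_rate(found_in_testing: int, found_in_production: int) -> float:
    """Fraction of all known defects that slipped past testing into production."""
    total = found_in_testing + found_in_production
    if total == 0:
        # No defects found anywhere: either a very clean build, or testing
        # that isn't looking hard enough - the number alone can't tell you which.
        return 0.0
    return found_in_production / total

# 40 defects caught in testing, 10 escaped to production
print(defect_escape_rate(40, 10))  # 0.2
```

Note that the resulting number says nothing about whether any of those 50 defects actually mattered to the product - which is exactly the issue raised above.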
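The 'pure verification' trap in the coverage measure can be shown in a few lines. This is a contrived example of my own (the discount function is invented purely for illustration):

```python
# Hypothetical example: a function where a single "pure verification" test
# achieves 100% statement coverage, while a defect still hides behind an
# unexpected input.

def discount(price: float, percent: float) -> float:
    """Apply a percentage discount. Note: no input validation."""
    return price - price * percent / 100

# This one check executes every statement in the function...
assert discount(100, 10) == 90.0

# ...yet trying an unexpected value "pushes the code over": a negative
# percentage silently *increases* the price.
print(discount(100, -10))  # 110.0
```

Full statement coverage told us nothing about the inputs we never tried.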
My view: how should the success of testing be measured?
To me, we should consider the points above, but focus more on: Did the testing activities help the delivery of a successful product?
If we delivered a successful product, then surely the testing we performed was successful?
But to make that conclusion you have to understand what factors make the product successful.
And those factors may not give you the immediate or granular level of feedback you need.
e.g. if success was that the product was delivered on time and under budget - can you tell how much your testing contributed to that time and budget saving?
So when I answer the question of 'Did my testing activities help the delivery of a successful product?', I consider:
  • Was my testing 'just enough'?
    • Did I cover enough scenarios?
    • Did I deliver it fast enough?
  • Did this testing add value?
    • Have I done enough testing for the success of this change?
    • Can I get the same value by removing some testing activities?
  • Did I find defects in the product?
    • What bugs are critical to the project's success?
    • What bugs matter most to its stakeholders?
  • What does success look like for the project's stakeholders?
    • Zero bugs?
    • Fast Delivery/Turnaround?
    • Long term stability?
I haven't explicitly said that the success of testing should be measured by the quality of a product. To me it's the third bullet point, "Did I find defects in the product?" - the measure of the product's quality comes when we consider those defects and the level to which the stakeholders feel they're detrimental to the product's success.
I really like Michael Bolton's 2009 article Three Kinds of Measurement and Two Ways to Use Them. It got me thinking about the different ways people approach measurement and made me think about how high level I need to be when giving people a measure of success.
I guess the main thing I've learnt when talking to people, and digesting my thoughts is that you should be thinking about how you're measuring success. I don't think it's formulaic, maybe it's more heuristic, but it's always worth thinking about.


  1. Nice overview. I tend to agree. I have a recommendation however - to complete the picture you should consider not just what "helps the delivery", but also things that interfere with or
    hinder the team and software delivery as a result:
    - bugs found late (when we could have found them earlier)
    Rationale: it is easier for a developer to fix code they have just broken compared to code untouched for weeks
    - bugs reported but never fixed for whatever reason (duplicate/not a bug/won't fix/low priority...)
    Rationale: we waste our time reporting them. Developers waste their time reading and analysing the bug. Not to mention the impact on morale/attitude.
    - bugs fixed that do not really matter to any stakeholder (but we, and even they, don't know about it)
    Rationale: we spent time fixing a bug that would never bug anyone in production, or at least they would be perfectly fine using the workarounds available


    1. Thank you Ainars, I really appreciate your feedback.

      - bugs found late (when we could have found them earlier)
      * What is late?
      If you want a success criterion of "Find bugs in time", then you have to understand what 'in time' means in the context of your change.
      Is 'in time' dependent on the developer remembering the code they changed? or is 'in time' before the project is released?
      * Why were the bugs found late? and why/how could they be found earlier?
      Testing isn't just about 'exercising code'. Are the test analysts getting involved early? Are there discussions happening about requirements?
      To say that a bug was found too late, you have to understand the root cause of it. Was it bad requirements? bad understanding? bad coding?
      - Why is your code sitting around for weeks?
      * Ultimately it boils down to: When is the best time to find bugs for the success of the project?
      Probably as early as possible right?
      But you also want to find the biggest bugs as early as possible, which is why you have to ask the 'What bugs are critical to the project's success?' question.
      I'd suggest you could add another question under 'Did I find defects in the product?', along the lines of 'Did testing activities help to find product defects early?'

      - bugs reported but never fixed for whatever reason
      A test analyst will almost always be able to find some bug in a product if they look hard enough.
      Understanding which factors influence the project's success helps to focus testing activities, and this inherently affects the defects that get raised. Defects raised should be those which will adversely affect the project's success.
      Who makes the call on what gets fixed in your project?
      Maybe instead of logging a bug, the test analyst should ask the stakeholder to review the bug they've found before it goes to the developer to fix?
      This will feed into the focusing of testing activities on finding defects which matter.
      This is where the question 'What bugs are critical to the project's success?' is powerful.

      - bugs fixed that does not really matter to any stakeholder
      If bugs are getting fixed that 'do not matter' - something is wrong with the defect prioritisation in the project.
      Just because we find a defect, doesn't mean it needs fixing.
      If it doesn't matter to any stakeholder, then the test analyst isn't asking the question: 'What bugs matter most to its stakeholders?'

      Thank you again for your feedback. I really appreciate it.

  2. Great article! I searched a lot of sites and finally I found something that could show me the metrics clearly. Congrats!