Quality Confidence: a lead measure for software quality
For a long time I’ve wanted to be able to express the quality of my current software release in a simple intuitive way. I don’t want a page full of graphs and charts I just want a simple visualisation that works at every level of requirements to verification (up and down a decomposition/recomposition stack if you’ve got such a beasty). My answer to that is Quality Confidence.
What it is
QC combines a number of standard questions together in a single simple measure of the current quality of the release, so instead of going to each team/test manager/project manager and asking them the same questions and trying to balance the answers in my head I can get a simple measure that I can use to quantitively determine whether my teams are meeting our required “level of done“. The QC measure combines:
- how much test coverage have we got?
- what’s the current pass rate like?
- how stable are the test results?
We can represent QC as a single value or show it changing over time as shown below.
How to calculate it
Quality Confidence is 100% if all of the in scope requirements have got test coverage and all of those tests have passed for the last few test runs.
To calculate QC we track the requirements in a project and when they’re planned for/delivered. This is to limit the QC to only take into account delivered requirements. There’s no point in saying we’ve only got 10% quality of the current release because it only passes some of the tests because the rest having been delivered yet.
We also track all of the tests related to each requirement, and their results for each test run. We need to assert when a requirement has “enough coverage” so we know whether to include it or not – the reason for this is that if I say a requirement has been delivered but doesn’t yet have enough test coverage then even if all of it’s testing has passed and been stable then I don’t want it adding to the 100% of potential quality confidence. The assertion that coverage isn’t enough means that we aren’t confident in the quality of that requirement.
So 100% quality for a single requirement that’s in scope for the current release is when all the tests for that requirement have been run and passed (not just this time but for the last few runs) and that the requirement has enough coverage. For multiple requirements we simply average (or maybe weighted average) the results across the requirements set.
If we don’t run all the tests each during each test run then we can interpolate the quality for each requirement but I suggest decreasing the confidence for a requirement (maybe by 0.8) for each missing run. After all just because a test passed previously doesn’t mean it’s going to still pass now. We also decrease the influence of each test run on the QC of a requirement based on it’s age so that if 5 tests ago the test failed it has less impact on the QC that the most recent test run. Past 5 or so (depending on test cycle frequency) test runs we decrease the influence to zero. More info on calculation here.
So… how much coverage is enough?
Enough coverage for a requirement is an interesting question… For some it’s when they’ve covered enough lines of code, for others the cyclomatic complexity has an impact, or the number of paths through the requirements/scenarios/stories/use cases etc. For me, a requirement has enough test coverage when we feel we’ve covered the quality risks. I focus my automated testing on basic and normal flows and my human testing on the fringe cases. Either way, you need to make the decision of when enough is enough for your project.
To help calibrate this you can correlate the QC with the lag measure of escaped defects.
Measurement driven behaviour
The QC measure is quite intuitive and tends to be congruent with people’s gut feel of how the project/release is going, especially when shown over time, however there’s simply no substitute for going and talking to people. QC is a useful indicator but not something that you can use in favour of real communication and interaction.
The measurement driven behaviour for QC is interesting as you can only calculate it if you track requirements (of some form) related to tests and their results. You can push it up by adding more tests and testing more often 🙂 Or by asserting you have enough coverage when you don’t 😦 However correlation to escaped defects would highlight that false assertion.
If you’ve got a requirements stack ranging from some form of high level requirements with acceptance tests to low level stories and system tests you can implement the QC measure at each level and even use it as a quality gate prior to allowing entry to a team of teams release train.
Unfortunately, because me and my mate Ray came up with this in a pub there aren’t any tools (other than an Excel spreadsheet) that automatically calculate QC yet. If you’re a tool writer and would like then please do, I’ll send you the maths!