Code Coverage Metrics: A Useful Tool with Limitations
Discover the benefits and limitations of code coverage as a helpful tool when practicing TDD, and learn how to effectively use it while avoiding common pitfalls.
Disclaimer: You are reading an early version of the text! The final version of the book will be revised and may contain additional content.
Previously, we discussed how to effectively combine different kinds of tests into a well-rounded testing strategy. But now you might wonder: how can we measure the success of our testing efforts? Can we define metrics and set goals to do so?
In the past, I’ve mistakenly used code coverage metrics as a target to work toward. That turned out to cause all kinds of problems. So let’s take a closer look at code coverage, its pros and cons, and why code coverage tools nevertheless have a place in shaping our testing strategy.
Code Coverage Basics
Code coverage tools measure how much of our code is exercised by tests and allow us to define thresholds that break the build if coverage falls below a certain point. It is tempting to evaluate our testing strategy by setting such thresholds, but we must be cautious when defining them.
Code coverage tools like c8 and Istanbul determine what code was and was not executed during a test run, treating unexecuted code as untested. However, they don’t know whether specific parts of the code are relevant or prone to errors. Even more importantly, these tools can’t determine whether our tests verify anything meaningful.
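To make the threshold mechanism concrete, here is a sketch of a build-breaking coverage check using c8’s command line. The exact numbers and the `npm test` command are assumptions; adapt them to your project:

```shell
# Run the test suite under c8 and fail the process (and thus the
# CI build) if coverage drops below the given thresholds.
npx c8 --check-coverage --lines 80 --branches 70 npm test
```

The same thresholds can usually be kept in a configuration file instead of the command line, which makes them easier to review and change.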
When a Measure Becomes a Target
High code coverage in itself doesn’t provide value to anybody. We write tests to, among other things, establish fast feedback loops, enable continuous refactoring, and prevent regressions while doing so. To reap all those benefits, we need high code coverage. Unfortunately, code coverage is a metric we can easily game.
// Just rendering a simple component can lead to 100%
// code coverage without testing anything meaningful.
import { mount } from '@vue/test-utils';

import Component from './Component.vue';

it('should render correctly', () => {
  const wrapper = mount(Component);
  expect(wrapper.html()).toContain('div');
});
When a measure becomes a target, it ceases to be a good measure because people start to game it.
– Goodhart’s law
Trying to achieve or maintain high code coverage via thresholds often results in canny developers finding ways to hit the required percentages without actually testing anything meaningful. Instead of focusing on reaching a threshold, the real goal should be tests that give us confidence and help us do our job more efficiently. The primary objective of testing is not to reach an arbitrary number but to provide fast feedback during development, prevent bugs, and ensure the maintainability of our software in the long run.
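The difference between gaming the metric and building confidence becomes obvious when we strip the example down to plain functions. In this sketch, `applyDiscount` is a hypothetical function used only to contrast the two styles:

```javascript
// Hypothetical function under test, used only for illustration.
function applyDiscount(price, percent) {
  return price - (price * percent) / 100;
}

// Coverage-gaming "test": executes every line, asserts nothing.
// Coverage tools report 100%, yet any bug would go unnoticed.
applyDiscount(100, 20);

// Meaningful test: pins down the observable behavior we rely on.
if (applyDiscount(100, 20) !== 80) throw new Error('20% off 100 should be 80');
if (applyDiscount(50, 0) !== 50) throw new Error('0% off should change nothing');
```

Both variants produce identical coverage numbers; only the second one would ever catch a regression in the discount calculation.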
Using Coverage as a Tool, Not a Target
Despite the limitations of concrete code coverage targets, code coverage tools can still be instrumental in aiding our testing efforts. High code coverage doesn’t guarantee that our application is well-tested. But while 100% code coverage is not a desirable goal in most cases, a low value reliably signals that significant parts of our application are untested, which is a clear indication that we must improve how we write code and tests.
It’s essential to use coverage metrics as a tool rather than as a target. We should monitor our coverage metrics without getting hung up on reaching specific thresholds. At the same time, we must be vigilant and critically question steadily decreasing coverage. Coverage tools show us exactly which lines of code are not covered by tests, allowing us to decide case by case whether we’re missing a test for an important feature related to those lines. If we regularly find essential lines of code untested, we need to up our testing game.
There are three possible situations we can find teams in when looking at their code coverage:
Inadequate code coverage: We cannot rely on our tests to find regressions in untested parts of our application, which makes every change to the code risky.
Artificially high code coverage: When we write tests primarily to meet arbitrary thresholds, we often write tests tied to implementation details. Tying tests to implementation details makes future code changes riskier since every code change also requires modifying our tests.
Naturally high code coverage: Tests that originate as an artifact of a TDD process and adhere to certain best practices naturally entail high code coverage. They give us confidence that our changes will not cause unforeseen regressions, so we can refactor with little risk because we can rely on the tests.
Code Coverage Comes Naturally When Practicing TDD
When practicing TDD, high code coverage comes naturally. Conversely, it is practically impossible to end up with low code coverage if we consistently practice TDD and write tests first. For this and many other reasons, TDD is the preferred way to build software. However, we often find ourselves in situations where large parts of the application already exist, and we only start writing tests after the fact. In these cases, it’s particularly tempting to introduce arbitrary thresholds to force coverage up. But yet again, we should not set random targets; instead, we can use code coverage metrics to visualize how coverage increases the longer we work on the previously untested codebase.
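To illustrate why coverage falls out of TDD for free, consider a minimal red-green cycle. The `slugify` function and its spec are hypothetical; the point is that production code only ever comes into existence to satisfy a failing test, so there is no moment at which untested lines can accumulate:

```javascript
// Red: the assertion at the bottom is written first and fails
// because slugify does not exist yet.
// Green: we then add just enough code to make it pass.
function slugify(title) {
  return title.trim().toLowerCase().replace(/\s+/g, '-');
}

// Every line above exists only because this test demanded it,
// which is why all of it is, by construction, covered.
if (slugify('  Code Coverage Basics ') !== 'code-coverage-basics') {
  throw new Error('slugify should trim, lowercase, and hyphenate');
}
```

Refactoring `slugify` later is safe for the same reason: the test that motivated the code keeps guarding its behavior.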
To establish TDD as the standard for adding code to the codebase, we must meet the following basic requirements:
Seamless process integration: Testing is not an afterthought; we consider it at every step of the process.
Know-how: All team members are familiar with TDD and do not fear applying it.
Testing is painless: The test setup is stable, and adding new tests is easy and fast.
In the upcoming chapters, we will explore how to set up a test environment that makes it easy for us to work according to the TDD principle and how to integrate automated testing into the product development process seamlessly.