Introducing EQS–Early Quality Score
It would be ideal if software were developed without bugs. In reality, though, bugs are introduced every time new software is developed (at least today), hence the need to develop testing software as well.
Ultimately, our goal is not to write good tests, but rather to develop quality software products. Tests are the means to that end.
So how can we tell if our tests actually help catch or prevent bugs and whether they cover all use cases, including happy paths and edge cases? Today, these questions are usually answered by the “Code Coverage” criterion. In this blog post, we propose a new way to measure unit test quality: EQS.
EQS (Early Quality Score) comprises code coverage, mutation score, and a new criterion: unit-tests scope coverage. Read on to see what EQS is and why it is better than using only code coverage, including examples and use cases.
Code coverage deficiencies
One of the most widely accepted methods for measuring test quality is code coverage. Code coverage measures what percentage of the code is executed by tests. If I have zero coverage, clearly my tests either don’t exist or are useless, leaving my code with bugs and other issues.
However, code coverage is an insufficient measurement on its own. While low code coverage is indicative of poor testing, high coverage does not necessarily indicate high quality. Even with 100% coverage, the quality of tests might be low, for example, if they don’t exercise enough different inputs. In this case, just as in the low coverage example, the code still has undetected bugs.
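To make this concrete, here is a minimal sketch of a fully covered function that still hides a bug. The function `isAdult` and the tiny `runTests` helper are hypothetical names made up for this illustration, not part of any real project.

```typescript
// Hypothetical function with a boundary bug: 18-year-olds should count as adults.
function isAdult(age: number): boolean {
  return age > 18; // bug: the intended condition is age >= 18
}

// A tiny hand-rolled "test suite". It executes every line of isAdult,
// so a coverage tool would report 100% for that function.
function runTests(): boolean {
  return isAdult(30) === true && isAdult(5) === false;
}

console.log(runTests());  // true: the suite is green
console.log(isAdult(18)); // false: the boundary case is wrong and no test notices
```

Coverage only tells us the line was executed. It says nothing about whether the inputs that would expose the bug were ever tried.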
Moreover, tests are usually developed for public methods. They are supposed to also cover private methods, since they contain the internal logic of the public ones. However, this is not always the case. High coverage of public methods doesn’t always translate to high coverage of the private methods or the rest of the code.
How code coverage impacts the mutation score
The limitations of code coverage can be clearly seen through the mutation score. Mutation testing is based on the idea of making small changes, or mutations, to a program's source code and then running the existing tests to see if they detect the changes. The resulting mutation score helps evaluate whether the tests are able to identify defects in the code.
How mutation testing works:
- Generating mutants: The first step in mutation testing is to create multiple versions of the original program, each with a slight modification. These modified versions are known as mutants. Common types of mutations include:
  - Changing a logical operator (e.g., replacing && with ||).
  - Modifying a mathematical operator (e.g., replacing + with -).
  - Altering a constant value.
  - Changing a conditional boundary.
- Running tests on mutants: Each mutant is then tested using the existing test suite. The purpose is to check whether the tests can detect the changes (i.e., cause the tests to fail).
- Analyzing results: After running the tests, the outcomes are analyzed:
  - Killed mutants: If a test fails due to the mutation, the mutant is considered "killed," indicating that the test suite is effective in detecting that type of fault.
  - Survived mutants: If the tests pass despite the mutation, the mutant is considered "survived," indicating that the test suite did not detect the fault.
  - There are other types of “not killed” mutants, like no coverage, timeouts, and errors (you can read more about them here).
- Calculating the mutation score: The simplified formula for the mutation score is: Mutation score = (killed mutants / total mutants) × 100%
This score provides a quantitative measure of the effectiveness of the test suite.
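The steps above can be sketched by hand in a few lines. This is only an illustration under simplifying assumptions: the function, the three hand-written mutants, and the two-assertion test suite are all hypothetical, and the score uses the simplified ratio of killed mutants to total mutants. A real tool such as Stryker generates the mutants automatically and tracks more outcome types.

```typescript
// Signature shared by the original function and its mutants (hypothetical example).
type Impl = (price: number, isMember: boolean) => number;

// Original: members get 10 off on orders of 100 or more.
const original: Impl = (price, isMember) =>
  isMember && price >= 100 ? price - 10 : price;

// Step 1: hand-written mutants, one small change each.
const mutants: { name: string; impl: Impl }[] = [
  { name: "logical operator (&& -> ||)", impl: (p, m) => (m || p >= 100 ? p - 10 : p) },
  { name: "math operator (- -> +)", impl: (p, m) => (m && p >= 100 ? p + 10 : p) },
  { name: "conditional boundary (>= -> >)", impl: (p, m) => (m && p > 100 ? p - 10 : p) },
];

// The existing test suite: returns true when every assertion passes.
// It covers both branches of the original, so line coverage is 100%.
const testSuite = (impl: Impl): boolean =>
  impl(200, true) === 190 && impl(50, false) === 50;

// Steps 2-4: run the suite against each mutant, classify, and score.
function runMutationTesting() {
  const killed: string[] = [];
  const survived: string[] = [];
  for (const mutant of mutants) {
    // A mutant is killed when the suite fails on it.
    (testSuite(mutant.impl) ? survived : killed).push(mutant.name);
  }
  const score = Math.round((killed.length * 100) / mutants.length);
  return { killed, survived, score };
}

console.log(testSuite(original)); // true: the suite must be green on the original
console.log(runMutationTesting()); // 1 killed, 2 survived, score 33
```

Only the math-operator mutant is killed here. The other two survive because no test uses a non-member with a large order or a price of exactly 100.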
Since the mutation score measures the tests’ ability to identify changes in code, high code coverage should result in a high mutation score. However, based on our analysis, this is not always the case.
To assess the test quality behind a given coverage figure, we’ll use Stryker, a mutation testing framework for JavaScript, TypeScript and others.
Note: In order to run Stryker, or any other mutation tool, you need to ensure your test suite contains only green tests.
As an example, we’ll take a file called user.service.ts, which handles different user operations. This file has six methods that handle different cases of user creation and management.
As you can see in the image above, only 2 of the 6 methods in the UserService class in this file have unit tests, for a total of 8 green unit tests. Coverage is 84%, which seems quite high.
Now let’s run the Stryker mutation score tool on the user.service.ts file using this command:
npx stryker run --mutate /path-to-project/src/user/user.service.ts
Here is what the Stryker report looks like in the IDE:
Although the coverage for this file is 84%, we can see the total mutation score here is roughly 46%, with 13 out of the 24 mutants for this file surviving or having no coverage at all. This means that a bug introduced in those parts of the code may well go unnoticed by the existing tests, i.e., no test will fail.
This requires rethinking the way we approach the concept of test quality.
The importance of unit testing
We propose introducing unit tests as a better way to help determine the testing quality. Unit tests test the smallest part of code that can be tested, like a functionality or a service, to validate its behavior. This means unit tests are the only tests with enough granularity to cover all test cases.
For example, if a public method has multiple code branches under it using private methods, the quality and accuracy of the test code depend heavily on testing that public method. Test cases should manage to cover the intersections of the subsequent code that the public method can invoke. This requires sufficient test cases that are difficult to cover with component or integration tests. This is especially true in edge cases, where complicated bugs thrive.
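As a rough sketch of what covering those intersections looks like, consider a public method that delegates to two private helpers. The class `SignupService`, its methods, and the test cases below are hypothetical and exist only for this illustration.

```typescript
// Hypothetical service: one public method whose branches depend on private helpers.
class SignupService {
  register(email: string, age: number): string {
    if (!this.isValidEmail(email)) return "invalid-email";
    if (!this.isAdult(age)) return "underage";
    return "ok";
  }

  private isValidEmail(email: string): boolean {
    return email.indexOf("@") > 0; // "@" must exist and must not be the first character
  }

  private isAdult(age: number): boolean {
    return age >= 18;
  }
}

// Unit test cases that drive the private logic through the public method,
// including the edge cases of each helper.
const cases: { email: string; age: number; expected: string }[] = [
  { email: "a@b.com", age: 30, expected: "ok" },
  { email: "a@b.com", age: 18, expected: "ok" },            // age boundary
  { email: "a@b.com", age: 17, expected: "underage" },      // just below the boundary
  { email: "no-at-sign", age: 30, expected: "invalid-email" },
  { email: "@b.com", age: 30, expected: "invalid-email" },  // "@" in first position
];

const service = new SignupService();
const failures = cases.filter((c) => service.register(c.email, c.age) !== c.expected);
console.log(failures.length); // 0 when every case passes
```

Enumerating cases like these is cheap at the unit level, whereas reaching the same combinations through a component or integration test usually takes far more setup.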
(This doesn’t mean we should rely only on unit tests. More on this below.)
How unit tests impact coverage and the mutation score
Going back to the previous example with our user.service.ts file, let’s see what happens to the mutation score when we add unit tests:
Initial state:
Now let’s add unit tests to all the methods and see the increased impact on coverage and mutation score:
As is clearly shown, the more methods are covered by unit tests, the higher the mutation score and code coverage, indicating that unit tests increase test code quality across all criteria.
Introducing the Early Quality Score (EQS) for evaluating tests
We are not proposing that unit tests replace all other tests. Unit tests are an additional, important tool that significantly increases the quality of your tests (and hence the quality of your code).
Therefore, based on our experience developing unit tests with AI at scale and measuring the test quality, we propose a new method for evaluating the quality of tests: EQS (Early Quality Score). EQS comprises the three main criteria we discussed:
- Code coverage - The percentage of code covered by tests
- Mutation score - Evaluating the tests’ ability to identify mutations in the code
- Unit-tests scope coverage - The percentage of public methods with unit tests that cover 100% of their respective method
Let’s dive a bit deeper into criterion #3: unit-tests scope coverage
Unit-tests scope coverage is a measurement that determines whether a public method is covered well by unit tests, regardless of its code coverage by other types of tests (integration, security, component, black box, etc.).
In addition, we add the criterion that these unit tests must cover 100% of the code they are meant to test, which indicates that they are high-quality tests.
Unit tests scope coverage combines these two elements: A public method (or function), which has unit tests AND coverage of 100% for that method.
At the project level, scope coverage is measured in a simple way: it is the percentage of public methods that have unit tests and 100% coverage. For example, if you have 100 public methods in your project, but only 10 of them have unit tests with 100% coverage, your unit-tests scope coverage will be 10%.
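A minimal sketch of that calculation, assuming you already have per-method data from your coverage tooling. The `MethodReport` shape and its field names are hypothetical, invented for this example.

```typescript
// Hypothetical per-method report; field names are assumptions for this sketch.
interface MethodReport {
  name: string;
  isPublic: boolean;
  hasUnitTests: boolean;
  unitTestCoverage: number; // percentage of the method covered by its unit tests, 0..100
}

// Percentage of public methods that have unit tests AND 100% coverage from them.
function unitTestsScopeCoverage(methods: MethodReport[]): number {
  const publicMethods = methods.filter((m) => m.isPublic);
  if (publicMethods.length === 0) return 0;
  const qualifying = publicMethods.filter(
    (m) => m.hasUnitTests && m.unitTestCoverage === 100
  );
  return (qualifying.length * 100) / publicMethods.length;
}

// The example from the text: 100 public methods, 10 of them fully covered by unit tests.
const project: MethodReport[] = [];
for (let i = 0; i < 100; i++) {
  const covered = i < 10;
  project.push({
    name: `method${i}`,
    isPublic: true,
    hasUnitTests: covered,
    unitTestCoverage: covered ? 100 : 0,
  });
}

console.log(unitTestsScopeCoverage(project)); // 10
```

Note that a method with unit tests but only partial coverage does not count toward the score in this sketch, matching the "unit tests AND 100% coverage" rule above.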
Testing private methods is an interesting and different topic and will be discussed separately. It can increase test quality significantly with much less effort.
The EQS formula
EQS is formulated by multiplying the three criteria, so each element is taken into account when evaluating the overall quality of the tests for that file or project.
- Coverage score
- Mutation score
- Unit-tests scope coverage - Public methods with unit tests that cover 100% of their respective methods
Note: If you have no unit tests that cover their respective method by 100%, your EQS will be zero.
Alternatively, if your coverage is 100%, your mutation score is 100%, and your unit-tests scope coverage is 100% for all your files, your EQS will be 100%.
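Written as code, the multiplication might look like the sketch below. The function name `earlyQualityScore` is a hypothetical helper for this post, and it assumes all three inputs are percentages between 0 and 100.

```typescript
// EQS as the product of the three criteria, each given as a percentage (0..100).
// The function name and signature are assumptions made for this illustration.
function earlyQualityScore(
  coverage: number,
  mutationScore: number,
  scopeCoverage: number
): number {
  return (coverage / 100) * (mutationScore / 100) * (scopeCoverage / 100) * 100;
}

console.log(earlyQualityScore(100, 100, 100));          // 100: all three criteria are perfect
console.log(earlyQualityScore(84, 46, 0));              // 0: no method is fully covered by unit tests
console.log(Math.round(earlyQualityScore(69, 90, 68))); // 42
```

Because the criteria are multiplied rather than averaged, a weak result in any one of them pulls the whole score down, and a zero in any one of them zeroes the score.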
One last time, let’s see the example we’ve been using with EQS:
Note: Coverage of the unit tests for each method (once created) summed up to 100% for that method, hence that method was included in the unit-tests scope coverage for this file.
As you can see, coverage for user.service.ts was already high to begin with (84%), the mutation score went up from 46% to 100%, and EQS went up from 13% to 100%.
Let’s see another example. If you look at this blog about benchmarking unit tests results for ts-morph across different GPT models, we had 210 testable methods, of which Early managed to generate green unit tests for 68% of these methods with coverage of 100%.
- Total coverage score was 69%.
- Total mutation score for methods with 100% coverage was 90%.
- Scope coverage ratio is 68%.
The EQS for the mixture run on this example was 42%.
What’s next?
Unit tests play a significant role in determining the mutation score in software testing. Therefore, they cannot be disregarded when attempting to determine code quality. EQS (Early Quality Score) is our proposed formula for calculating the quality of tests. EQS comprises:
- Code coverage
- Mutation score
- Unit-tests scope coverage
Consequently, to achieve a high mutation score and ensure robust testing, it’s important to have a comprehensive suite of unit tests in addition to other types of tests. ChatGPT and other commercial gen AI applications can help, but they require significant manual effort to extract working, high-quality unit tests. Hence, a new generation of AI assistant and AI agent tools is evolving to take the tedious task of developing unit tests away from developers and ultimately empower them to build higher-quality software, faster.
About Early:
Try Early yourself: installation takes less than a minute, and it’s free to start.
You code, Early will take care of the Test