My take on software testing
My view on the practice of software testing, shaped by years of wrestling with writing, maintaining, and extending test suites
Writing tests is one of those activities that virtually every software engineer recognizes as crucial to the software development process. Yet, many engineers still regard it as an afterthought, something they do to get their work approved rather than an integral part of their workflow.
This often results in poorly written tests that are difficult to read, maintain, and extend.
I firmly believe that testing represents a core aspect of a software engineer's job, and as such, it deserves the appropriate level of attention and care. The topic is vast, with countless books dedicated to it, so in this article, I'll focus on my personal experience as a software engineer and share the learnings I've accumulated over the years.
Why is testing important?
The answer might seem obvious: we write tests to ensure our software is doing what it is supposed to!
However, no matter how many tests you write, you can't definitively prove your code is correct. This is especially true because real-world software specifications often contain ambiguities.
Therefore, my personal answer to the question is that we write tests to boost our confidence that the software behaves according to our current understanding of the specification.
There are two key points here:
Increase confidence: We can never definitively prove the software behaves exactly as expected. However, by writing enough targeted tests, we gain enough confidence to release it to production while still being able to sleep at night.
Current understanding: Software specifications and our understanding of them inevitably change over time. When these changes occur, our test suite helps identify the ramifications and ensures other parts of the system still function as expected.
Prioritizing tests: balancing confidence and efficiency
The testing investment depends on the software's criticality. Life-saving systems like airplane controls require far more rigorous testing than a startup's MVP.
To achieve the highest level of confidence, ideally you would test your entire software system, including the underlying infrastructure, while simulating real user interactions. This means verifying that the system behaves as end users expect by testing complete user journeys that span multiple, independent parts of your software. However, this comprehensive level of testing comes at a significant cost: such in-depth tests require substantial time, resources, and coordination to execute.
A more practical strategy is to exercise your system at different levels, striking a balance between confidence and cost. The literature often cites the Test Pyramid as a general guideline:
Unit Tests (many): exercise a single unit in isolation, employing test doubles in place of real dependencies. These tests provide confidence in the correctness of the business logic.
Integration Tests (some): verify one or more units by exercising their interaction points with a real dependency. These tests provide confidence in the interactions between the unit and the dependency.
System Tests (few): verify the entire application by exercising end-to-end user journeys. These tests provide confidence in the whole application.
I personally don't like this categorization for a few reasons:
The common definition of a unit is rather subjective but generally consists of a function or a class. To me, this seems overly restrictive.
Integration tests are similarly fuzzy: is testing the interaction between 2 different classes an integration test or do we need to have a network boundary to make it one?
Most importantly, why do we care?
In the end, our goal is to strike a balance between confidence and cost. So, why not divide our tests into two simple categories:
Fast (many): These are the tests your CI runs on each commit. They need to be fast to keep running costs low and to provide a quick feedback loop during development. These tests can target any unit of your software, from functions to modules to entire microservices, and can use both test doubles and real dependencies (e.g., a database or a cache). The only requirement is speed. A good target is to keep the suite's runtime below or around 10 minutes: enough time for a quick coffee break, with the tests done by the time you return.
Slow (some): These tests may require complex setups (e.g., a staging deployment) and might not be possible to run concurrently. They won't run automatically on every commit; instead, they might be triggered manually before a merge to main is allowed, or run by the CI as part of the release pipeline. They should focus mainly on end-to-end user journeys, exercising multiple features of your application to validate their interaction and the assumptions you've encoded in the test doubles used by the fast tests.
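As a concrete illustration, here is a minimal sketch of one way to keep both categories in a single suite, using Node's built-in test runner; the RUN_SLOW flag, the discount logic, and the checkout journey are hypothetical names used only for illustration.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical switch: slow tests only run when explicitly requested.
const RUN_SLOW = process.env.RUN_SLOW === "1";

// Fast test: runs on every commit.
test("discount is applied to the cart total", () => {
  const total = 100 * (1 - 0.2); // stand-in for real business logic
  assert.equal(total, 80);
});

// Slow test: skipped by default, run e.g. in the release pipeline with RUN_SLOW=1.
test("end-to-end checkout journey", { skip: !RUN_SLOW }, async () => {
  // ...drive the deployed staging environment here...
});
```

Whether you split by an environment flag, by directory, or by test-runner tags is secondary; what matters is that the fast set stays fast enough to run on every commit.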
Structuring tests
Let's get down to the nitty-gritty of writing tests! While the specifics can vary depending on the language and framework, I find myself using a similar pattern regardless: Arrange, Act, Assert.
This pattern involves three phases:
Arrange: Prepare the testing environment. This includes tasks like initializing the database, setting up test doubles, and creating the target of the test.
Act: Trigger the behavior you want to test. This could involve calling a function, sending messages to a queue, or adding data to a database.
Assert: Verify the outcome of the behavior. This might involve checking the response, the state of the database, or message acknowledgments in the queue.
This pattern applies to various test types, from simple unit tests to complex end-to-end tests, and provides a good foundation for writing structured and maintainable test suites.
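To make the three phases concrete, here is a minimal sketch using Node's built-in test runner; the Cart class is hypothetical and exists only to show where each phase begins and ends.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical unit under test.
class Cart {
  private items: { name: string; price: number }[] = [];
  add(name: string, price: number) {
    this.items.push({ name, price });
  }
  total(): number {
    return this.items.reduce((sum, item) => sum + item.price, 0);
  }
}

test("cart total sums the prices of all items", () => {
  // Arrange: create the unit under test and the data it needs.
  const cart = new Cart();
  cart.add("book", 12);
  cart.add("pen", 3);

  // Act: trigger the behavior we want to verify.
  const total = cart.total();

  // Assert: verify the observable outcome.
  assert.equal(total, 15);
});
```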
In general, the Arrange phase is the most challenging and time-consuming. It often involves generating specific data and wiring dependencies to create the unit under test. For complex systems, this phase can become a significant barrier that discourages developers from writing new tests. We can address this by creating helper libraries that simplify data preparation and dependency management in a declarative way.
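As an example of what such a helper might look like, here is a minimal sketch of a declarative test-data builder; the User type, the buildUser defaults, and canDeleteAccounts are all hypothetical.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

type User = { name: string; age: number; isAdmin: boolean };

// Supplies sensible defaults so each test only spells out the fields it cares about.
function buildUser(overrides: Partial<User> = {}): User {
  return { name: "test-user", age: 30, isAdmin: false, ...overrides };
}

// Stand-in for real business logic.
function canDeleteAccounts(user: User): boolean {
  return user.isAdmin;
}

test("admins can delete accounts", () => {
  // Only the field relevant to this behavior is stated explicitly.
  const admin = buildUser({ isAdmin: true });
  assert.equal(canDeleteAccounts(admin), true);
});
```

Because the builder fills in everything else, the Arrange phase stays short and each test reads as a statement of intent rather than a wall of setup.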
A common trap is making test code overly generic. I believe test code should prioritize clarity over code reuse. The goal is to have each test be independent, clearly written, and require minimal boilerplate code. Some repetition between tests is acceptable if it improves understanding and maintainability. For instance, having a big dataset shared across many tests is much less clear than setting up each single test with only the pertinent data needed to exercise the behavior.
Methodical test definition
At some point you’ll find yourself having written a bunch of tests and wondering whether they are enough or you should add some more.
This often indicates a lack of a structured approach to test definition, leading to tests written based on intuition. Without a strategy, different developers might write tests covering entirely different parts of the system. The test suite's quality would then depend heavily on the developer's experience and domain knowledge.
We can reduce this variability by employing a strategy called Specification Testing.
This strategy can be summarized in the following points:
Identify Inputs: Identify the inputs your target unit receives (e.g., function parameters).
Equivalence Classes: For each input, partition the values into equivalence classes, that is, groups of values expected to yield the same result (e.g., even and odd integers for an isEven function).
Positive and Negative Tests: For each equivalence class, write a positive test (using a value inside the group) and a negative test (using a value outside the group).
Boundary Values: For each equivalence class, write tests covering the boundary elements (e.g., test with 9, 10, and 11 for the group of integers less than or equal to 10), as in the sketch below.
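Putting these points together, here is a minimal sketch that applies them to a hypothetical free-shipping rule (orders of at most 10 items qualify); the function and the rule are made up for illustration.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical rule under test.
function qualifiesForFreeShipping(itemCount: number): boolean {
  return itemCount <= 10;
}

// Positive test: a value inside the "<= 10" equivalence class.
test("a small order qualifies for free shipping", () => {
  assert.equal(qualifiesForFreeShipping(3), true);
});

// Negative test: a value outside the class.
test("a large order does not qualify for free shipping", () => {
  assert.equal(qualifiesForFreeShipping(25), false);
});

// Boundary values: 9, 10, and 11, where off-by-one bugs usually hide.
test("free shipping boundary around 10 items", () => {
  assert.equal(qualifiesForFreeShipping(9), true);
  assert.equal(qualifiesForFreeShipping(10), true);
  assert.equal(qualifiesForFreeShipping(11), false);
});
```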
With a growing number of inputs and their equivalence classes, testing all possible combinations might become impractical. In such cases, leverage your domain knowledge to prioritize tests that are more likely to uncover issues or pose a higher risk. Ultimately, the number of combinations you test depends on the desired confidence level and amount of resources you’re willing to invest.
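When the full cross product of equivalence classes is too large, a table-driven test makes the chosen subset of combinations explicit and easy to extend; the shippingCost function and its pricing rules below are hypothetical.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical pricing logic with two inputs.
function shippingCost(destination: "domestic" | "international", weightKg: number): number {
  const base = destination === "domestic" ? 5 : 20;
  return weightKg > 10 ? base * 2 : base;
}

// Only the combinations judged most likely to break are listed;
// the full cross product is deliberately skipped.
const cases = [
  { destination: "domestic", weightKg: 1, expected: 5 },
  { destination: "domestic", weightKg: 11, expected: 10 },
  { destination: "international", weightKg: 1, expected: 20 },
  { destination: "international", weightKg: 11, expected: 40 },
] as const;

for (const { destination, weightKg, expected } of cases) {
  test(`shipping ${weightKg}kg ${destination} costs ${expected}`, () => {
    assert.equal(shippingCost(destination, weightKg), expected);
  });
}
```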
Let the code guide you
While Specification Testing provides a structured approach, as we discussed, it can be impractical to test all possible input combinations, especially with a growing number of inputs. Entire equivalence classes might be missed. Special input combinations might be undervalued and left out. This could result in untested sections of code.
To identify these gaps, we can use code coverage tools. These tools analyze test execution and pinpoint the portions of code your tests never execute. By examining these uncovered sections, we can identify missed equivalence classes or specific input combinations that warrant dedicated testing.
It's important to remember that code coverage is a tool to supplement Specification Testing, not a standalone metric. A low code coverage score might be acceptable for a low-risk module. Conversely, achieving 100% coverage can be misleading. While all the code may have been exercised, the tests might have only covered a limited portion of the possible inputs.
Test-First vs. Code-First
No discussion on testing is complete without addressing the test-first vs. code-first debate. There are valid arguments for both approaches.
While I strongly favor a code-first approach in general, I acknowledge that Test-Driven Development (TDD) can be valuable in certain situations. Here are some examples:
Limited Experience with Testable Design: TDD enforces good coding practices that lead to more testable code. This can be particularly helpful for developers who are new to writing testable software.
Bug Reproduction: Writing a failing test to reproduce a bug can pinpoint the problem's root cause and provide immediate feedback on the effectiveness of the fix.
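As a sketch of that workflow, the hypothetical example below shows a regression test that is written before the fix: it fails against the buggy code and passes once the guard is added.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical function that used to return NaN for an empty list.
function averageRating(ratings: number[]): number {
  if (ratings.length === 0) return 0; // the fix: previously this divided by zero
  return ratings.reduce((sum, r) => sum + r, 0) / ratings.length;
}

// Written first to reproduce the bug; it now guards against the bug reappearing.
test("average rating of a product with no reviews is 0", () => {
  assert.equal(averageRating([]), 0);
});
```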
Ultimately, the best approach depends on the project and the team's experience.
Conclusion
The world of software testing offers a variety of approaches, each with its own merits. In this article, I've shared some lessons learned throughout my experience in professional software development:
🏎️ Write many fast tests to enable rapid feedback during development.
🐌 Write some slow tests to gain confidence in the overall system.
⚙️ Use the arrange-act-assert pattern to structure your tests.
🪛 Invest effort in making tests easy to write, reducing disincentives for developers to add new ones.
📕 Be methodical in test definition using specification testing techniques.
🔬 Use code coverage to complement your specification testing efforts.
I hope these insights prove valuable for your software testing journey!
📚 Resources
Here are some resources that shaped my view on testing, and which are much more authoritative than my ramblings:
Effective Software Testing by Maurício Aniche
The Practical Test Pyramid by Ham Vocke
Continuous Delivery by Jez Humble & David Farley
Prefer Fakes Over Mocks by Tyrrrz
🤙 that’s it for this one
That is it from me! If you enjoyed (…or hated) this article let me know in the comments, and feel free to connect and reach out on LinkedIn.
Until the next one… allons-y!