Is This Code Worth Testing?

Many of us face this question on a daily basis. In many cases, the answer is yes. In certain cases, the answer is no or somewhere in between. It is important to weigh the benefits against the cost and to consider the extent to which code should be tested. A code coverage tool may report a line of code as covered, but that does not guarantee it has been thoroughly tested. Another consideration is the kinds of tests we write to begin with. For example, should a private function have its own tests, or should it be covered by the tests of a public function that makes use of it? Should a function be covered by integration tests or end-to-end (e2e) tests? We will explore these questions by examining several examples of code and determining whether tests are worth writing and, if so, which types.

For each example, we will consider the trade-offs of writing tests. We will look at the following cases:

  • A public function of a module
  • A private function of a module
  • A function that interacts with the file system
  • A function that interacts with a database
  • A function that interacts with a cloud provider
  • A function that serves as the entry point of a command line interface (CLI)

Let’s start with a public function of a module. This is typically a function you expect to be called from other modules and composed into useful functionality. Let’s limit this case to a function with no side effects and no external dependencies like a database. An example could be a function that transforms or validates data (this could be as simple as validating a user record of a web application or as involved as summarising customer orders over the past month). This kind of function should almost certainly be covered by unit tests. Those unit tests should be easy to write because it should be straightforward to construct the input data and to validate the output data. If it isn’t, consider refactoring the function and even the module to make it easier to interact with. In these cases, I would typically use pytest.mark.parametrize to generate enough test cases to sufficiently cover the function. Anything reported as not covered within this function, or within any private functions in the module that it calls, should be seriously considered for inclusion in the tests. There are some exceptions. An example is a match statement with a default clause that raises a NotImplementedError as a fallback, where you can’t come up with any input data that will reach that line of code. It is good practice to have these default cases so that you can catch and report any inputs in production that do reach them.
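As a rough illustration of the parametrised approach, here is a minimal sketch. The validate_user function, its record shape and its error messages are made up for this example; only pytest.mark.parametrize and pytest.param are real pytest features.

```python
import pytest


def validate_user(record: dict) -> list[str]:
    """Hypothetical public function: return validation errors (empty if valid)."""
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")
    if "@" not in record.get("email", ""):
        errors.append("email is invalid")
    return errors


@pytest.mark.parametrize(
    "record, expected_errors",
    [
        pytest.param({"name": "Ada", "email": "ada@example.com"}, [], id="valid"),
        pytest.param({"name": " ", "email": "ada@example.com"}, ["name is required"], id="blank name"),
        pytest.param({"name": "Ada", "email": "ada"}, ["email is invalid"], id="bad email"),
        pytest.param({}, ["name is required", "email is invalid"], id="empty record"),
    ],
)
def test_validate_user(record, expected_errors):
    # Each parameter set becomes its own test with a readable id.
    assert validate_user(record) == expected_errors
```

Because the function has no side effects, each case is just an input and an expected output, and adding a case is a one-line change.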

Since this is the first time we have encountered an example of a line of code that is not worth covering, let’s discuss best practices for those cases. A coding tool is most effective when its output is clear and the necessary actions are obvious. Ideally, anything reported by the tool should lead to an action. If a code base accumulates many lines of code that are not covered for good reasons, the coverage tool will report all of them as not covered. You and your team will stop paying attention to those warnings because it is hard to remember which lines should or shouldn’t be covered by tests. This degrades the value of the coverage tool and makes its output especially difficult for PR reviewers to assess. Coverage tools can be configured not to count certain lines of code in their reporting. For any line of code that doesn’t need to be covered, let the coverage tool know about it so it stops bugging you to cover it with a test. In addition to configuring the tool not to report the line of code, write a comment explaining why that line should not be covered by tests. That way you will remember the reason when you come back a few months later, and PR reviewers won’t have to ask you why a line isn’t covered. If you are reviewing a PR, pay careful attention to any lines of code that are not covered and consider whether that is appropriate. Uncovered lines can be a source of bugs.

Let’s return to the topic of whether lines of code should be covered by tests. Our next example is a private function in the same module as the previously discussed public function. The considerations for covering private functions are similar to those for public functions, including that every line should be covered unless one of the exceptions applies. The main difference for private functions is whether they should be targeted directly by a test. That is, whether there should be a test specifically for that function or whether the private function should be covered indirectly through tests of another function. There are cases where either is valid. The trade-offs are how closely your tests are tied to the implementation, the complexity of the tests and the number of tests.

The first consideration is how closely your tests are tied to the implementation. It is usually better to minimise testing the implementation and maximise testing of the business logic (the input and output that your users depend on). If you find yourself creating a lot of mocks and checking how they are called, you are probably writing tests that are closely tied to the implementation. Tests should also survive refactoring of the code. If you find yourself having to modify tests for trivial changes in the implementation, your tests are probably closely tied to the implementation. This consideration generally pushes towards minimising tests of private functions.

The other trade-offs are the complexity and number of the tests. For a large module, testing only the public functions can require many complex tests. The complexity is indicated by the number of lines in the test, the number of parameters if the test is parametrised and the number of test cases. If you find that you can sufficiently cover all the code in the module using a reasonable number of tests with reasonable complexity, there is no need to write tests for private functions. If you find yourself writing tests that are difficult to understand, or that span many lines of code with many input arguments, there could be a case for writing tests for private functions. Tests for private functions are generally less complex (because the logic of those functions is usually more focussed) and fewer of them are required. This is because complexity generally goes up non-linearly with the number of lines of code being covered, partially because of case combinations. Targeting private functions lets you target specific transformation logic. As long as the private functions being targeted are appropriately abstracted based on the business logic they implement, it can be safe and beneficial to write tests for them. Examples of private functions that are reasonable to target are:

  • A module that processes many orders has a public function that processes all the orders and a private function that processes an individual order. Targeting this private function is relatively safe because you are testing the business logic of how to handle an individual order, which should persist through refactoring.
  • A private function that processes input and transforms it to data that is easier to handle for subsequent functions. This example is less clear cut because it is testing implementation to a degree. However, because it is closely related to input and representing that input, it should be relatively stable. If the input changes, likely the business logic has changed as well so it is reasonable to expect to need to modify the tests.
  • A function that processes a subset of the business logic. For example, applying a discount to an order. This kind of function is reasonable to target because you are testing the business logic of applying a discount to an order which should survive refactoring.

For private and public functions in a module, we have so far mainly discussed whether to cover them with unit tests. Whilst it isn’t generally necessary to specifically target these kinds of functions with integration or e2e tests, it is good to have them exercised by at least one such test to ensure external interactions work as expected. These tests don’t need to be extensive since most of the logic should be covered by the unit tests. The next type of function we’ll discuss requires tests that approach integration testing.

Functions that interact with the file system are generally a little simpler to test than other functions that have external dependencies. That is because there are many tools for making these interactions manageable, and because file systems are typically fast and available wherever tests are run. There is usually no need to mock the file system. A best practice is to take a base directory as input, even if in production it is just the current working directory, so that a temporary directory can be passed during tests and automatically cleaned up. Python’s tempfile module and pytest’s tmp_path fixture provide helpful tools for creating and cleaning up temporary paths. Tests that interact with a file system usually have more complex setup and teardown because of the need to create files and directories and write content to them. The walrus operator can be used to create a directory, or write contents to a file, in a single line whilst also keeping a handle to that directory or file:

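def test_path(tmp_path):
    # tmp_path is pytest's built-in fixture: a unique temporary pathlib.Path per test
    (directory := tmp_path / "dir").mkdir()
    (file := directory / "file.text").write_text("content 1", encoding="utf-8")
    assert file.read_text(encoding="utf-8") == "content 1"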

To minimise exposure to the file system, write simple wrapper functions that extract data from files into records, or that take records as input and write them to the file system. Only the wrappers then need tests that touch the file system. The rest of the code works with the convenient records, which can easily be generated in parametrised tests.
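A minimal sketch of such wrappers follows. The Order record, the orders.json file name and the JSON layout are assumptions for illustration; tmp_path is pytest’s built-in fixture.

```python
import json
from pathlib import Path
from typing import NamedTuple


class Order(NamedTuple):
    order_id: int
    customer: str


def write_orders(base_dir: Path, orders: list[Order]) -> Path:
    """Write the orders as JSON to orders.json under base_dir; return the file path."""
    path = base_dir / "orders.json"
    path.write_text(json.dumps([order._asdict() for order in orders]), encoding="utf-8")
    return path


def read_orders(base_dir: Path) -> list[Order]:
    """Read orders.json under base_dir back into Order records."""
    raw = json.loads((base_dir / "orders.json").read_text(encoding="utf-8"))
    return [Order(**item) for item in raw]


def test_orders_round_trip(tmp_path):
    # The base directory is an argument, so the test passes a temporary one.
    orders = [Order(1, "Ada"), Order(2, "Grace")]
    write_orders(tmp_path, orders)
    assert read_orders(tmp_path) == orders
```

Everything downstream of read_orders can be tested with plain Order records and never needs a file.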

Now that we have mentioned convenient records for the first time, let’s think about how to generate test data. To simplify test data generation, use typing.NamedTuple for complex data that can’t be expressed with basic types such as integers or strings. By following this practice, you will find yourself needing to generate these records in tests, both as input and to check outputs. The factory_boy package makes it easy to generate them. A separate blog post, Reduce Duplication in Pytest Parametrised Tests using the Walrus Operator, discusses a neat trick for keeping such tests short.

Functions interacting with file systems should be tested with unit tests and at least one integration or e2e test to cover their external interactions. This increases the complexity of the integration and e2e tests because they now also have to interact with the file system, which is one more reason to keep the number of these tests to a minimum.
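As a small illustration of generating records, here is a sketch that uses a hand-rolled helper in plain Python as a stand-in for what a package like factory_boy offers; it is not factory_boy’s API. The Customer record, make_customer and shipping_zone are hypothetical.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    customer_id: int
    name: str
    country: str


def make_customer(**overrides) -> Customer:
    """Build a Customer with defaults, overriding only what a test cares about."""
    defaults = {"customer_id": 1, "name": "Ada", "country": "NZ"}
    return Customer(**{**defaults, **overrides})


def shipping_zone(customer: Customer) -> str:
    """Hypothetical function under test."""
    return "domestic" if customer.country == "NZ" else "international"


def test_shipping_zone():
    # Only the field that matters to each case appears in the test.
    assert shipping_zone(make_customer()) == "domestic"
    assert shipping_zone(make_customer(country="FR")) == "international"
```

Keeping irrelevant fields out of each test makes it clearer which input drives the expected output.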

The next type of function we’ll examine is one that interacts with a database. SQLite is a tool that enables you to cover these kinds of functions in tests. As with file system functions, it is good to minimise the direct interactions with the database. These functions should also convert the database records to more convenient forms, such as named tuples, so that as much of the system as possible does not need to know that the records came from a database. This makes tests easier to write because you can easily generate these records, whereas writing to and cleaning up the database is more complex. The factory_boy package also supports a few commonly used ORMs, so it will be useful for these tests as well. Functions that interact with the database should be covered by integration or e2e tests, especially the wrapper functions around the database. Extensive integration tests for those wrapper functions give you confidence that those interfaces work.
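Here is a minimal sketch of database wrapper functions tested against an in-memory SQLite database using Python’s built-in sqlite3 module. The users table, the User record and the function names are assumptions for illustration, and the sketch assumes the SQL involved behaves the same in SQLite as in your production database.

```python
import sqlite3
from typing import NamedTuple


class User(NamedTuple):
    user_id: int
    email: str


def create_schema(connection: sqlite3.Connection) -> None:
    connection.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )


def add_user(connection: sqlite3.Connection, email: str) -> User:
    """Insert a user and return it as a plain record."""
    cursor = connection.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return User(user_id=cursor.lastrowid, email=email)


def list_users(connection: sqlite3.Connection) -> list[User]:
    """Convert database rows into User records so callers never see raw rows."""
    rows = connection.execute("SELECT user_id, email FROM users ORDER BY user_id")
    return [User(*row) for row in rows]


def test_add_and_list_users():
    # ":memory:" creates a throwaway database that disappears on close.
    connection = sqlite3.connect(":memory:")
    try:
        create_schema(connection)
        added = add_user(connection, "ada@example.com")
        assert list_users(connection) == [added]
    finally:
        connection.close()
```

Code that consumes list_users can then be unit tested with hand-built User records and no database at all.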

So far we haven’t encountered a case where mocking is recommended. The next kind of function, for which we will use cloud provider interactions as an example, has external dependencies that are difficult to spin up during tests. These are most commonly web APIs, potentially wrapped by a Python package. These kinds of dependencies should almost always be wrapped by a purpose-built, simple interface that the rest of your code base interacts with. That interface should have only limited unit tests: tests that check how code interacted with an external dependency just re-write the interface code in your tests and add little value. Instead, the interface wrapping the external dependency should be extensively covered by integration tests. Outside of testing the interface itself, a mock of the interface can be used for any functions that interact with it. Mocks are particularly useful for error cases, which can be difficult to reproduce reliably against the real dependency. A mock can raise those errors on demand, which enables you to test the logic that handles them. The following blog post covers this in greater detail: Navigating Mocks in Python: Strategies for Safe and Efficient Mock Usage.
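As a sketch of mocking the interface rather than the provider, the example below uses unittest.mock from the standard library. ReportStore, StorageError and publish_report are hypothetical names; the real wrapper would call the provider’s SDK and be covered by integration tests.

```python
from unittest import mock


class StorageError(Exception):
    """Raised by the wrapper when the provider call fails."""


class ReportStore:
    """Thin interface around the cloud provider (integration tested elsewhere)."""

    def upload(self, name: str, content: bytes) -> None:
        raise NotImplementedError("would call the cloud provider's SDK here")


def publish_report(store: ReportStore, name: str, content: bytes) -> bool:
    """Business logic: report whether the upload succeeded instead of crashing."""
    try:
        store.upload(name, content)
    except StorageError:
        return False
    return True


def test_publish_report_handles_upload_failure():
    # create_autospec keeps the mock's signature in line with the real interface.
    store = mock.create_autospec(ReportStore, instance=True)
    store.upload.side_effect = StorageError("quota exceeded")
    assert publish_report(store, "monthly.csv", b"data") is False


def test_publish_report_success():
    store = mock.create_autospec(ReportStore, instance=True)
    assert publish_report(store, "monthly.csv", b"data") is True
```

The tests assert on the outcome of publish_report rather than on how the mock was called, which keeps them loosely coupled to the implementation.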

The last kind of function we will consider is the interface that your users interact with. We will use the entry point function of a CLI as the example. Other examples are the code you have written to accept web requests and generate responses, the main function for a custom GitHub action and so on. It’s usually not beneficial to write unit or integration tests for these functions because simulating their real-world environment is challenging. Most issues with these functions will stem from input and output rather than logic errors. If you find logic errors in these functions, you probably have business logic embedded in them. Functions that deal with the outside world should focus only on those interactions and call, through a simple interface, a public function in a module that contains all of your business logic. The most crucial tests for these functions are e2e tests, which should imitate the production environment as closely as possible. For instance, run the real CLI instead of importing and calling the main function in tests. This will help you discover problems with how you take input and return output on the command line. These tests should cover typical user scenarios. If you are providing a service, it can be worth running these e2e tests against the production service periodically to confirm that your services are up and running, similar to health checks.
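A minimal sketch of running a real CLI in an e2e test is shown below. Python’s built-in "python -m json.tool" command is used purely as a stand-in for your own CLI so that the example runs anywhere; the run_cli helper is a hypothetical name.

```python
import json
import subprocess
import sys

# Stand-in for your own CLI command; replace with the real executable and arguments.
CLI_COMMAND = [sys.executable, "-m", "json.tool"]


def run_cli(*args: str, stdin: str = "") -> subprocess.CompletedProcess:
    """Run the CLI as a separate process, the way a user's shell would."""
    return subprocess.run(
        [*CLI_COMMAND, *args],
        input=stdin,
        capture_output=True,
        text=True,
        timeout=30,
    )


def test_cli_formats_valid_input():
    result = run_cli(stdin='{"b": 1, "a": 2}')
    assert result.returncode == 0
    assert json.loads(result.stdout) == {"a": 2, "b": 1}


def test_cli_rejects_invalid_input():
    result = run_cli(stdin="not json")
    # Exit code and stderr are part of the interface users and scripts rely on.
    assert result.returncode != 0
    assert result.stderr != ""
```

Running a separate process exercises argument parsing, standard input and output, and exit codes, which calling the main function directly would bypass.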

In summary, we have looked at a few example functions and considered whether and to what degree they should be covered by unit, integration and e2e tests. We found that public and private functions in a module with no external dependencies should generally be covered by unit tests. It is usually not necessary to target private functions with tests directly, although doing so can be useful to reduce the number of tests or their complexity. If a private function is targeted by a test, it is safest if that function handles business logic and the tests check how that business logic is handled. This makes it more likely that you can refactor your code without needing to change tests. Next, we considered functions with external dependencies. In the case of file system and database dependencies, we found that it was generally not necessary to use mocks. Instead, write wrapper functions that abstract the specifics and target those wrapper functions with tests that use a temporary directory or a SQLite database. We also discussed interactions with external dependencies that are difficult to create locally, such as interactions with a cloud provider. In these cases, it is best to write a simple interface for these interactions, cover it with integration tests and then mock the interface for all other tests except e2e tests. Finally, we discussed functions that are interfaces with the outside world, such as the main function for a CLI. For these kinds of functions, needing unit and integration tests is usually a sign that they implement business logic rather than only handling the interactions with the outside world. For those interactions, e2e tests that cover common use cases of the system should be sufficient to ensure the interface will work for your users.
