There are a few cases where code contains human readable messages. Examples are log entries and exception messages. These kinds of messages typically are sentences like “there was a problem communicating with the server, please try again later.” It is usually important to check that log messages are written and exceptions are raised in tests. One of the approaches for these tests is to check that the exact message was written or raised. These kind of messages are subject to change to make them clearer to users which makes tests that check for the exact content of the messages prone to fail whenever the wording is adjusted. In this post we will explore some alternative approaches to writing these kind of tests that are less prone to false failures.
You might not be convinced that checking for an exact message in a test is a problem yet, so let’s see why it might be. In tests, usually it is important to check that the execution followed a particular path. For example, that data was successfully sent to a server. In the case of a log messages, it is important to check that a specific log message was written, the content isn’t as important. For example, the following code conditionally writes two different log messages:
def write_log(success: bool):
if success:
logging.info("data sent to server")
else:
logging.info("could not send data to server")
A test should check that the success messages is logged if success
is True
and that the failure message is logged otherwise. It doesn’t matter to the test how the success or failure messages are worded. For example, the tests should still pass if the messages were more specific, e.g., by replacing data
with order
:
def write_log(success: bool):
if success:
logging.info("order sent to server")
else:
logging.info("could not send order to server")
If tests depends on the specific wording of a message, the message is essentially written in two different locations (once in the source code and the second time in the tests). A counter argument could be that writing them in two different location essentially acts as a way of double checking that the correct message was written. There are better ways of achieving that intent, for example, by careful review of a pull request or through user testing.
Now that we have established that checking the exact message content is a problem, we will explore some alternatives. The first alternative, which requires no change to the messages in the code, is to check for key words in the message rather than the entire content. For example using the pytest caplog fixture:
def test_write_log_success(caplog):
caplog.set_level(logging.INFO)
write_log(success=True)
assert "sent" in caplog.text
def test_write_log_fail(caplog):
caplog.set_level(logging.INFO)
write_log(success=False)
assert "not send" in caplog.text
This reduces the coupling between the message content and the tests that check which message was logged. I find it helpful to define a helper function that makes it easy to check that multiple substrings are present in a longer string to be able to use multiple words to identify a given message:
def assert_substrings_in_string(substrings, string):
for substring in substrings:
assert substring in string, f"{substring!r} not in {string!r}"
It is usually helpful to list the words in the test so that it is obvious which message is being targeted. For example:
def test_write_log_success_unclear(caplog):
caplog.set_level(logging.INFO)
write_log(success=True)
assert_substrings_in_string(("server", "sent"), caplog.text)
def test_write_log_success_clearer(caplog):
caplog.set_level(logging.INFO)
write_log(success=True)
assert_substrings_in_string(("sent", "to", "server"), caplog.text)
The first example makes it sound like the server sent something, which is not the case. The second example makes it clear that something was sent to the server, which is accurate. Checking just that the substrings are in the message irrespective of the order means that any message that includes those words will pass. One drawback is that, without careful thought of which words to check for, it is possible to write a test that looks like it is checking for the right message but would pass for multiple messages. For example, just looking for server
in the messages would get either message to pass. It is also more difficult to check that a substring appears a specific number of times. For both those cases, perhaps the below alternative is more appropriate.
Another alternative is to include a unique identifier in the message. The simplest form is to number each log message uniquely, a more human friendly form would be to give each message a name:
def write_log(success: bool):
if success:
logging.info("success: order sent to server")
else:
logging.info("failure: could not send order to server")
This means that the tests can just use the name:
def test_write_log_success(caplog):
caplog.set_level(logging.INFO)
write_log(success=True)
assert "success:" in caplog.text
def test_write_log_fail(caplog):
caplog.set_level(logging.INFO)
write_log(success=False)
assert "failure:" in caplog.text
This also makes it easier to figure out what part of the code wrote the log message if you are debugging a problem. There are alternatives for the debugging use case, e.g., including the filename and line number in the log message formatter. Names are usually easier to interpret and remember than line numbers so it can be quite helpful to include both.
In summary, checking for the entire message in a test leads to tighter coupling between the test and the wording of a message and results in the log message being defined in two places. This duplication can be reduced by checking for unique words in a log message or giving each log message a name and checking for that name in a test.